Note: Greenhouse Predicts is not available for organizations with a Greenhouse Recruiting account in Silo 101.

Greenhouse Predicts uses machine learning to make predictions for each job. Machine learning analyzes huge amounts of data and finds complicated relationships.

How Greenhouse Predicts works: Non-technical explanation

By observing historical data — millions of applications across hundreds of thousands of jobs — we were able to build a statistical model of job pipelines and candidates moving through them. We use this model to predict future outcomes, such as when a candidate might accept an offer.

Specifically, Greenhouse Predicts is powered by a Bayesian linear regression model that weighs many factors. Some of these factors include:

  • Number of candidates in each milestone
  • Candidate source
  • Days the job has been open
  • Days since last hire (if any)
  • Offer status (if any)
  • Much more!

In addition, we look at the average hiring speed across all Greenhouse Recruiting customers as well as the hiring speed for your specific organization. If your company is typically much faster or slower than the average hiring speed of our other customers, this factor will be weighed more heavily in the equation. This means:

  • If your organization doesn't have a lot of data, for example if you just joined Greenhouse Recruiting or haven't yet hired much, we can still make predictions based on the average data from all organizations.
  • However, the more data you have, the more fine-tuned the predictions will be to your company. Your predictions should get better and better over time as we get to know your hiring habits.
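
As an illustration only, the two points above can be sketched as a weighted blend of your organization's average and the all-customer average. This is not the actual Greenhouse Predicts calculation: the function name `blended_days_to_hire`, the `prior_strength` parameter, and every number below are hypothetical.

```python
def blended_days_to_hire(org_days, global_mean_days, prior_strength=10):
    """Blend an organization's average time-to-hire with the all-customer average.

    prior_strength is a hypothetical tuning value: roughly how many hires an
    organization needs before its own history counts as much as the global average.
    """
    n = len(org_days)
    if n == 0:
        # No history yet: fall back entirely on the average across all organizations.
        return global_mean_days
    org_mean = sum(org_days) / n
    weight = n / (n + prior_strength)  # grows toward 1 as the organization hires more
    return weight * org_mean + (1 - weight) * global_mean_days


# Made-up numbers: the global average is 45 days.
new_org = blended_days_to_hire([], 45.0)           # no hires yet
small_org = blended_days_to_hire([20, 30], 45.0)   # two hires, averaging 25 days
large_org = blended_days_to_hire([25] * 90, 45.0)  # ninety hires, averaging 25 days
```

With no history the estimate equals the global average. With two hires it moves only slightly toward the organization's own 25-day average, and with ninety hires it sits close to that average.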

Next, we run this equation hundreds of times to create a histogram of data. A histogram for a single job might look something like this:


Each bar represents a prediction. For example, the bar on the far left might be a prediction that you'll make a hire next week, and the bars on the far right might be predictions that you'll make a hire 30 weeks from today. Neither of those seems very likely on its own.

We combine all these predictions and take the mean, which we treat as the most likely hire date. We then display this mean value, or average, on the job dashboard as our predicted offer acceptance date.
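
A minimal sketch of this step, assuming the simulated predictions are expressed as whole weeks until the next hire. The values in `predicted_weeks` are made up for illustration and do not come from a real job.

```python
from collections import Counter
from statistics import mean

# Made-up simulated predictions, in weeks until the next hire.
predicted_weeks = [1, 6, 7, 7, 8, 8, 8, 9, 9, 10, 12, 30]

# Build the histogram: how many predictions fall on each week.
histogram = Counter(predicted_weeks)
for week in sorted(histogram):
    print(f"{week:>2} weeks | {'#' * histogram[week]}")

# The displayed prediction is the mean of all simulated outcomes.
predicted = mean(predicted_weeks)
print(f"Predicted time to hire: {predicted:.1f} weeks")
```

In this made-up data, the isolated predictions at 1 and 30 weeks sit far from the bulk of the histogram, and the mean lands at about 9.6 weeks.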

How predictions change over time

We use a number of factors to make a prediction. Some of these factors include:

  • Number of candidates in each milestone
  • Candidate source
  • Days the job has been open
  • Days since last hire was made on the job (if any)
  • Candidate offer status (if any)
  • Your company's historical time-to-hire

Each of these factors is given a certain value and weight in the equation. When any of these factors changes, the prediction might change, too.

However, each input might have only a small effect on the overall prediction because we're looking at so many different combinations of the data. Making one change (e.g., adding a new candidate with a high-quality source) might not shift your prediction.

How specific predictions are generated

It's not possible to tell you exactly how we made a prediction or how each candidate in your pipeline contributed to it.

A number of these factors are evaluated by a machine learning process that makes a final prediction. Machine learning allows computers to recognize patterns, make data-driven decisions, and learn from new data — meaning that they will improve over time as they see new data.

Machine learning can find complicated relationships that are not always easy to explain, understand, or untangle. However, by training the machine learning process on huge amounts of data and comparing its predictions to actual hire dates, we can check that the computer's decisions are trustworthy.

How Greenhouse Predicts works: Technical explanation

Greenhouse Predicts uses a Bayesian linear regression model. The linear regression equation looks something like this:

Y = A * X + B_org + B_everyone

  • Y is the logarithm of the time to next hire.
  • X is a set of inputs, including the number of candidates by milestone and source, days since last hire or days the job has been open, and offer status (unresolved, accepted, rejected, deprecated).
  • A is a corresponding set of coefficients for those inputs. These coefficients are determined by machine learning, by training the model on our data set.
  • B is an intercept. B_org is the organization-specific intercept (e.g., all pipelines for an organization move faster or slower relative to the average). B_everyone is an intercept that applies to all organizations equally.
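
To make the equation concrete, here is a minimal sketch that evaluates it for a single set of inputs. Everything specific in it is an assumption for illustration: the three example inputs, the coefficient values, the use of the natural logarithm, and days as the unit of time are not stated by Greenhouse.

```python
import math


def log_time_to_hire(inputs, coefficients, b_org, b_everyone):
    """Evaluate Y = A * X + B_org + B_everyone, treating A * X as a dot product."""
    return sum(a * x for a, x in zip(coefficients, inputs)) + b_org + b_everyone


# Hypothetical inputs X: [candidates in late milestones, days job open, unresolved offer (1/0)]
x = [3, 40, 1]
# Hypothetical coefficients A, one per input
a = [-0.10, 0.005, -0.50]

y = log_time_to_hire(x, a, b_org=-0.2, b_everyone=3.5)

# Assuming Y is a natural log of days, exponentiating converts it back to days.
days = math.exp(y)
print(f"Y = {y:.2f}, about {days:.1f} days")
```

Because Y is a logarithm, a negative coefficient or intercept shortens the predicted time and a positive one lengthens it.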

The Bayesian part of the model means that instead of calculating single values for the coefficients, we generate probability distributions over values for those coefficients, including the intercepts. Your organization's values could deviate from the average of all organizations. For example, Company ABC with a faster hiring speed might have a B_org that is smaller than average, because a smaller intercept lowers Y and therefore shortens the predicted time to next hire.

Once we have probability distributions over these coefficients, we can generate random outputs (Y) for specific inputs (X).

Plotting these outputs produces a histogram of results. We then take the mean of those sampled Y values to generate a prediction.
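
The sampling step can be sketched as follows. This is a simplified illustration, not the production model: it assumes each coefficient and intercept has an independent normal distribution, and the function name `sample_predictions`, the means, the standard deviations, and the inputs are all hypothetical.

```python
import random
from statistics import mean


def sample_predictions(x, coef_dists, b_org_dist, b_everyone_dist,
                       n_samples=500, seed=0):
    """Draw coefficients from their distributions and compute one Y per draw.

    Each distribution is given as a (mean, standard deviation) pair.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = []
    for _ in range(n_samples):
        a = [rng.gauss(mu, sigma) for mu, sigma in coef_dists]
        b_org = rng.gauss(*b_org_dist)
        b_everyone = rng.gauss(*b_everyone_dist)
        y = sum(ai * xi for ai, xi in zip(a, x)) + b_org + b_everyone
        samples.append(y)
    return samples


# Hypothetical inputs X and (mean, std dev) pairs for each coefficient in A.
x = [3, 40, 1]
coef_dists = [(-0.10, 0.02), (0.005, 0.001), (-0.50, 0.10)]

y_samples = sample_predictions(x, coef_dists,
                               b_org_dist=(-0.2, 0.05),
                               b_everyone_dist=(3.5, 0.05))

# The prediction is the mean of the sampled Y values.
prediction = mean(y_samples)
print(f"Mean of {len(y_samples)} sampled Y values: {prediction:.2f}")
```

Each run of the loop is one draw of the equation. Collecting the draws gives the histogram, and averaging them gives the single value that is reported.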