How Greenhouse makes predictions

Greenhouse Predicts uses machine learning to make predictions for each job. Machine learning analyzes huge amounts of data and finds complicated relationships within it. In this article, we will cover how predictions are generated, what data is used, and how predictions can change.

 

How Greenhouse Predicts works: Non-technical explanation

By observing historical data - millions of applications across hundreds of thousands of jobs - we were able to build a statistical model of job pipelines and candidates moving through them. We use this model to predict future outcomes, like when someone will accept an offer.

Specifically, Greenhouse Predicts is powered by a Bayesian linear regression model. This means that we throw a number of factors into an equation to make a prediction. Some of these factors include:

  • Number of candidates in each milestone
  • Candidate source
  • Days the job has been open
  • Days since last hire (if any)
  • Offer status (if any)
  • Lots of other things!

In addition, we look at the average hiring speed across all Greenhouse customers as well as the hiring speed for your specific organization. If your company is typically much faster or slower than the average hiring speed of our other customers, this factor will be weighed more heavily in the equation. That means:

  • If your organization doesn’t have a lot of data (you just joined Greenhouse or haven’t hired much), we can still make predictions based on the average data from all organizations.
  • However, the more data you have, the more fine-tuned the predictions will be to your company. Your predictions should get better and better over time as we get to know your hiring habits (see the sketch just below).
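
Here is a minimal sketch of one common way to blend an organization's own hiring speed with the global average (a precision-weighted blend, often called shrinkage). The function name, variable names, and weighting formula are illustrative assumptions, not Greenhouse's actual implementation:

    # Illustrative sketch: blending an organization's own hiring speed with the
    # global average. The weighting below is a standard shrinkage formula and an
    # assumption for illustration, not Greenhouse's actual code.

    def blended_speed(org_days, org_hire_count, global_days, pseudo_count=20):
        """Weight the org's average time-to-hire by how much data it has."""
        if not org_hire_count:
            return global_days  # no org data yet: fall back to the global average
        weight = org_hire_count / (org_hire_count + pseudo_count)
        return weight * org_days + (1 - weight) * global_days

    print(blended_speed(org_days=None, org_hire_count=0, global_days=45))    # 45 (all global)
    print(blended_speed(org_days=30.0, org_hire_count=5, global_days=45))    # 42.0 (mostly global)
    print(blended_speed(org_days=30.0, org_hire_count=200, global_days=45))  # ~31.4 (mostly yours)

With no hires, the blend falls back entirely on the global average; with hundreds of hires, it is dominated by your own data, mirroring the two points above.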

Next, we run this equation hundreds of times to create a histogram of possible outcomes. A histogram for a single job might look something like this:

[Image: histogram of predicted hire dates for a single job]

Each bar represents a prediction. For example, the bar on the far left might be a prediction that you’ll make a hire next week, and the bars all the way on the right might be predictions that you’ll make a hire 30 weeks from today. Neither of those seems very likely taken on its own.

We add all these predictions together and take the mean (the top of the curve), which is the most likely hire date. We then display this mean value, or average, on the job dashboard as our predicted offer acceptance date.
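
As an illustration of that simulate-and-average step, here is a minimal sketch. Every factor value, coefficient, and noise level below is invented for the example; only the overall shape (run the equation many times, histogram the results, report the mean) reflects the description above:

    import numpy as np

    # Illustrative sketch of the "run the equation many times" idea.
    # All factor values and weights below are made up for the example.
    rng = np.random.default_rng(42)

    inputs = np.array([8.0, 3.0, 21.0])       # e.g. candidates in a milestone, referrals, days open
    weights = np.array([-0.02, -0.05, 0.01])  # hypothetical learned coefficients
    intercept = 3.5                           # hypothetical baseline (log of days to hire)

    # Instead of one fixed answer, draw hundreds of predictions with uncertainty.
    log_days = inputs @ weights + intercept + rng.normal(0.0, 0.15, size=500)
    days_to_hire = np.exp(log_days)           # undo the logarithm to get days
                                              # (the log model is described in the
                                              # technical section below)

    # The histogram of days_to_hire is what the chart above shows;
    # its mean is the single value displayed on the job dashboard.
    print(f"predicted days to next hire: {days_to_hire.mean():.1f}")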

 

How predictions change over time

As noted above, we throw a number of factors into an equation to make a prediction. Some of these factors include:

  • Number of candidates in each milestone
  • Candidate source
  • Days the job has been open
  • Days since last hire was made on the job (if any)
  • Candidate offer status (if any)
  • Your company’s historical time-to-hire

Each of these factors is given a certain value and weight in the equation. When any of these factors change, the prediction might change, too.

However, each individual input tends to have a small effect on the overall prediction because we’re looking at so many different combinations of the data, so making one change (e.g. adding a new candidate from a high-quality source) might not shift your prediction, as the sketch below shows.
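
For instance, reusing the invented numbers from the sketch in the previous section, adding one referral-sourced candidate nudges the prediction only slightly:

    import numpy as np

    # Toy sensitivity check: how much does one new referral move the prediction?
    # Coefficients and inputs are invented for illustration.
    weights = np.array([-0.02, -0.05, 0.01])
    intercept = 3.5

    before = np.array([8.0, 3.0, 21.0])  # candidates, referrals, days open
    after = np.array([8.0, 4.0, 21.0])   # one extra referral, all else equal

    days_before = np.exp(before @ weights + intercept)
    days_after = np.exp(after @ weights + intercept)
    print(f"{days_before:.1f} -> {days_after:.1f} days")  # roughly 30.0 -> 28.5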

 

How specific predictions are generated 

It’s not possible to tell you exactly how we made a prediction or how each candidate in your pipeline contributed to it.

A number of these factors are evaluated by a machine learning process that makes a final prediction. Machine learning allows computers to recognize patterns, make data-driven decisions, and improve over time as they see new data.

Machine learning can find complicated relationships that are not always easy to explain, understand, or untangle. However, because the process is trained on huge amounts of data (think hundreds of thousands of jobs and candidates), we can validate the decisions it makes by comparing its predictions to actual hire dates.
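
One simple way to check a model like this, sketched below with made-up numbers, is to compare its predictions against the hire dates that actually happened:

    import numpy as np

    # Sketch of validating predictions against reality (all numbers invented).
    predicted_days = np.array([30.0, 45.0, 21.0, 60.0])  # model output per job
    actual_days = np.array([28.0, 50.0, 25.0, 55.0])     # what really happened

    # Mean absolute error: on average, how many days off was the model?
    mae = np.mean(np.abs(predicted_days - actual_days))
    print(f"average error: {mae:.1f} days")  # 4.0 days here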

 

How Greenhouse Predicts works: Technical explanation 

Greenhouse Predicts uses a Bayesian linear regression model. The linear regression equation looks something like this:

Y = A * X + B_org + B_everyone

  • Y is the logarithm of the time to next hire
  • X is a set of inputs, including things like: number of candidates by milestone and source, days since last hire, days the job has been open, and offer status (unresolved, accepted, rejected, deprecated)
    • For example, the number of candidates in Milestone #4, or the number of candidates whose source is a referral
  • A is a corresponding set of coefficients for those inputs. These coefficients are determined by machine learning, i.e. by training the model on our data set.
  • B is an intercept. B_org is the organization-specific intercept (e.g. all pipelines for an organization move faster or slower relative to the average). B_everyone is an intercept that applies to all organizations equally. (The sketch below shows this equation as code.)
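
Read as code, a point-estimate version of that equation might look like the sketch below; the input values, coefficients, and intercepts are invented for illustration:

    import numpy as np

    # The regression equation with invented example values.
    X = np.array([5.0, 2.0, 14.0, 30.0])       # inputs: milestone counts, source counts, days open, ...
    A = np.array([-0.03, -0.06, 0.01, 0.004])  # learned coefficients, one per input
    B_org = -0.2                               # hypothetical org-specific intercept (faster than average)
    B_everyone = 3.6                           # hypothetical global intercept shared by all orgs

    Y = A @ X + B_org + B_everyone             # Y is the log of the time to next hire
    print(f"predicted days to next hire: {np.exp(Y):.1f}")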

The Bayesian part of the model means that instead of calculating single values for the coefficients, we generate probability distributions over those coefficients, including the intercepts. Your organization’s values can deviate from the average of all organizations. For example, Company ABC, which hires faster than average, might have a B_org that is smaller than average.

Once we have probability distributions over these coefficients, we can generate random outputs (Y) for specific inputs (X).

This returns a histogram of results. We then take the mean of those samples of Y-values, and convert back from the logarithm, to generate a prediction.
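
Putting those pieces together, the sampling step might look something like the sketch below. The normal distributions here stand in for the learned posterior distributions; their parameters are assumptions for illustration, not Greenhouse's actual values:

    import numpy as np

    # Sketch of the Bayesian sampling step (all distribution parameters invented).
    rng = np.random.default_rng(0)
    n_samples = 1000

    X = np.array([5.0, 2.0, 14.0, 30.0])   # same toy inputs as the sketch above

    # Instead of single values, draw each coefficient from a distribution.
    A = rng.normal([-0.03, -0.06, 0.01, 0.004], 0.01, size=(n_samples, 4))
    B_org = rng.normal(-0.2, 0.05, size=n_samples)
    B_everyone = rng.normal(3.6, 0.02, size=n_samples)

    Y = A @ X + B_org + B_everyone   # one log-time-to-hire sample per draw
    days = np.exp(Y)                 # convert back from log space to days

    # The days array is the histogram; its mean is the displayed prediction.
    print(f"predicted days to next hire: {days.mean():.1f}")

The mean of days is the single value shown on the job dashboard, just as in the non-technical walkthrough above.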
