Talent Matching - Data Processing FAQ

This FAQ provides you, the customer acting as data controller, with technical and processing information about Greenhouse's Talent Matching feature to support your evaluation of the feature. It also provides the technical foundation you need to conduct any data processing assessments you determine to be appropriate.

None of this is intended to be legal advice, nor are any of the answers provided intended to override or contradict the advice of your legal counsel.

Feature overview

What is the Talent Matching feature?

Talent Matching is an AI-powered feature within Greenhouse Recruiting that helps recruiters identify suitable candidates by analyzing resumes and matching them against job requirements. The system extracts skills, job titles, years of experience, start/end dates of employment, and company names from the employment history in candidate resumes, and uses AI algorithms to compare these with the job criteria specified by recruiters.

What technology powers the Talent Matching feature?

Greenhouse uses a series of fine-tuned LLMs, each trained for a specific extraction task. Instead of relying on a single large model to do everything, Greenhouse breaks the problem down into smaller, specialized components. The system also uses third-party models, such as those provided by OpenAI.

Data processing details

What are the roles of the parties under data protection laws? 

Greenhouse is a data processor acting on behalf of its customers to process the personal data relating to customer employees and candidates. 

The customers are the data controllers, who decide why and how the personal data is processed.

What data sources does Talent Matching access?

The source of the data will be the candidates themselves through their submitted resumes. The system extracts certain categories of personal data from the resumes. Currently, the algorithm does not access internal Greenhouse data sources.

What specific personal data does Talent Matching process?

The categories of personal data processed by Talent Matching are: skills extracted from resumes, years of experience, job titles, start/end dates of employment, and company names from employment history.

Does the system process special category data?

The system does not process special category data (e.g., data relating to racial or ethnic origin, political opinions, religious beliefs, trade union membership, genetic data, biometric data, or data concerning health, sex life, or sexual orientation). This information is not processed by the system or used in the AI algorithm. To prevent inadvertent extraction of special category data, each fine-tuned model is restricted to a narrow scope (skills, job titles, years of experience, start/end dates of employment, and company names from employment history). The models are not trained to recognize or extract personal attributes such as health data, religious affiliation, political opinions, or demographic markers.

Does the system process structured application data?

The matching algorithm uses only data from the resume. Greenhouse parses resumes with ML models and extracts skills, job titles, years of experience, start/end dates of employment, and company names from employment history as structured data. Currently, structured application form data is not included in this calculation.
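
As a rough illustration, the structured data extracted during parsing could be modeled as follows. This is a sketch only; the field names and types are hypothetical, and the actual Greenhouse schema is not public.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the structured record produced by resume parsing.
# Field names are illustrative, not Greenhouse's actual schema.
@dataclass
class EmploymentEntry:
    job_title: str
    company_name: str
    start_date: str             # e.g. "2019-03"
    end_date: Optional[str]     # None for a current role

@dataclass
class ParsedResume:
    skills: list
    years_of_experience: float
    employment_history: list

resume = ParsedResume(
    skills=["Python", "SQL"],
    years_of_experience=5.5,
    employment_history=[
        EmploymentEntry("Data Engineer", "Acme Corp", "2019-03", None)
    ],
)
print(resume.skills)  # ['Python', 'SQL']
```

Note that only these five categories appear in the record, consistent with the data minimization scope described elsewhere in this FAQ.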

Data retention and storage

How long is parsed resume data cached?

Parsed resumes are cached for 30 days.
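
The 30-day window can be expressed as a simple time-to-live check. The sketch below is illustrative only; Greenhouse's actual cache implementation is not described in this document.

```python
from datetime import datetime, timedelta, timezone

# Illustrative TTL check for the 30-day cache window described above.
# This is a sketch, not Greenhouse's actual cache implementation.
CACHE_TTL = timedelta(days=30)

def is_expired(cached_at: datetime, now: datetime) -> bool:
    """Return True once a cached parse result is at least 30 days old."""
    return now - cached_at >= CACHE_TTL

now = datetime(2024, 7, 1, tzinfo=timezone.utc)
print(is_expired(datetime(2024, 5, 1, tzinfo=timezone.utc), now))   # True
print(is_expired(datetime(2024, 6, 20, tzinfo=timezone.utc), now))  # False
```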

What happens to data when candidates are deleted?

When a candidate application or candidate record is deleted, all parsed and embedded data relating to that record will also be deleted.

How are match results retained?

Rules are in place that retain the match results and reasoning on the candidate record at the moment their application is rejected; this information remains available in the candidate packet until it is anonymized or deleted.

What are your data retention obligations as controller?

Customer Action: Your company must establish its own data retention policies in accordance with applicable laws and your business needs, as recommended by your legal counsel. Consider local employment law requirements, candidate expectations, and your legitimate business interests in retaining candidate data for future opportunities.

Third-party processing

What data is shared with third parties?

Skills, job titles, years of experience, start/end dates of employment, and company names from employment history, parsed from candidates' resumes, are sent for external processing to third-party services (OpenAI).

What protections exist for third-party processing?

Greenhouse has executed a DPA with OpenAI that flows down all of Greenhouse's data protection obligations to OpenAI, prohibits OpenAI from using personal data for their own purposes (including training/re-training datasets), and ensures that data is adequately protected and secured.

Is customer data used for model training?

Greenhouse does not train on customer data in relation to this feature. Personal data is not used to train our models or those of third parties.

International transfers

How are international data transfers handled?

Greenhouse is certified to the EU-U.S. Data Privacy Framework and the UK Extension to the EU-U.S. Data Privacy Framework. Any cross-border transfers of personal data from the EEA or the UK are subject to the EU-U.S. Data Privacy Framework or the UK Extension. Greenhouse secondarily relies on Standard Contractual Clauses and additional safeguards for data transfers.

Security measures

What encryption protects candidate data?

Customer data is encrypted in transit between customers and Greenhouse using Transport Layer Security (TLS) 1.2 or higher. 

Customer data is encrypted at rest using a minimum Advanced Encryption Standard (AES) 256-bit encryption.

What technical and organizational measures are utilized to protect candidate data?

Greenhouse implements technical and organizational measures as described in Annex II of the Greenhouse DPA. Furthermore, Greenhouse is regularly audited against ISO 27001, ISO 27701 and SSAE 18 SOC 2 standards by independent third-party auditors. More information about Greenhouse’s security program and practices is available at the Greenhouse Trust Portal.

What access controls are in place?

Candidate match results are accessible to Greenhouse production engineers in order to enable Greenhouse's provision of our services. Access to this data is restricted with the same permissions required to access any other Greenhouse customer production data. Greenhouse technical support users do not have access to this data unless a customer grants access via a 'Temporary Access Grant' and the support analyst/engineer logs in as an employee.

What API security measures exist?

The internal services used to power the Talent Matching feature run on a private network to prevent internet-facing traffic, and all of them require authentication.

What AI-specific security measures are implemented?

Recognizing the unique challenges posed by AI systems, we also adhere to additional Generative AI security best practices. These practices include mitigations for vulnerabilities such as prompt injection, output handling, denial of service, and data leakage.

Individual rights implementation

How can candidates exercise their data subject rights?

Customer Action: You must establish processes for candidates to exercise their rights under applicable data protection laws, as advised by your legal counsel. This should include clear contact information and procedures for handling access, correction, deletion, and objection requests related to AI processing.

How can candidates access their AI-generated results?

If a candidate requests access to their personal data, you can export a candidate packet, which includes the AI rating assigned at the time of their advancement or rejection along with the accompanying reasoning.

Can candidates request correction of their AI matching results?

Customer Action: As the data controller, you must choose how candidates should contact you to exercise any rights under applicable data protection laws, as advised by your legal counsel. 

In terms of product functionality, users can manually override a Talent Matching categorization with one of their choosing based on manual assessment.

How are deletion requests handled?

Customer Action: As the data controller, you must choose how candidates should contact you to exercise any rights under applicable data protection laws, as advised by your legal counsel. 

In terms of product functionality, when a candidate’s personal data is deleted, all parsed and embedded data relating to that record will also be deleted along with the Talent Matching result if the user has marked the “Match Score and Reasoning” checkbox among the fields to be included in the deletion. 

Furthermore, if the entire candidate record is deleted, all related parsed and embedded data will also be removed.

What opt-out mechanisms exist?

Customer Action: You should implement processes for candidates to opt out of AI processing if required by applicable law or your privacy policies, as advised by your legal counsel. Consider providing clear notice about AI use and opt-out options during the application process.

Automated decision-making

Does Talent Matching make automated decisions about candidates?

No, the use of Talent Matching is not an automated decision under GDPR Article 22. The Talent Matching feature does not have the ability to make a decision, as it simply extracts skills, job titles, years of experience, start/end dates of employment, and company names from employment history from candidate profiles to match them to job criteria that have been previously selected by recruiters. The customer actively directs the operation of the Talent Matching feature, with human recruiters or hiring managers establishing the relevant recruitment criteria. The recruiter must make the decision to advance, reject, or add notes about a candidate.

What human oversight mechanisms are in place?

Humans are required to make the decision for each candidate. There is no mechanism that automatically takes action to advance or reject a candidate based on an AI rating. The system is designed as a decision-support tool, not a decision-making tool. The AI categorization is only used during the initial application review stage and is not visible later in the process to prevent biasing interviewers.

How does the system provide transparency to recruiters?

The system provides transparency through: 

  • A visual representation of the candidate's resume wherein matched terms are highlighted (either as an exact match or a semantically similar match)
  • A short summary justifying the candidate's match result
  • A listing of the candidate's matched and missing skills, based on the calibration
  • A list of other candidate skills that were extracted during the parsing step, but not included in the calibration
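
As a rough illustration, the transparency information above could be represented as a structure like the following. All key names and values here are hypothetical; the actual product payload is not public.

```python
# Hypothetical shape of the transparency information shown to recruiters.
# Keys and values are illustrative examples, not Greenhouse's actual payload.
match_result = {
    "summary": "Matches 4 of 5 calibrated skills; 6 years of relevant experience.",
    "matched_skills": ["Python", "SQL", "ETL", "Airflow"],
    "missing_skills": ["Spark"],
    "other_extracted_skills": ["Docker", "Git"],  # parsed but not in the calibration
    "resume_highlights": [
        {"term": "Python", "match_type": "exact"},
        {"term": "data pipelines", "match_type": "semantic"},
    ],
}
print(match_result["missing_skills"])  # ['Spark']
```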

Bias mitigation and fairness

What bias mitigation measures are implemented?

Greenhouse has implemented several measures: 

  • The system does not use names or any contact information in the matching algorithm, to avoid bias.
  • If a recruiter enters a biased term or soft skill during the calibration process, the system flags the issue and warns the user.
  • In extreme cases, the user is blocked from saving the calibration.
  • Greenhouse partners with an independent third party (WardenAI) to conduct regular bias audits.

How frequently are bias audits conducted?

A bias audit is conducted every time Greenhouse releases a change to the algorithm. In addition, Greenhouse utilizes a third-party vendor, WardenAI, to conduct monthly bias audits.

What criteria determines biased skills during calibration?

Each entered skill is assessed by an LLM, which returns a value indicating whether the skill is a biased term, a soft skill, a skill used as a proxy for a protected trait, a nonsensical entry, or an attempted system/prompt injection.
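
To make the categories concrete, here is a minimal sketch of this kind of check with a rule-based stand-in for the LLM. The keyword lists, category labels, and function name are invented for illustration; the real system uses an LLM, not keyword matching.

```python
# Rule-based stand-in for the LLM skill assessment described above.
# Keyword lists and labels are hypothetical examples only.
SOFT_SKILLS = {"team player", "hard worker", "good attitude"}
PROXY_TERMS = {"native english speaker", "recent graduate"}  # proxies for protected traits

def assess_skill(skill: str) -> str:
    s = skill.strip().lower()
    if not s or not any(c.isalpha() for c in s):
        return "nonsensical"
    if "ignore previous instructions" in s:   # crude injection heuristic
        return "injection"
    if s in SOFT_SKILLS:
        return "soft_skill"
    if s in PROXY_TERMS:
        return "proxy_for_protected_trait"
    return "ok"

print(assess_skill("Python"))       # ok
print(assess_skill("team player"))  # soft_skill
```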

Are bias audit results available?

The third-party bias audits conducted regularly by WardenAI are publicly available and demonstrate that the algorithm performs consistently across various demographic groups.

Technical specifications

What matching thresholds are used for candidate categorization?

Each assessed candidate is assigned a “match strength” result, categorized as Strong Match, Good Match, Partial Match, or Limited Match, or “Needs manual review” if the candidate does not have a resume to evaluate.
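
The mapping from an internal match score to these category labels could look like the sketch below. The numeric thresholds are purely illustrative assumptions; Greenhouse does not publish its actual cutoffs.

```python
from typing import Optional

# Hypothetical mapping from a numeric match score to the category labels the
# FAQ describes. Threshold values are illustrative, not Greenhouse's.
def match_strength(score: Optional[float]) -> str:
    if score is None:          # no resume available to evaluate
        return "Needs manual review"
    if score >= 0.85:
        return "Strong Match"
    if score >= 0.65:
        return "Good Match"
    if score >= 0.40:
        return "Partial Match"
    return "Limited Match"

print(match_strength(0.9))   # Strong Match
print(match_strength(None))  # Needs manual review
```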

How does the system handle synonyms and alternative terminology?

Greenhouse uses embedding representations of skills and job titles so that a semantic search can be run on them. For example, both “software engineer” and “web developer” will return a match for someone searching for a software developer.
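
The idea can be sketched with cosine similarity over embedding vectors. The three-dimensional vectors below are made up for illustration; real systems use learned embedding models with hundreds of dimensions, and the similarity threshold is an assumption.

```python
import math

# Toy illustration of semantic matching via embeddings. The vectors and the
# 0.8 threshold are invented for this sketch.
EMBEDDINGS = {
    "software engineer": [0.9, 0.1, 0.2],
    "web developer":     [0.8, 0.2, 0.3],
    "pastry chef":       [0.1, 0.9, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_match(query: str, threshold: float = 0.8):
    q = EMBEDDINGS[query]
    return [t for t, v in EMBEDDINGS.items()
            if t != query and cosine(q, v) >= threshold]

print(semantic_match("software engineer"))  # ['web developer']
```

Because similarity is computed in embedding space rather than by string comparison, related titles match even when they share no words.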

What happens when the system cannot process a resume?

In cases where the system is unable to process a resume, the candidate is surfaced to the recruiter for manual review.

How is algorithm performance measured?

The performance of the matching algorithm is evaluated by back-testing its predictions against the evaluation set. This involves measuring accuracy by examining how well the categorizations made by the matching algorithm for a candidate correlate with a candidate successfully passing the application review stage. We evaluate skills extraction using a fixed set of resumes that have been labeled with help from a language model. No customer data is used in this evaluation process.
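
A minimal sketch of the back-testing idea is shown below: compare the algorithm's categorization with whether the candidate actually passed application review. The data, the notion that the top two categories count as a predicted "pass," and the accuracy metric are all simplifying assumptions for illustration.

```python
# Illustrative back-test: do the algorithm's categorizations line up with
# candidates actually passing application review? Data here is invented.
predictions = ["Strong Match", "Limited Match", "Good Match", "Partial Match"]
passed_review = [True, False, True, False]

def predicted_pass(category: str) -> bool:
    # Assumption: treat the two highest categories as a predicted "pass".
    return category in {"Strong Match", "Good Match"}

correct = sum(predicted_pass(p) == actual
              for p, actual in zip(predictions, passed_review))
accuracy = correct / len(predictions)
print(accuracy)  # 1.0
```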

Legal basis and compliance

What legal basis should I use for processing?

Customer Action: As the data controller, you should make this determination in coordination with your legal counsel. You should determine your own lawful basis for processing candidate data through Greenhouse, including the Talent Matching feature. Consider whether you will rely on legitimate interest (conducting a balancing test), explicit consent from candidates, or contractual necessity. Your choice will affect notice requirements, individual rights, and opt-out obligations.

What are my obligations regarding data subjects?

Customer Action: As the data controller, you should make this determination in coordination with your legal counsel. You may need to identify your specific data subject categories (job applicants in your geographic regions) and assess the volume of personal data you expect to process. Consider local employment laws, candidate expectations in your industry, and any specific requirements for your geographic markets.

What notice must I provide to candidates?

Customer Action: As the data controller, you should make this determination in coordination with your legal counsel. You may need to provide clear notice to candidates about AI processing before they submit applications, as advised by your legal counsel and as required by applicable laws. This generally includes the purpose of AI processing, their rights, and how to exercise them. Consider implementing job board disclosures and updating your privacy policy to address AI use in recruitment.

Greenhouse provides a settings page to allow you to configure a custom AI disclaimer for job posts.

Risk assessment report

What risks should I document in my assessment?

Customer Action: As the data controller, you should make this determination in coordination with your legal counsel. Consider risks specific to your organization including: accuracy of AI categorization for your candidate population, potential bias in your specific use case, candidate awareness and consent, data security in your environment, compliance with local employment laws, and impact on candidate experience. Assess likelihood and severity based on your specific context.

What safeguards should I implement?

Customer Action: As the data controller, you should make this determination in coordination with your legal counsel. Based on your risk assessment, consider implementing: clear AI disclosure processes, human oversight requirements for your recruiters, bias monitoring procedures, candidate opt-out mechanisms, data retention policies aligned with local laws, and staff training on AI-assisted recruitment.

Monitoring and governance

What ongoing monitoring should I implement?

Customer Action: You should make this determination in coordination with your legal counsel. Establish processes to monitor: recruiter compliance with human oversight requirements, candidate feedback and complaints, accuracy of AI categorizations for your specific roles, any patterns suggesting bias in your hiring outcomes, and compliance with your established data retention policies.

How often should I review my assessments?

Customer Action: You should make this determination in coordination with your legal counsel. You may want to schedule regular assessment reviews based on: changes in your hiring volume or geographic scope, updates to applicable data protection laws, feedback from candidates or regulators, and at minimum annually to ensure continued compliance.

Documentation and audit support

What documentation should I maintain?

Customer Action: You should make this determination in coordination with your legal counsel. Typically, records will include: your assessments and risk assessment, legal basis determination and balancing test (if applicable), candidate notices and consent mechanisms, staff training on AI use, any bias monitoring or testing results, candidate rights requests and responses, and data retention and deletion activities.

Additional technical information for customer reference

Algorithm performance monitoring

Greenhouse has implemented comprehensive monitoring systems, including alerts for each component of the algorithm connected to observability systems designed to page on-call engineers in case of failures. The company has defined baseline thresholds for the accuracy of the separate parts of the algorithm and has an internal process in place not to push any changes unless these thresholds are met.

Bias audit availability

Third-party bias audits conducted regularly by WardenAI are publicly available and demonstrate that the algorithm performs consistently across various demographic groups, with no statistically significant bias detected. Documentation from these ongoing bias audits is maintained and can be used for regulatory audits.

Data processing limitations

The system is explicitly designed around data minimization principles: it does not extract fields beyond skills, job titles, years of experience, start/end dates of employment, and company names from employment history during resume parsing. Each fine-tuned model is restricted to a narrow scope and is not trained to recognize or extract personal attributes such as health data, religious affiliation, political opinions, or demographic markers.