If you have ever seen or filled out a scorecard in Greenhouse Recruiting, you have likely noticed our unique measurement scale for assessing candidate attributes and performance. Our teams are regularly asked why we use a combination of colors and emojis instead of a traditional 5-point scale (e.g., 1 to 5, or Strongly Disagree to Strongly Agree), and we hear a variety of feedback on that choice.


Our choice of measurement scale here was far from random. In fact, it was heavily informed by research in the field of psychometrics. Psychometrics is the study of quantitative measurement practices in the social sciences. A psychometrician generally researches best practices for evaluating the quality of metrics (e.g., survey items), measurement scales (e.g., Strongly Disagree to Strongly Agree), and other related factors that affect how accurately you measure the behavior or process you’re trying to capture.1

In this article, we will explain why we excluded numbers and worded scales from Greenhouse Recruiting scorecards, and why we ultimately chose color and symbolic scales instead. We will also share some interesting findings about how even the smallest tweaks to a measurement scale can change the way people respond to surveys.


Numerical Scale's Impact on Question Responses 

Have you noticed the wide variety of numerical scales used in survey questions to represent the same set of response choices? The following 5-point Likert scale is a common set of choices available as a range of answers to a survey question:


And so is this variation of the same scale:


Several studies on the use of Likert scales have demonstrated that people’s responses vary based on the numbers shown (or not shown) to survey participants.2 Survey participants tend to interpret and weigh numbers differently, which can introduce bias into any survey scale. Traditional academic research tends to collect data from larger samples (e.g., 200–300 respondents), which helps balance out the variation in responses that comes from people’s differing interpretations of a scale. These sample sizes are generally much larger than those you will see in the hiring process. As a Hiring Manager, you will likely receive candidate ratings from 5 to 10 people, and even fewer responses on each of the individual attributes being evaluated. With so few ratings, any step Greenhouse Recruiting can take to reduce error and bias improves the quality of your candidate evaluations.
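To make the sample-size point concrete, here is a minimal sketch of how the uncertainty in an average rating shrinks as the number of raters grows. It uses the standard error of the mean and rests on assumptions that are ours, not part of the cited research: raters are treated as independent, and the spread of individual ratings (ASSUMED_SD) is a hypothetical value chosen purely for illustration. It models random differences in how people read a scale, not systematic bias.

```python
import math


def standard_error(rating_sd: float, n_raters: int) -> float:
    """Standard error of the mean rating across n independent raters."""
    if n_raters <= 0:
        raise ValueError("n_raters must be a positive integer")
    return rating_sd / math.sqrt(n_raters)


# Hypothetical assumption: individual ratings on a 5-point scale
# vary with a standard deviation of 1.0 scale point.
ASSUMED_SD = 1.0

# Compare typical interview panel sizes with an academic-sized sample.
for n in (5, 10, 250):
    se = standard_error(ASSUMED_SD, n)
    print(f"{n:>3} raters: standard error of the mean = {se:.2f}")
```

Under these assumptions, the average from a five-person panel is far noisier (a standard error of about 0.45 scale points) than the average from 250 respondents (about 0.06), which is why small differences in how people interpret a scale matter more in hiring than in large surveys.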


Cultural and Regional Response Differences to Worded Measurement Scales

Did you know that residents of the United States tend to respond more positively to most survey questions than residents of many other parts of the world?3 These differences in survey responses do not reflect cultural differences in optimism or agreement. In fact, people in the United States have become less trusting of others and less confident in institutions over time.4 In the U.S., respondents are likely to answer Strongly Agree unless they have a clear reason to disagree with the statement being made. In other parts of the world, such as mainland China,3 respondents tend to answer more neutrally unless they have a clear justification for strongly agreeing with the statement.

Excluding a worded scale helps your Hiring Managers be confident that each candidate is evaluated consistently across interviewers, and that ratings are less subject to bias from different individuals’ interpretations of Strongly Agree/Strongly Disagree.


Color Heuristics vs Words 

People tend to respond more consistently and powerfully to color heuristics than they do to words. Humans process colors and symbols in a different region of the brain than words or numbers.5 We also more readily associate colors with a negative or positive valuation based on properties such as opacity.6 By using shades of red, yellow, and green as heuristics to evaluate your candidates, Greenhouse Recruiting hopes to provide you with more consistent results across individuals and teams as you make your hiring decisions.

Ultimately, our scorecard rating setup was created in the interest of providing customers with the best possible platform for reducing bias and individual differences in candidate evaluations. We’re always open to feedback and evidence-based approaches that help us better reach that goal.



1. Psychometric Society: What is psychometrics? 

2. Weijters, B., Cabooter, E., & Schillewaert, N. (2010). The effect of rating scale format on response styles: The number of response categories and response category labels. 

3. Lee, J. W., Jones, P. S., Mineyama, Y., & Zhang, X. E. (2002). Cultural differences in responses to a Likert scale. Research in Nursing & Health, 25(4), 295–306.

4. Twenge, J. M., Campbell, W. K., & Carter, N. T. (2014). Declines in trust in others and confidence in institutions among American adults and late adolescents, 1972–2012. Psychological Science, 25(10), 1914–1923. 

5. Peterson, B. S., et al. (1999). An fMRI study of Stroop word-color interference: Evidence for cingulate subregions subserving multiple distributed attentional systems. Biological Psychiatry, 45(10), 1237–1258.

6. Piotrowski, C., & Armstrong, T. (2012). Color Red: Implications for applied psychology and marketing research. Psychology and Education: An Interdisciplinary Journal, 49, 55–57.