Personality Tests for Hiring: What HR Needs to Know

Roughly 80% of Fortune 500 companies use some form of personality assessment in their hiring process. Most of them are using the wrong one. They are paying for tests built on models that industrial-organizational psychology abandoned decades ago, collecting data that does not predict what they think it predicts, and exposing themselves to legal challenges they do not know exist.

The gap between what personality science knows and what HR departments actually do is enormous. Here is what the research says, what holds up in court, and what the data actually predicts about who will perform in a role.

What Personality Tests Actually Predict

The central question in hiring assessment is validity: does the test predict job performance? For personality tests, the answer is yes, but with important caveats about which tests, which traits, and which jobs.

Meta-analyses covering hundreds of thousands of employees across industries consistently show that specific personality traits predict specific job outcomes. Conscientiousness predicts task performance across virtually all jobs. Agreeableness predicts teamwork and customer service performance. Emotional Stability (low Neuroticism) predicts performance under pressure. Extraversion predicts success in sales, management, and client-facing roles. Openness predicts performance in creative and research positions.

The predictive power is not hypothetical. The correlation between Conscientiousness and overall job performance (r = .22 to .27 across meta-analyses) is comparable to the correlation between job interviews and performance. When you combine personality data with cognitive ability tests, the prediction improves beyond what either measure achieves alone. You are not replacing the interview. You are adding a dimension of information that interviews systematically miss.

What interviews miss is precisely what personality tests catch: stable behavioral tendencies that show up after the first three months, once the candidate stops performing and starts being themselves. The interview tells you who the person is trying to be. The personality assessment tells you who they will be once the impression management fades.
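To make the incremental-validity claim above concrete, here is a minimal sketch using the standard two-predictor multiple correlation formula. The Conscientiousness validity of .25 sits inside the range quoted above. The cognitive ability validity (.50) and the zero correlation between the two predictors are illustrative assumptions, not figures from this article.

```python
import math

def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r1, r2: each predictor's correlation with job performance.
    r12: correlation between the two predictors.
    """
    r_squared = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)

# Illustrative assumptions (not values taken from the article):
R_COGNITIVE = 0.50          # assumed validity of a cognitive ability test
R_CONSCIENTIOUSNESS = 0.25  # within the .22 to .27 range quoted above
R_BETWEEN = 0.0             # assumed: the two predictors are uncorrelated

combined = multiple_r(R_COGNITIVE, R_CONSCIENTIOUSNESS, R_BETWEEN)
gain = combined - R_COGNITIVE

print(f"cognitive alone: {R_COGNITIVE:.3f}")
print(f"combined:        {combined:.3f}")
print(f"incremental:     {gain:.3f}")
```

Under these assumptions the combined validity comes out at roughly .56, compared with .50 for the cognitive test alone. The exact gain depends on the real validities and on how much the two measures overlap in your own data.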

Which Model Matters: Big Five vs. Everything Else

Not all personality tests are equal, and the model underlying the test determines whether the results mean anything.

The Big Five (OCEAN) model is the standard in industrial-organizational psychology for one reason: it is the only model with consistent, replicated predictive validity for job performance. It measures five continuous dimensions (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism), each broken into six facets, producing a 30-facet profile that maps to specific work behaviors.

MBTI is the most widely used personality tool in corporate settings and has the weakest scientific support. It sorts people into 16 types using binary categories (you are either an Introvert or an Extravert, never both). The test-retest reliability is poor: up to 50% of people get a different type when retested. More importantly, MBTI types do not predict job performance. The publisher's own manual acknowledges this. Companies use it because it is familiar, not because it works.
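A rough sketch of why binary sorting hurts retest consistency. It assumes each preference scale is normally distributed, is cut at the mean, and has a hypothetical retest correlation of .90, and that the four scales flip independently. None of these numbers come from the MBTI manual; they only show how a cut at the midpoint turns small score changes into a different type.

```python
import math

def flip_probability(retest_r: float) -> float:
    """Chance that a score cut at the mean lands on the other side at retest.

    Assumes test and retest scores are bivariate normal with correlation retest_r.
    """
    return math.acos(retest_r) / math.pi

RETEST_R = 0.90    # hypothetical scale-level retest correlation
N_DICHOTOMIES = 4  # a 16-type system sorts on four either/or preferences

p_one = flip_probability(RETEST_R)
# Assumption: the four scales flip independently of each other.
p_any = 1 - (1 - p_one) ** N_DICHOTOMIES

print(f"chance one letter changes:  {p_one:.3f}")
print(f"chance the type changes:    {p_any:.3f}")
```

Under these assumptions, a scale that looks quite stable as a continuous score still changes the four-letter type for a large share of respondents. A continuous score would shift only slightly for the same people.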

DiSC measures four behavioral styles (Dominance, Influence, Steadiness, Conscientiousness). It is useful for team communication workshops but was not designed for hiring and has limited predictive validity for job performance. Using DiSC to screen candidates is like using a thermometer to measure blood pressure. It measures something real. It just does not measure what you need.

CliftonStrengths identifies "talent themes" from a fixed list of 34. It is a development tool, not a selection tool. Gallup explicitly states it should not be used for hiring decisions. Companies that use it in hiring are misapplying the instrument and creating legal exposure.

The Enneagram has no peer-reviewed validation studies supporting its use in employment selection. It is a spiritual and self-development framework. Using it in hiring is not just ineffective; it is indefensible if challenged.

If you are making hiring decisions based on personality data, the model must be the Big Five or a well-validated derivative of it. Everything else is corporate entertainment.

Conscientiousness Is the Strongest Predictor

Across all job types, all industries, and all levels of seniority, Conscientiousness is the single strongest personality predictor of job performance. This finding has been replicated so many times that it is no longer debated in the research literature.

Conscientiousness predicts:

The six Conscientiousness facets (Self-Efficacy, Orderliness, Dutifulness, Achievement-Striving, Self-Discipline, and Deliberation) each predict different aspects of work behavior. A salesperson needs high Achievement-Striving but may not need high Orderliness. An accountant needs high Orderliness and Deliberation but may not need high Achievement-Striving. Domain-level Conscientiousness tells you the person is generally reliable. Facet-level Conscientiousness tells you which specific work behaviors you can expect.

The Facet-Level Advantage

Most hiring assessments measure the Big Five at the domain level: five scores, five numbers. This is better than nothing but misses most of the actionable information.

Consider two candidates who both score at the 70th percentile on Extraversion. At the domain level, they look identical. At the facet level, one might score high on Warmth (E1) and Positive Emotions (E6) but low on Assertiveness (E3) and Excitement-Seeking (E5). The other might score high on Assertiveness and Activity Level (E4) but low on Warmth and Gregariousness (E2). The first candidate is a natural customer service representative. The second is a natural project leader. Same Extraversion score, completely different behavioral profiles, completely different role fit.

The same applies to every domain. Two people with identical Neuroticism scores can differ on whether their instability manifests as anxiety (N1), anger (N2), depression (N3), self-consciousness (N4), impulsivity (N5), or vulnerability to stress (N6). A sales manager with high N2 (Anger) creates a hostile team environment. A sales manager with high N4 (Self-Consciousness) over-prepares and micromanages presentations. Both show up as "high Neuroticism." The interventions are entirely different.

A 30-facet assessment gives you the resolution to match candidates to specific roles, predict specific friction points with specific teams, and identify specific development areas before the person starts. A 5-domain assessment gives you a blurry approximation of the same information.
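The two-candidate Extraversion example above can be sketched with a few lines of code. The facet scores below are invented to mirror that description, and averaging the six facets into a domain score is a simplification; a real instrument would use its own scoring and norms.

```python
from statistics import mean

# Hypothetical facet scores on a 0-100 scale (invented for illustration).
candidate_a = {
    "E1 Warmth": 95, "E2 Gregariousness": 80, "E3 Assertiveness": 40,
    "E4 Activity Level": 70, "E5 Excitement-Seeking": 40, "E6 Positive Emotions": 95,
}
candidate_b = {
    "E1 Warmth": 45, "E2 Gregariousness": 40, "E3 Assertiveness": 95,
    "E4 Activity Level": 95, "E5 Excitement-Seeking": 75, "E6 Positive Emotions": 70,
}

# Simplified domain score: the mean of the six facets.
domain_a = mean(candidate_a.values())
domain_b = mean(candidate_b.values())

# Facet-by-facet gap (positive means candidate A scores higher).
gaps = {facet: candidate_a[facet] - candidate_b[facet] for facet in candidate_a}

print(f"domain scores: A={domain_a}, B={domain_b}")
for facet, gap in gaps.items():
    print(f"{facet}: {gap:+d}")
```

Both candidates land on the same domain score, while the facet gaps run in opposite directions on Warmth and Assertiveness. That difference is invisible in a five-number report.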

What Personality Tests Do Not Measure

Personality assessments measure stable behavioral tendencies. They do not measure intelligence, technical skill, domain knowledge, or motivation for a specific role. They do not tell you whether someone can write code, manage a P&L, or operate machinery. A highly Conscientious person with no accounting knowledge will not be a good accountant.

Personality assessments work best as one component of a structured hiring process that also includes cognitive ability testing, structured interviews, work sample tests, and reference checks. The research is clear: multi-method assessment predicts job performance better than any single method alone. Personality data adds incremental validity above cognitive ability and interviews. It does not replace them.

Companies that rely on personality tests as the sole screening criterion are misusing the tool. Companies that ignore personality data entirely are leaving predictive power on the table.

Personality assessments used in hiring are subject to employment law, including Title VII of the Civil Rights Act (US), the Equality Act (UK), and equivalent legislation in other jurisdictions. The legal standard is straightforward: any assessment used to make employment decisions must be job-related and consistent with business necessity.

What makes a personality test legally defensible:

What gets companies sued:

Adverse Impact and Fairness

Adverse impact occurs when a selection procedure disproportionately excludes members of a protected group. The Big Five has a significant advantage here: it shows substantially less adverse impact across racial and ethnic groups than cognitive ability tests.

Cognitive ability tests produce large group differences (roughly one standard deviation between Black and White test-takers in US samples). Big Five personality tests produce small to negligible group differences on most dimensions. This means personality assessments can add predictive validity to a hiring process while reducing rather than increasing the overall adverse impact of the selection system.

Gender differences exist on some Big Five dimensions (women score slightly higher on Agreeableness and Neuroticism on average), but the differences are small enough that they rarely produce adverse impact at the selection thresholds used in hiring. If your cutoff scores are producing gender-based adverse impact, the cutoffs are almost certainly set incorrectly.

Age effects are minimal. Unlike cognitive ability, which peaks in early adulthood and declines, personality traits are relatively stable across the working lifespan. Conscientiousness actually increases slightly with age, which means personality assessments do not disadvantage older workers.

The fairness profile of Big Five assessments is one of the strongest arguments for including them in a hiring process. They add predictive validity without the adverse impact costs associated with cognitive testing.

How Candidates Game the Test (And Whether It Matters)

The most common objection to personality testing in hiring is faking: candidates will present themselves in the most favorable light rather than answering truthfully. This concern is legitimate but overblown.

Candidates do shift their responses toward what they believe the employer wants. Research shows that coached or motivated respondents can increase their Conscientiousness and Agreeableness scores by roughly half a standard deviation. This is not nothing. But several factors limit its practical impact.

First, the shift is uniform. Almost everyone inflates the same traits. This means the rank ordering among candidates is largely preserved. The person who is genuinely the most Conscientious in the applicant pool still tends to score the highest, even when everyone is inflating. Faking adds noise but does not destroy the signal.

Second, people who successfully fake high Conscientiousness tend to actually be somewhat Conscientious. The ability to read the situation, identify the desired response, and consistently maintain that presentation across 120 questions requires exactly the kind of self-regulation and goal-directed behavior that Conscientiousness measures. Faking Conscientiousness well is, to some degree, an expression of Conscientiousness.

Third, forced-choice formats (where candidates rank statements against each other rather than rating them independently) substantially reduce faking because there is no uniformly "correct" answer. When every option sounds desirable, the candidate is forced to reveal genuine preferences.

It is worth noting that faking is not unique to candidates. Research on emotional intelligence assessments found that informant ratings (colleague or manager reports) can show halo-effect inflation that exceeds candidate self-inflation. Self-reports may, in some contexts, be more honest than the third-party alternatives meant to replace them.

The practical conclusion: faking is a real phenomenon that slightly reduces the precision of personality measurement. It does not invalidate the measurement. It does not justify abandoning personality assessment. It does justify using well-designed instruments with built-in validity scales and forced-choice items rather than transparent Likert-scale questionnaires where the desirable answer is obvious.
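The first point above, that roughly uniform inflation largely preserves rank order, can be illustrated with a small sketch. The applicant scores and inflation amounts are hypothetical, and the Spearman helper is a bare-bones version that assumes no tied scores.

```python
def ranks(scores):
    """Rank positions (1 = lowest) for a list of scores with no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

# Hypothetical honest Conscientiousness scores on a scale with SD = 10.
honest = [40, 45, 50, 55, 60, 65]

# Case 1: everyone inflates by the same half standard deviation (5 points).
uniform_fake = [score + 5 for score in honest]

# Case 2: inflation varies by person (invented amounts averaging about 5).
inflation = [6, 4, 9, 3, 5, 4]
uneven_fake = [score + bump for score, bump in zip(honest, inflation)]

rho_uniform = spearman_rho(honest, uniform_fake)
rho_uneven = spearman_rho(honest, uneven_fake)

print(f"uniform inflation: rho = {rho_uniform:.3f}")
print(f"uneven inflation:  rho = {rho_uneven:.3f}")
```

With identical inflation the ranking is untouched. With uneven inflation two adjacent candidates swap places and the rank correlation drops slightly below 1, which is the "noise without destroying the signal" pattern described above.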

Role-Specific Profiles: One Size Does Not Fit

The traits that predict success vary by role. A blanket "we want high Conscientiousness and high Agreeableness" policy is better than nothing but misses the nuance that makes personality data actionable.

Sales roles benefit from high Extraversion (specifically Assertiveness and Activity Level), moderate to high Conscientiousness (Achievement-Striving matters more than Orderliness), and low to moderate Agreeableness. The last one surprises people. High-Agreeableness salespeople struggle to close because closing requires pushing past the customer's resistance, which feels like conflict. The best salespeople are warm enough to build rapport (moderate E1) and competitive enough to ask for the business (low A4).

Software engineering benefits from high Conscientiousness (especially Orderliness and Deliberation) and high Openness (especially Intellectual Curiosity), although personality is less predictive of success in these roles than cognitive ability. The introversion stereotype is partially supported: high Gregariousness (E2) is mildly negatively correlated with individual contributor performance, likely because it correlates with time spent socializing rather than coding.

Customer service benefits from high Agreeableness (especially Compliance and Tender-Mindedness), high Emotional Stability (low N2 Anger is critical), and moderate Extraversion (Warmth matters; Assertiveness does not). Screening for low Neuroticism in customer service is one of the highest-ROI applications of personality testing because high-Neuroticism agents escalate calls, respond emotionally to difficult customers, and burn out faster.

Management and leadership roles benefit from a complex profile: high Assertiveness (E3), moderate to high Conscientiousness, moderate Agreeableness (not too high, not too low), low Neuroticism, and high Openness in environments that require change management. The most consistent finding in leadership research is that low Agreeableness combined with high Conscientiousness predicts who gets promoted and who is rated as an effective leader by subordinates. Leaders need to make decisions that disappoint some people. High Agreeableness makes that harder.

Creative roles benefit from high Openness (especially Fantasy, Aesthetics, and Novelty-Seeking), moderate Conscientiousness (enough to finish projects but not so much that risk-aversion kills innovation), and tolerance for ambiguity (which maps to low Deliberation, C6). Hiring for high Conscientiousness in creative roles can backfire. You get people who deliver on time but deliver predictable work.
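One way to make role profiles like those above usable is to express each as an ideal range per facet and report how far a candidate falls outside each range, rather than applying a single pass/fail cutoff. The sketch below assumes a 0-100 facet scale; the ranges, facet choices, and candidate scores are hypothetical and would need to come from a job analysis.

```python
# Hypothetical ideal ranges (low, high) for a sales role on a 0-100 facet scale.
role_requirements = {
    "Achievement-Striving": (60, 100),
    "Assertiveness": (55, 90),
    "Self-Discipline": (50, 100),
    "Anger": (0, 45),
}

# Hypothetical candidate facet scores.
candidate = {
    "Achievement-Striving": 85,
    "Assertiveness": 70,
    "Self-Discipline": 45,
    "Anger": 30,
}

def role_fit_profile(scores, requirements):
    """Per-facet distance outside the ideal range (0 means inside the range)."""
    profile = {}
    for facet, (low, high) in requirements.items():
        score = scores[facet]
        if score < low:
            profile[facet] = low - score
        elif score > high:
            profile[facet] = score - high
        else:
            profile[facet] = 0
    return profile

profile = role_fit_profile(candidate, role_requirements)
in_range = [facet for facet, gap in profile.items() if gap == 0]

for facet, gap in profile.items():
    status = "in range" if gap == 0 else f"{gap} points outside range"
    print(f"{facet}: {status}")
```

This candidate is inside the ideal range on three facets and 5 points short on Self-Discipline. The output is a profile to discuss alongside other evidence, not an automatic rejection.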

Team Fit vs. Role Fit: Two Different Questions

Role fit asks whether the candidate's personality predicts success in the position. Team fit asks whether the candidate's personality predicts productive working relationships with existing team members. These are different questions with different answers.

A candidate can be an excellent role fit and a terrible team fit. Imagine hiring a highly Assertive, low-Agreeableness sales director into a team of highly Agreeable, conflict-averse account managers. The new hire will perform the sales director role well. They will also create friction with every person they manage, because their personality clashes with the team's baseline on exactly the traits that govern daily interaction.

Team fit analysis requires comparing the incoming candidate's profile against the existing team's profiles. The relevant questions are: where are the trait distances on the facets that predict interpersonal friction? Does the candidate's Conscientiousness level match the team's work style? Does their Agreeableness level match the team's conflict tolerance? Does their Activity Level match the team's pace?
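A minimal sketch of the comparison described above: the absolute gap between a candidate and the existing team's mean on each trait. Names, traits, and scores are invented, and this simple distance is only an illustration, not a validated measure of team friction.

```python
from statistics import mean

# Hypothetical scores (0-100) for the existing team.
team = {
    "Ana":   {"Agreeableness": 80, "Conscientiousness": 65, "Activity Level": 50},
    "Ben":   {"Agreeableness": 75, "Conscientiousness": 70, "Activity Level": 55},
    "Chloe": {"Agreeableness": 85, "Conscientiousness": 60, "Activity Level": 45},
}
# Hypothetical incoming candidate.
candidate = {"Agreeableness": 30, "Conscientiousness": 75, "Activity Level": 85}

def trait_distances(candidate, team):
    """Absolute gap between the candidate and the team mean on each trait."""
    return {
        trait: abs(score - mean(member[trait] for member in team.values()))
        for trait, score in candidate.items()
    }

distances = trait_distances(candidate, team)

# Largest gaps first: these are the traits to probe before making an offer.
for trait, gap in sorted(distances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{trait}: {gap:.0f} points from team mean")
```

Here the candidate is close to the team on Conscientiousness but far from it on Agreeableness and Activity Level, which is the pattern in the sales director example above.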

The personality friction score provides a framework for quantifying these team-level mismatches. The most productive teams are not the ones where everyone is similar. They are the ones where similarity exists on the traits that govern collaboration (Agreeableness, Conscientiousness) and diversity exists on the traits that govern problem-solving (Openness, Assertiveness).

How to Implement Personality Testing Correctly

If you are adding personality assessment to your hiring process, the implementation matters as much as the instrument. Here is what the research and legal standards require.

Step 1: Conduct a job analysis. Before selecting an assessment, define the behavioral requirements of the role. What does success look like? What causes failure? Which personality traits map to those behaviors? This documentation is your legal foundation if the assessment is ever challenged.

Step 2: Choose a validated instrument. The assessment must be based on the Big Five model, use a validated item set (such as the IPIP-NEO or a commercially developed equivalent), and have published evidence of criterion validity for employment selection. Do not use assessments designed for self-development, clinical diagnosis, or team building in a hiring context.

Step 3: Administer consistently. Every candidate for the same role takes the same assessment under the same conditions. Standardization is both a legal requirement and a psychometric one. If conditions vary, scores are not comparable.

Step 4: Use profiles, not cutoffs. Binary pass/fail decisions based on personality scores are scientifically dubious and legally risky. Instead, generate a role-fit profile that shows how the candidate's facet scores match the job requirements. A candidate who scores below the ideal range on one facet but above on three others may still be the strongest overall fit.

Step 5: Combine with other data. Personality scores should inform the hiring decision alongside cognitive ability data, structured interview scores, work samples, and references. No single data source should be the sole basis for a hiring decision.

Step 6: Monitor for adverse impact. Track selection rates by demographic group. If personality scores are disproportionately screening out members of a protected group, investigate whether the cutoffs or weighting need adjustment. The Big Five typically produces less adverse impact than other selection tools, but monitoring is a legal and ethical obligation.
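For Step 6, one widely used screening check in the US is the four-fifths rule of thumb from the Uniform Guidelines on Employee Selection Procedures: a group whose selection rate is below 80% of the highest group's rate is flagged for review. The sketch below uses invented counts and placeholder group names. It is a monitoring aid, not a legal determination.

```python
# Hypothetical applicant and selection counts per group.
outcomes = {
    "group_a": {"applicants": 200, "selected": 60},
    "group_b": {"applicants": 150, "selected": 30},
}

FOUR_FIFTHS = 0.80  # rule-of-thumb threshold for flagging possible adverse impact

def impact_ratios(outcomes):
    """Each group's selection rate divided by the highest group's rate."""
    rates = {
        group: counts["selected"] / counts["applicants"]
        for group, counts in outcomes.items()
    }
    top_rate = max(rates.values())
    return {group: rate / top_rate for group, rate in rates.items()}

ratios = impact_ratios(outcomes)
flagged = [group for group, ratio in ratios.items() if ratio < FOUR_FIFTHS]

for group, ratio in ratios.items():
    print(f"{group}: impact ratio {ratio:.2f}")
print("flagged for review:", flagged)
```

In this example group_b is selected at 20% against group_a's 30%, an impact ratio of about 0.67, so it is flagged. A flag is a prompt to investigate cutoffs and weighting, as described in Step 6.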

Next Steps

The 30-facet OCEAN personality test scores candidates on every subfacet of the Big Five in about 15 minutes. The basic results are free. For hiring, the hiring fit report compares a candidate's profile against role requirements and existing team members, identifying specific alignment and friction points across all 30 facets.

If you are evaluating personality assessments for your organization, start by having your existing high performers take the test. Their profiles become the benchmark. When you see the facet-level differences between your top performers and average performers, you will understand exactly which traits predict success in each role. That data, not intuition, is the foundation of a defensible and effective hiring process.