Sorting Out the Junk Science in Psychometrics

Valid psychometrics are highly beneficial tools. By helping us objectively measure and describe human attributes, they offer us important insights into candidates and employees, which are otherwise very difficult to pull from either interviews or work observations. By comparing test results to accurate job profiles or benchmarks, potential context-relevant job behaviors can also be predicted. And if those psychometric tools use a normative design to compare respondents to a broader population, as they should do if used for decision-making applications, they further enable us to more precisely compare candidates.

Today, under the growing influence of big data and with AI (machine learning) knocking on the door, it is not surprising that some test vendors are seeking competitive advantage with these and other fascinating scientific developments. Behavioral economics, being all about numbers and predictability, certainly holds the promise of some exciting changes in psychometrics, but it is important that we distinguish real advances from the mere illusions of junk science.

One constant over the years is that we all look for silver bullets to simplify our decisions. However, not all decisions can be simplified and not all simplifying solutions are what they are touted to be. We live in a world of exaggerated claims and all sorts of products and services fall short of their marketing hype. In some areas that’s no big deal, but with people decisions, when careers and management plans are on the line, the consequences and costs can be very significant.

The issue is all about prediction. How precisely and accurately can we use a psychometric to predict job fit, behavioral differences among people, or job performance? Since more and more test vendors are claiming they can provide these answers, it's worthwhile to take a hard look at the veracity of these claims.

Junk Science in Psychometrics

Actually, junk science crept into the field of psychometrics many years ago, but we just never called it that. The most obvious example is the deceptiveness of face validity. Any vendor website insinuating that meaningful validity is demonstrated by a sample of test respondents agreeing strongly with their own results is either ignorant or deceptive. Reading the works of Dan Ariely or Daniel Kahneman on the irrationality of behavior will make it very clear that validity has no relationship with agreement or personal likes and dislikes. Validity is a statistical measure, not an emotional one, and emphasizing face validity, which is not validity at all, likely indicates that the vendors don't have real validation, or certainly don't want you to see what numbers they do have. Take a pass!

Another scientific stretch has to do with the job profiles that many tests use in a rather absolutist way. Job profiles are critical to accurate decision-making, but with too many tests, a simple set of generic templates or stereotypes replace actual job analysis. Having undertaken literally thousands of job analyses over the years, many employing content and criterion validation methodology, there’s simply no question that one size does not fit all.

Job Analysis Should Be a Context-Relevant Process

For the more complex jobs for which behavioral assessments are generally used, job analysis should be a context-relevant process taking into account unique situational variables: for example, the personalities of the "boss" and the other people involved, along with variations in cultures, performance standards, job expectations, management styles, training, and quality of supervision. In our experience, we have found numerous instances where, because of such variabilities, seemingly similar positions in different organizations required very different personalities. Granted, some jobs can be cloned, but they tend to be task, specialist, and/or entry-level roles, such as service functions, non-transactional retail sales, data entry, and reporting.

Getting the job right – understanding both nuanced and unique factors that drive performance – is at least 50% of a selection decision. But all too frequently, simple assumptions and inference supplant what should be a thorough analytical process, resulting in inaccurate job profiles that lead to flawed candidate searches.

Test Design Matters

Fitness for purpose is yet another area where claims are sometimes misleading. Such is the case with normative and ipsative tests. Simply stated, a normative instrument uses a questioning format (for example, yes and no response variations) that enables norms for some population to be developed and individual responses compared to those norms. One candidate might score at the 80th percentile in a certain construct and another at the 40th percentile, so we have a statistical basis for comparing the two people.  The variance in their scores further provides us with a means of determining how their behaviors would differ in specific situations. Since the intent in using a psychometric in decision-making applications is to objectively compare people, this manner of test design is essential in applications such as hiring, internal placements, and succession planning.
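The normative logic described above can be sketched in a few lines of code. Everything here is hypothetical (the norm data, the construct, the raw scores), but it shows why scoring two candidates against the same norm group yields a statistically comparable result:

```python
# Minimal sketch of normative scoring: a raw score is located within a
# norm group's score distribution and reported as a percentile rank.
# All data below are hypothetical illustrations.

def percentile_rank(raw_score, norm_group):
    """Percent of the norm group scoring at or below raw_score."""
    at_or_below = sum(1 for s in norm_group if s <= raw_score)
    return round(100 * at_or_below / len(norm_group))

# Hypothetical norm-group raw scores for a single construct
norms = [12, 15, 18, 20, 21, 23, 24, 26, 28, 30]

candidate_a = percentile_rank(26, norms)  # lands at the 80th percentile
candidate_b = percentile_rank(20, norms)  # lands at the 40th percentile

# Because both candidates are located on the same norm distribution,
# the gap between their percentiles is a meaningful comparison.
print(candidate_a, candidate_b)
```

A real instrument would use a large, representative norm sample and standardized scores rather than a ten-person toy distribution, but the comparison logic is the same.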

Ipsative instruments use a different questioning format and are intended for different purposes. Built on variations of forced-choice questions (for example, "Most" or "Least like me"), instruments such as the MBTI and the many versions of DISC provide only a relative indication of traits or attributes as opposed to a score measured against a statistical norm. Ipsative means self-referent, which translates to using oneself rather than others or a defined population as a norm. So, although ipsative tests indicate how one individual prefers to respond to problems or people, they offer no meaningful correlation of comparative strength or visibility of traits when attempting to compare that person to another. If a respondent scores high in dominance, for instance, that simply means that dominance is a more prominent behavioral factor than the person's other traits, but it cannot be said that the person is more or less dominant than someone else with a similar test configuration.
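The arithmetic of forced-choice scoring makes the self-referent limitation concrete. In this hypothetical sketch (the trait names and item counts are illustrative, loosely modeled on a DISC-style format), every respondent's trait scores must sum to the same constant, so scores can only rank traits within a person, never across people:

```python
# Minimal sketch of ipsative (forced-choice) scoring with hypothetical
# data. Each item forces the respondent to credit exactly one trait,
# so every respondent's trait scores sum to the same fixed total.

TRAITS = ["dominance", "influence", "steadiness", "compliance"]

def ipsative_scores(choices):
    """choices: the trait picked as 'most like me' on each item."""
    return {t: choices.count(t) for t in TRAITS}

# Two hypothetical respondents answering the same 10 forced-choice items
person_a = ipsative_scores(["dominance"] * 6 + ["influence"] * 2 +
                           ["steadiness"] * 1 + ["compliance"] * 1)
person_b = ipsative_scores(["dominance"] * 4 + ["influence"] * 3 +
                           ["steadiness"] * 2 + ["compliance"] * 1)

# Within each person, the scores rank traits (dominance is the most
# prominent trait for both), but the totals are forced to be equal, so
# A's 6 versus B's 4 says nothing about who is more dominant in any
# absolute, norm-referenced sense.
assert sum(person_a.values()) == sum(person_b.values()) == 10
```

The forced total is the key point: any gain in one trait's score must come at the expense of another, which is exactly why these scores cannot be laid against a population norm.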

Suitable for coaching or other self-awareness applications where comparisons to others are unnecessary, ipsative tools are neither designed for nor adequate for decision-making purposes like hiring. But in the marketplace, it’s a case of the blind leading the blind: many vendors either do not understand or choose to just ignore the limitations, and buyers do not really understand what they are buying or using.

Knowing how excited people are to get on the big data train and find that silver bullet, the latest trend is for vendors to attempt to translate what is essentially descriptive data into a single, simplifying comparative number. One vendor, for example, claims that they can provide a number score showing how each candidate compares to the job. It sounds good, and it may attract some buyers, but claiming a degree of predictive precision that is not psychometrically possible is a real stretch of the imagination.

False Assumptions Result in Inaccurate Results

The problem is with the assumptions that are being made about the data being used, all of which have no margin for error. The first assumption is that the job benchmark is accurate and complete. We know that if it’s a standard job template or a stereotype rather than a context-relevant creation, the target is questionable and might even be way off the mark. How meaningful is a predictive value if the candidates are being compared to the wrong target information?

The second assumption, also about the job profile being used, is that in its entirety it captures what is behaviorally significant in the job. The reality is that this is very unlikely. Over the years we have undertaken scores of criterion studies on diverse jobs and in these analyses, we correlate test constructs with the objective performance data for a group of people. We generally find anywhere from one or two to maybe a handful of statistically significant correlations out of almost 50 possibilities. So, whereas individual traits or combinations of several traits may be predictive of some aspects of performance, the entire personality syndrome is not. Thus, comparing a candidate’s test results to a behavioral profile, even if it is accurate, means comparing characteristics that may have little or no relevance to actual performance, and which may actually run counter to the several characteristics that actually do matter. The bottom line is that the predictive value assigned to that candidate’s results may be attenuated by other characteristics that have little bearing upon job performance!
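The shape of such a criterion study can be sketched as follows. The data, construct names, and cutoff are all hypothetical, and a real study would use proper significance testing on a much larger sample; the point is simply that when each construct is correlated against an objective performance criterion, only a fraction of them typically survive:

```python
# Hypothetical sketch of a criterion study: correlate each test
# construct with an objective performance measure and keep only the
# strong correlations. Data and the 0.7 cutoff are illustrative, not
# a substitute for proper significance testing on a real sample.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

performance = [55, 60, 62, 70, 75, 80, 88, 90]   # objective criterion
constructs = {
    "persistence":  [3, 4, 4, 5, 6, 7, 8, 9],    # tracks performance
    "sociability":  [7, 2, 9, 4, 6, 3, 8, 5],    # essentially noise
    "detail_focus": [5, 6, 4, 7, 5, 6, 5, 7],    # weakly related
}

# Only constructs with a strong correlation are retained as predictive;
# here, typically just one or two out of the full trait set survive.
predictive = {name: round(pearson_r(scores, performance), 2)
              for name, scores in constructs.items()
              if abs(pearson_r(scores, performance)) > 0.7}
print(predictive)
```

Scaling this toy example up to the nearly 50 constructs mentioned above is what makes the finding so striking: a whole-profile match overweights the many constructs that, like "sociability" here, have no demonstrated relationship to performance.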

The third assumption tends to gloss over the fact that psychometrics, at best, is an imperfect science, and there are practical limits to what can be predicted and how precise the prediction can be. Start with the well-accepted general assumption that behavior (traits and the like) accounts for maybe 40% of performance variance in most jobs. That variance factor can be lower in some jobs, for example a nuclear physicist, and higher in others, for example retail sales. So, behavior is an important decision-making consideration, but it cannot stand on its own. Even the most positive or negative potential effects can be countered by such factors as knowledge and skill, cognitive ability or intelligence, attitudes, as well as physical and emotional constraints. Factor in the effects of randomness, which is always a consideration in measuring human abilities, and you then realize how unrealistic and unstable any specific number might be. A more plausible approach would be to use ranges of compatibility, for example, highly compatible or low compatibility, because that is about as close to the target as you can reasonably get.
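Reporting ranges rather than point scores is trivial to implement, which underlines that the obstacle to single-number precision is statistical, not technical. In this sketch the band labels and the 70/40 boundaries are arbitrary illustrations, not an established standard:

```python
# Sketch of reporting job fit as a coarse range rather than a falsely
# precise single number. Band boundaries (70, 40) are arbitrary
# illustrations; a real instrument would set them empirically.

def compatibility_band(fit_score):
    """Map a 0-100 fit estimate to a coarse, more defensible band."""
    if fit_score >= 70:
        return "highly compatible"
    if fit_score >= 40:
        return "moderately compatible"
    return "low compatibility"

print(compatibility_band(83))  # "highly compatible"
print(compatibility_band(36))  # "low compatibility"
```

A band absorbs the measurement error and randomness discussed above; a score of 83 versus 79 implies a distinction the underlying data cannot support, while "highly compatible" does not.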

As I stated right at the top of this piece, psychometrics can be very informative and very beneficial in so many applications, but they need to be used the way they are intended to be used and within a framework of reasonable expectations.

With OMS you can gain a competitive edge by combining personal decision-making skills and know-how, scientific measurement techniques, and web-based organizational diagnostic tools into a comprehensive decision-making system for all your managers. With OMS your executive team can develop strategic initiatives far more likely to succeed, and make faster, better-informed operating decisions leading to higher individual and group performance, greater retention, and lower costs. Learn more: http://2oms.com/start/
