Beginning in the late 1970s, the government has tried to generate data to bolster the credibility of police officers when they offer opinions in court that a driver was impaired by alcohol or some other drug (DUI). This effort was led by the National Highway Traffic Safety Administration (NHTSA).
The early efforts to develop roadside maneuvers were mostly honest experiments to give DUI officers tools to help them identify drivers who were subtly impaired and might otherwise avoid arrest. The exercises were intended to assist officers in developing probable cause for arrest. The first study, self-published by NHTSA in 1977, was actually entitled "Psychophysical Tests for DUI Arrest." Officers are never exposed to the actual data from this study in their DUI training, and they are never told that officers in that study made incorrect conclusions of impairment using the roadside maneuvers in almost half of all cases (see p.32).
NHTSA self-published another paper on the DUI roadsides just a few years later. This report, entitled "Development and Field Test of Psychophysical Tests for DWI Arrest," did not contain enough of the experimental data for its validity claims to be peer reviewed, but the purpose of the exercises was properly labeled in the title of the report: the exercises were considered tools for making arrest decisions at roadside. Field tests useful in determining probable cause for arrest are referred to as screening tests. We'll come back to what a screening test is later in the paper.
The most recent NHTSA report to consider the validity of the three-test battery was self-published by NHTSA in 1998. Entitled "Validation of the Standardized Field Sobriety Test Battery at BACs Below .10 Percent," this report contained some concerning data about the potential for false arrests using the sobriety tests. Officers are not exposed to this questionable information in their DUI training.
*Note on why self-publishing is a problem
In the world of real science, factual assertions must meet certain criteria before they can be credited as "scientific." One important step is peer review. Peer review allows other experts in the same field to examine the methodology and statistical analysis to confirm that valid methods were used to derive the conclusions. Publication in a scientific journal then allows researchers whose data contradict the authors' conclusions to publish that contrary data for consideration by readers of the journal. NHTSA bypassed this entire quality-control process by self-publishing these reports, from which police testify in court to secure criminal convictions.
How the 1998 NHTSA Report Misinforms Officers
Officers are led to believe that the results of the roadside exercises were proven to be 91% accurate in discriminating between drivers under and over .08 BAC. The 91% number generated from this study's data represented the overall percentage of cases in which officers believed a driver to be over or under .08 and were correct. This is a descriptive statistic of the overall study results, not a reflection of roadside test accuracy in any individual case.
All of the factors below contribute to the NHTSA "91% accurate" number being false:
Outcome Variables Switched
The report clearly attempts to represent the accuracy of the roadside testing criteria. The 91% number is actually the ratio of cases in which the officers formed a correct opinion that drivers were under or over .08. In many cases, officer opinions were not based on test criteria, and in some cases officer opinions disagreed with the sobriety test scoring criteria. This was a deceptive shell game of substituting one outcome measurement and representing it as another.
The true data from the test results revealed that, of all drivers who were under .08, 56% were incorrectly scored by the Walk & Turn test as being over .08. Of the One Leg Stand results, 41% of all drivers under .08 were incorrectly scored as failing the test (see p.21). These numbers are hidden from officers in their training and replaced with the sugar-coated "91% accurate" lie.
* This is one of the exact problems that peer review would
address. Other scientists would recognize that the assertions in
the report conflict with the experimental data, and publication
would be refused.
Another misleading attribute of using an overall percent-correct ratio (a descriptive statistic) is that it is strongly influenced by the base rate of intoxicated subjects in the sample. In this study, 72% of all drivers stopped were over .08. This imbalance in test subjects skews the results when the overall "correct rate" is reported. Even if every single sober person failed the roadsides, the tests would still appear to have a 72% accuracy rate despite being 0% accurate on innocent drivers. This is how NHTSA lies to officers, who have no way to assess how NHTSA's assertions misrepresent the data.
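The base-rate arithmetic described above can be worked through directly. The sketch below uses the 72% figure quoted from the study; the sample size of 200 is an assumption for round numbers, and the scenario (every driver flagged as over .08) is the deliberately extreme hypothetical from the paragraph above.

```python
# Worked example of how the base rate inflates an overall "percent correct."
# The 72% prevalence is taken from the report; n_total = 200 is assumed
# here purely for illustration.

n_total = 200
n_over = 144          # 72% of the sample was actually over .08
n_under = n_total - n_over

# Extreme hypothetical: the test flags EVERY driver as over .08.
correct_over = n_over   # all over-.08 drivers "correctly" flagged
correct_under = 0       # every sober driver wrongly flagged

overall_accuracy = (correct_over + correct_under) / n_total
print(f"Overall 'accuracy': {overall_accuracy:.0%}")  # 72%, despite 0% accuracy on sober drivers
```

A test that is literally useless on innocent drivers still reports an overall "accuracy" equal to the base rate of guilty drivers in the sample, which is exactly why an overall percent-correct figure cannot stand in for per-driver reliability.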
Descriptive Versus Inferential Statistics
Yet another misdirection in the NHTSA literature leads every officer to believe that their use of the exercises will reproduce the accuracy derived from the study. In the 1998 experiment, NHTSA used seven hand-picked, highly experienced DUI officers in San Diego. If we ignore the first lie for a moment and pretend the results of the roadsides were shown to be correct in 91% of cases in that study, it compounds the lie to tell every officer completing the training that they will be right in 91% of cases. The 91% figure is what is called a descriptive statistic. Such a figure only represents the results of the specific experiment at that time, with the limited sample involved. For the result of a particular experiment to be generalizable to a larger population, that result must meet the criteria of an inferential statistic. The ability to draw inferences about a larger population, and to generalize the results to any individual test, requires randomization and a sample large enough to be representative of that population. The opinions of seven officers on a tiny sample of just over 200 drivers do not generate data inferable to every suspected DUI driver everywhere else in the country, of any age and ability.
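Even before asking whether seven San Diego officers represent officers everywhere, a sample of this size carries sampling uncertainty of its own. The sketch below computes a standard normal-approximation (Wald) confidence interval for the study's point estimate; the exact sample size of 216 is assumed here for illustration, matching only the "just over 200 drivers" description above.

```python
import math

# Sampling uncertainty in a single study's proportion.
# p_hat = 0.91 is the reported figure; n = 216 is an ASSUMED sample size
# ("just over 200 drivers") used only for illustration.

p_hat = 0.91
n = 216

# Normal-approximation (Wald) 95% confidence interval for a proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Note that this interval only quantifies random sampling variability within the study itself; it says nothing about whether a non-random, non-representative sample generalizes to other officers and drivers, which is the deeper inferential problem described above.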
Screening Tests versus Confirmation Tests
The third lie is the most scientifically offensive. This lie comes from using exercises designed to develop probable cause for arrest as forensically reliable evidence of impairment. Forensic science recognizes two categories of test validation: screening tests and confirmatory tests. Police training materials properly label roadside exercises as the Pre-arrest Screening phase of a DUI investigation. This is an accurate characterization. Screening tests are supposed to be applicable in the field, before an arrest, and help an officer decide whether probable cause exists to justify an arrest and further investigation. Screening tests generally require a trait called sensitivity. A highly sensitive test recognizes traits of the suspected characteristic (impairment, in the case of a DUI investigation) with a high degree of success. For example, in the 1998 study, 92% of all drivers who were over .08 failed the One Leg Stand test. This is 92% sensitivity: a high rate of what we call true positive results, and a desired accuracy rate for a screening test.
The danger of a test with high sensitivity, though, is that it often detects traits belonging to characteristics other than the suspected one. For example, virtually all impaired people will have balance problems, so a balance test will commonly reveal impairment; however, a tired person, an older person, or an injured person may also have imperfect balance when completely sober. Balance, as a trait, is not specific to impairment. Therefore, poor balance can be a valid factor in determining probable cause for arrest, but we don't want to wrongly convict a person of DUI for simply having imperfect balance. In the same 1998 data, 41% of the drivers who were UNDER .08 failed the One Leg Stand. This is a 41% false positive rate, which is unacceptable for a forensic test used in court to prove a person's guilt.
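The sensitivity and false-positive arithmetic in the two paragraphs above can be laid out as a simple confusion matrix. The counts below are hypothetical, chosen per 100 drivers in each group solely to match the 92% and 41% rates quoted from the 1998 One Leg Stand data.

```python
# Screening-test arithmetic for the One Leg Stand figures quoted above.
# Counts are hypothetical (per 100 drivers in each group), chosen only
# to reproduce the quoted 92% and 41% rates.

over_08_failed = 92    # true positives: over .08 and failed the test
over_08_passed = 8     # false negatives: over .08 but passed
under_08_failed = 41   # false positives: under .08 but failed
under_08_passed = 59   # true negatives: under .08 and passed

sensitivity = over_08_failed / (over_08_failed + over_08_passed)
specificity = under_08_passed / (under_08_failed + under_08_passed)
false_positive_rate = 1 - specificity

print(f"Sensitivity: {sensitivity:.0%}")                  # high: acceptable for screening
print(f"Specificity: {specificity:.0%}")                  # low: poor at clearing the sober
print(f"False positive rate: {false_positive_rate:.0%}")  # far too high for a confirmatory test
```

The split makes the screening-versus-confirmation distinction concrete: the same test can be a reasonable screening tool (high sensitivity) and simultaneously an unacceptable confirmatory tool (low specificity, high false-positive rate).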
The above conflict leads directly into the definition of a confirmatory test. A confirmatory test requires high specificity and is carefully designed to ensure that any possible innocent explanations for a trait that was part of a probable cause determination are eliminated before the evidence can be used in court to convict a person of a crime. In some countries, for example, DUI suspects brought in by police are evaluated by a forensically trained physician who uses a number of scientific tests to assess impairment. Interestingly, one report published over 20 years ago indicated that a majority of the most highly forensically trained doctors disapproved of using the Walk & Turn and One Leg Stand to forensically confirm impairment.
The Problem is in the Training Materials
In officers' defense, most police officers don't know they are quoting misleading numbers. Most DUI officers do not understand that they are basing their DUI arrest opinions on fake science. The lie started long before the officers went to the police academy. Most officers don't have the education or scientific background to answer questions about research and statistics, so the truth is unlikely to be revealed in traditional cross-examination by DUI lawyers. This makes police officers quoting what they think is science in court very dangerous and difficult to combat.
Even the best DUI defense lawyers were slow to catch on to the shell game and the misleading presentation of information in the written materials officers receive in their DUI training. This has allowed Colorado DUI case law to grow over the years to permit police to testify as if they were scientifically trained experts, even though their "expertise" is based on many lies and fake science.