
Due Process and Probabilistic Genotyping Software

Updated in September 2021.

WARNING: This is a LONG article. I suggest you read the first 3 paragraphs and the last, and lie and tell me you read it in its entirety.

The Rise of Algorithms and the Decline of Constitutional Protections

There are innumerable benefits to algorithms but, as discussed elsewhere on this site, they can also adversely impact lives without remedy. If we blindly follow algorithmic recommendations without challenge or examination, we may end up all wet.

The proliferation of algorithms adopted by law enforcement is deeply concerning, particularly when one’s Constitutional rights are at risk. One instance where technology erodes Constitutional protections is the use of probabilistic genotyping software in criminal proceedings. It is often difficult for forensic scientists to determine a DNA match when a sample contains multiple DNA sources. As a result, DNA analysis is occasionally outsourced to private companies like Cybergenetics, which develops probabilistic genotyping software called TrueAllele. This software is a statistical algorithm that determines the probability that specific DNA is present in a DNA mixture. The company claims to provide an accurate result while removing human bias and error.[1] The Cybergenetics website states, “[t]he truth-seeking computer gives reliable DNA match results, based on solid science, not subjective opinion.”[2] The use of this software, however, undermines admissibility standards and challenges defendants’ due process rights by denying them access to the source code and the ability to challenge its results.

Greg Hampikian, founder and director of the Idaho Innocence Project, stated that he is not concerned about the content of the TrueAllele source code, “the proof is in the pudding – not the recipe.”[3] The pudding may look good, but we have all seen The Help, and for peace of mind, we need to know what is in “the pudding.” Source code must be shared with defense counsel under a protective order to protect Fifth Amendment rights.

DNA Analysis/Methodology: The Science Portion of the Article

Feel free to skip if you hate science, learning, candy and fun!

Nearly every cell in the human body contains deoxyribonucleic acid (DNA). The genome is the complete set of DNA found in an organism.[4] Most DNA is composed of two paired strands, called the double helix, which twist around a common axis. The strands are built from four chemical units called nucleotide bases, and bases on opposite strands are held together by hydrogen bonds. The bases are adenine (A), thymine (T), cytosine (C), and guanine (G). Bases on opposite strands are paired together: A binds with T, and C binds with G. The order of the bases along the molecule creates one’s genetic code. Approximately 3 billion base pairs are in the human genome.[5] Ninety-nine point nine percent of the DNA of any two people is identical. The 0.1% of DNA sequences that differ are called genetic markers and are the same in identical twins and similar in related individuals.[6]

When DNA analysis occurs, DNA is chemically extracted from a sample, such as blood, semen, hair, or skin cells, and approximately 100 to 500 base letters are analyzed.[7] A predetermined set of DNA segments, called loci, is amplified using the polymerase chain reaction (PCR), which makes millions of copies of the DNA to assist with analysis. Humans carry two variants at each locus called alleles, one maternal and one paternal, that vary in length.[8] If the list of alleles is the same, then two DNA samples match. A known sample is compared to an evidentiary sample to determine if they belong to the same person. After DNA is matched, the random match probability is calculated using well-established principles of population genetics and statistical analysis. This calculation estimates the probability that the matching DNA profiles came from two different sources.[9]
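For readers who prefer arithmetic to prose, here is a minimal sketch of how a random match probability can be computed for a single-source profile. It assumes the textbook product rule under Hardy-Weinberg proportions (a homozygous genotype has frequency p squared, a heterozygous one 2pq, and loci are multiplied together). The locus names, allele labels, and frequencies are made up for illustration and do not come from any real population database or from any vendor's software.

```python
from math import prod

# Hypothetical allele frequencies at three made-up loci (illustration only).
ALLELE_FREQ = {
    "locus_A": {"12": 0.10, "14": 0.20},
    "locus_B": {"8": 0.25, "9": 0.05},
    "locus_C": {"17": 0.10},
}

# A made-up profile: the two alleles observed at each locus.
PROFILE = {
    "locus_A": ("12", "14"),
    "locus_B": ("8", "9"),
    "locus_C": ("17", "17"),
}


def genotype_frequency(freqs, alleles):
    """Hardy-Weinberg genotype frequency: p*p if homozygous, 2*p*q if heterozygous."""
    first, second = alleles
    p, q = freqs[first], freqs[second]
    return p * p if first == second else 2 * p * q


def random_match_probability(profile, allele_freq):
    """Product rule: multiply the genotype frequencies across all loci."""
    return prod(
        genotype_frequency(allele_freq[locus], alleles)
        for locus, alleles in profile.items()
    )


rmp = random_match_probability(PROFILE, ALLELE_FREQ)
print(f"Random match probability: {rmp:.2e} (about 1 in {1 / rmp:,.0f})")
```

With only three loci the number is unimpressive; real profiles use many more loci, which is why the resulting probabilities become so small.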

Errors do occur in DNA testing, but the probability that two samples from different sources match is quite low.  Most mistakes are the result of human error, such as a sample mix-up or contamination.[10]  To reduce the chance of an error, the FBI requires laboratories to follow the FBI’s Quality Assurance Standards (QAS).[11]

A mixture of two or more DNA contributors is difficult to evaluate, especially if the DNA is available in small amounts. Interpreting a DNA mixture differs from interpreting a single-source sample for several reasons: alleles may overlap, there could be differences in the amount of DNA present, alleles may be obscured, or there could be a suggested presence of alleles that are not there. “It is often impossible to tell with certainty which alleles are present in the mixture, or how many separate individuals contributed to the mixture, let alone accurately infer the DNA profile of each individual.”[12] This process requires subjective analysis, whereas analysis of DNA from a single source is largely considered objective. Probabilistic genotyping seeks to remove the subjective judgment involved in interpreting a DNA mixture, and its developers claim that the use of algorithms provides objective evaluation.

The Use of Probabilistic Genotyping Software

In August 2014, police in Syracuse, New York attempted a traffic stop of a car that did not have its headlights on. The driver fled the scene and the officers pursued on foot. The officers heard gunshots and recovered a loaded handgun; however, they did not catch the suspect. Police determined the abandoned car belonged to Frank Thomas, though his DNA could not be matched to DNA on the gun due to the presence of a DNA mixture. The forensic analysis was outsourced to Cybergenetics, whose algorithm determined there was a “1.78 trillion times more probable than coincidental match” between the DNA found on the gun and the DNA of Thomas.[13] This was the only physical evidence that connected Thomas to the gun. Thomas was found guilty and sentenced to 15½ years in prison. He is currently appealing the conviction.[14]

Another case involves Michael Robinson, who was accused of murdering two men. The only evidence placing him at the scene of the crime was a bandana with a mixture of DNA that was collected the day after the crime. TrueAllele engineer Dr. Mark Perlin testified that it was 5.7 billion times more probable than coincidence that the DNA on the bandana matched Mr. Robinson as opposed to an unrelated African-American man. Jurors acquitted Mr. Robinson, and one unnamed juror stated that the DNA evidence presented by the prosecution was troubling because the forensics lab could not analyze the DNA even though Dr. Perlin said he could.[15]

A TrueAllele competitor called STRmix is used by the U.S. Army, the FBI, and New York City’s Office of the Chief Medical Examiner. In 2014, STRmix helped solve a robbery where one shoe fell from the defendant as he fled from police gunfire. STRmix determined that the defendant was a “major donor” of the DNA in the shoe, with 1 in 100 billion odds of it belonging to someone else. This was the first time that STRmix was used in the United States for a conviction.[16] The judge held that Dr. John Buckleton, a STRmix software developer who testified at trial, was a recognized expert in the field of DNA analysis. Buckleton swore that the tests were performed by a properly accredited laboratory and comported with the FBI Quality Assurance Standards.[17]

Despite judicial decisions to include evidence provided by probabilistic genotyping software, the technology is not infallible.

Prior Errors in Software and Methodology

Occasionally, forensic methodology that was previously admissible is later considered “junk science.”[18]  In 2012, the Department of Justice and FBI reviewed testimony from over 3,000 criminal cases that involved microscopic hair analysis. They determined that testimony used to implicate defendants was scientifically invalid in over 95 percent of the cases.[19]  Coding errors have also affected forensic tools.  In Minnesota, a bug in the Breathalyzer software resulted in the exclusion of evidence at trial.[20]

In 2015, STRmix contained a coding error that was used in 60 criminal cases before it was discovered.[21] The company responded by stating, “as with any software product, we do not claim that the code is error free… if there are any remaining errors they are very small and in parts of the code that activate rarely.”[22] Despite assurance that any mistakes left in the code are small, even minor errors can render software inoperable and results unusable. In one instance in 2016, TrueAllele and STRmix analyzed the same DNA and came to different conclusions. It is unknown if the inconsistency is the result of an error in one or both of the algorithms; however, the judge did not admit testimony from either company.[23]

It is the view of Dr. Buckleton that the statements “that was used in 60 criminal cases before it was discovered,” “TrueAllele and STRmix analyzed the same DNA” and “came to different conclusions” in the preceding paragraph are false. A list of miscodes is available on the STRmix website.

In a case in Northern Ireland, Regina v. Colin F Duffy & Brian P Shivers, TrueAllele was run four times and produced four different likelihood ratios: 389 million, 1.9 billion, 6.03 billion, and 17.8 billion. Perlin chose to present the 6.03 billion result. Since this case, Cybergenetics stated that it is open to informing juries about multiple results, but there are no standards to govern which results will be presented or an explanation for the variations.[24]
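To get a feel for how far apart those four runs are, the short sketch below does nothing more than arithmetic on the figures quoted above. Putting likelihood ratios on a log10 scale is a common way to compare them, but the script says nothing about why the runs differ or which result is "right"; the variable names are my own.

```python
import math

# The four likelihood ratios reported for the same evidence in Duffy and Shivers.
likelihood_ratios = [389e6, 1.9e9, 6.03e9, 17.8e9]

# Orders of magnitude (log10) make very large ratios easier to compare.
log10_values = [math.log10(lr) for lr in likelihood_ratios]

# How many times larger the biggest result is than the smallest.
fold_spread = max(likelihood_ratios) / min(likelihood_ratios)

for lr, log_lr in zip(likelihood_ratios, log10_values):
    print(f"LR = {lr:.3g}  (log10 = {log_lr:.2f})")
print(f"The largest run is roughly {fold_spread:.0f} times the smallest.")
```

All four numbers are enormous, so the spread may not change a juror's impression, which is exactly why a standard for which result gets presented matters.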

Due to possible errors in forensic analysis– and the heightened need for accuracy in criminal investigations– it is important to determine if testimony based on probabilistic genotyping should remain admissible. 


Standards for the Admissibility of Evidence: The Legal Portion of the Article

Feel free to skip if you hate case law, the Federal Rules of Evidence, entertainment, and adventure!  

Courts generally balance several factors when determining the reliability of scientific evidence: the accuracy of the result; the process that led to the result; and what the relevant scientific community believes about the process. Initially, the “gatekeeper” of scientific evidence was the opinion of the relevant scientific community, pursuant to the standard found in Frye v. United States. “While the courts will go a long way in admitting experimental testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.” [25]  

In the majority of states, the judge serves as the “gatekeeper” pursuant to the Daubert standard.[26] In Daubert v. Merrell Dow Pharmaceuticals, the Supreme Court of the United States stated, “the trial judge must determine…whether the reasoning or methodology underlying the testimony is scientifically valid.”[27] The Court held that a judge should consider at least the following five factors: (1) if the theory or technique has been, or could be tested; (2) if the theory or technique has been subject to peer review and publication; (3) what is the potential rate of error of the technique; (4) what are the standards controlling the technique’s operation; and (5) what is the degree of acceptance found in the relevant scientific community.[28]  The Court also asked judges to be mindful of Rule 403 of the Federal Rules of Evidence which permits the exclusion of evidence that might mislead or confuse the jury, especially because expert testimony may be difficult to evaluate.[29]

Courts also consider Rule 702 with regard to expert testimony. The rule asks if (1) the expert’s knowledge will help the trier of fact understand the evidence or determine a fact in dispute; (2) the testimony is based on sufficient information; (3) the testimony is the result of reliable methods; and (4) the expert has reliably applied the principles and methods to the facts of the case.[30]

The Admissibility of Testimony in Cases that Used TrueAllele

TrueAllele was first used in the United States in Pennsylvania in Commonwealth v. Foley. Pennsylvania uses the Frye standard for admissibility and the Court split the Frye test into two steps: (1) the party opposing the evidence must show that it is “novel” by demonstrating a “legitimate dispute” regarding the expert testimony; and (2) if the opposing party meets this standard, then the proponent of the evidence must demonstrate that the methodology is generally accepted in the relevant scientific community.[31] The Court did not find that TrueAllele was “novel” because it is a variation of a prior methodology used to calculate probabilities called the product rule. Since statements regarding the product rule have been admitted in Pennsylvania, testimony given by Dr. Perlin was admissible. Whether a legitimate dispute exists does not depend solely on the newness of the technology, and novelty alone will not dissuade the court from admitting the testimony.[32]

In 2014, TrueAllele was used in Ohio v. Shaw.  Dr. Perlin testified on behalf of the state and the Court allowed his testimony to be entered.[33]  The defendant’s attorneys argued that the testimony was inadmissible because TrueAllele had not been subject to comprehensive peer-review and its software was not validated by independent researchers with access to the source code.  Dr. Perlin testified that five published peer-reviewed articles proved the validity of the software, which “go beyond an internal validation.” Three of the articles listed him as one of the authors.[34]

Testimony was also provided by Dr. Ranajit Chakraborty, who is a faculty member for the Scientific Working Group on DNA Analysis Methods, which sets the regulations for laboratories throughout the country. He stated that TrueAllele worked well when high quantities of DNA from a single source were present, but cautioned against relying on the software where low quantities of mixed DNA exist. He further stated that TrueAllele had not been generally accepted in the scientific community and needed to be more thoroughly peer-reviewed. Dr. Chakraborty believed the source code should be studied by an independent party.[35] The trial court in this case used the Daubert factors and concluded that Dr. Perlin’s testimony was admissible. Given that TrueAllele is used in other jurisdictions and in three laboratories, the court concluded that it met the general acceptance standard.[36]

Testimony Should Be Inadmissible

The courts erred in their decision to admit testimony based on TrueAllele’s algorithm. 

The scientific validity of the technology needed under Frye has not been met, and the algorithm was not tested enough to satisfy any of the five considerations in Daubert. In September 2016, the President’s Council of Advisors on Science and Technology (PCAST) wrote a report that concluded that probabilistic genotyping needed further scientific scrutiny. Most of the research on TrueAllele and STRmix was conducted by the tools’ developers and not independent researchers. “While it is completely appropriate for method developers to evaluate their own methods, to establish scientific validity also requires scientific evaluation by other scientific groups that did not develop the methods.”[37] PCAST asserts that, based on the studies conducted, STRmix and TrueAllele appear reliable within a certain range, including when DNA mixtures from three sources or fewer are present and when the minor source of DNA is at least 20% of the mixture.[38] Due to the lack of comprehensive peer review and testing, these algorithms fail to meet the Frye and Daubert standards.
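The range described in the paragraph above can be written as a two-line check. This is only a sketch of the two conditions as stated here (three or fewer contributors, minor contributor at least 20% of the mixture); the function name and parameters are hypothetical, and it is not a substitute for the PCAST report's full discussion.

```python
def within_pcast_range(num_contributors: int, minor_fraction: float) -> bool:
    """Return True when a mixture meets both conditions stated in the text.

    num_contributors: number of people who contributed DNA to the mixture.
    minor_fraction: share of the mixture from the smallest contributor (0.0 to 1.0).
    """
    return num_contributors <= 3 and minor_fraction >= 0.20


# Hypothetical mixtures for illustration.
print(within_pcast_range(2, 0.35))  # two contributors, minor share 35%
print(within_pcast_range(4, 0.25))  # four contributors
print(within_pcast_range(3, 0.05))  # minor share only 5%
```

Note that the check needs the number of contributors as an input, which is precisely the figure that is often unknown in real casework.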

The testimony should also be inadmissible because in Duffy and Shaw the experts did not know how many people contributed DNA to the sample. The system operators guessed the number of DNA sources, and this imprecise entry may affect how the software operates. Since there is no transparency into system processes, there is no understanding of how manual data entry, and the potential for human error, affects system performance.[39]

Lastly, the TrueAllele testimony should not be admitted because the company’s business model incentivizes matches. The company offers access to the software and preliminary results for free.  If the results determine a likely statistical match, the customer can decide to have the software run a complete analysis, and the company provides a report that can be used in trial for a fee.[40] Only if a statistical match is likely and the customer chooses to purchase a full report does the company receive payment.  Since there is no access to the source code, the defense has little reason to believe that the company does not manufacture matches solely to generate income. 

Additional Peer Review and Independent Testing Is Necessary

The court should prohibit the admission of evidence from probabilistic genotyping software until further testing addresses four issues found in the PCAST report: (1) how well the program performs based on the number of DNA contributors present in the mixture, including when the number of contributors is unknown; (2) the accuracy of the software when a number of alleles are shared by individuals, including if they are related; (3) how well the software performs based on the amount of DNA present from various contributors; and (4) what circumstances cause it to produce results different from those produced by other methods.[41] Furthermore, there should be more comparative studies conducted between the methods used by different companies and by independent groups.

The courts generally cite trade secret laws to protect developers from releasing code in criminal investigations. Should these laws apply to probabilistic genotyping? 

Intellectual Property: Trade Secrets

Two-thirds of states have codified a trade secret evidentiary privilege in their rules of evidence.[42] When the court considers the privilege, it decides if the trade secret is valid and if its disclosure would cause harm. It also determines if the information is relevant and necessary to the case. Finally, the court analyzes whether the risk of harm from disclosure is greater than the need for information.[43] “Recognizing a trade secrets privilege in criminal proceedings suggests that private actors have a right to own the very means by which the government decides criminal justice outcomes, and implies that the adversarial process is itself a business competition.”[44]

The manufacturers of probabilistic genotyping software argue that their methodology is sufficiently peer reviewed and accepted in the relevant scientific community while also maintaining the confidentiality of their code. Some technologists and scientists argue that source code is necessary to satisfy peer review: “[a]nything less than the release of source code programs is intolerable for results that depend on computation.”[45] Dr. Perlin, however, believes that if TrueAllele’s source code is revealed, it will hurt his business prospects.[46] New York University law professor Erin Murphy stated “even if the [TrueAllele] software is operated by a trained expert, the source code still should be open and proven able to withstand challenge from disinterested and experienced reviewers.”[47]

Courts generally agree with Dr. Perlin and cite business interests as justification to protect source code. When Foley sought to appeal his conviction because he was unable to understand and examine the software, the judge denied his request and stated, “TrueAllele is proprietary software; it would not be possible to market TrueAllele if it were available for free.”[48] The court found that the business interest of the developer outweighed the harm caused by a failure to disclose.

Trade Secret Law Should Not Apply

The judges erred in siding with Dr. Perlin in the application of trade secret law. Since the source code is inaccessible, the defendants were unable to challenge the validity of the science or the accuracy of its results, in violation of the Due Process Clause. This clause protects defendants “against conviction except upon proof beyond a reasonable doubt of every fact necessary to constitute the crime with which [the defendant] is charged.”[49] The defense attorneys in Robinson stated, “[t]he Petitioner cannot cross examine a computer. Without production and defense review of the computer instructions, not only will the Petitioner be denied his constitutional right to a fair trial- he risks being wrongly executed.”[50] In criminal proceedings, the court should not put the business interest of a company above the constitutional rights of the defendant. The defendant must be given an opportunity to challenge the code, especially since one’s life and liberty are on the line and algorithms are not infallible.

Source Code Should Be Provided to the Defense

A copy of the source code should be given to defense attorneys so it can be reviewed by their experts.  Since the source code contains 170,000 lines, it will never be entered into public record.  It cannot be reverse engineered by a competitor based on a layman’s explanation provided to the judge and jurors. [51]

Edit 08/2020: Great news! STRmix provides defense teams access to source code in some cases.

BS Conclusion: Some Judges Make Erroneous Decisions Because They Have Not Read BS

In August 2019, the New York Appellate Court affirmed the conviction of John Wakefield in a murder case where the TrueAllele algorithm was found to pass the Frye test. Additionally, the court found that the defendant was not entitled to the source code and that there was no violation of the Sixth Amendment Confrontation Clause because Wakefield “had the opportunity to confront his true accuser.”[52] In 2021, however, judges in Virginia and Pennsylvania supported defense requests to see the TrueAllele source code. It is too soon to determine if the cases in Virginia and Pennsylvania signal a larger shift in the courts’ posture towards probabilistic genotyping software. It is, however, critical that the criminal justice system not be the venue to test drive new software without examination. If we assume the algorithm is always right, we may take a wrong turn and sink due process guarantees.

BS


[1]Lauren Kirchner, Where Traditional DNA Testing Fails, Algorithms Take Over, ProPublica, November 4, 2016, https://www.propublica.org/article/where-traditional-dna-testing-fails-algorithms-take-over.

[2]Cybergenetics, https://www.cybgen.com/solutions/law.shtml.

[3]Noel Erinjeri, Mark Perlin, the Man Inside the Black Box, Mimesis Law, November 30, 2015, http://mimesislaw.com/fault-lines/pay-no-attention-to-the-man-inside-the-black-box/5037.

[4]The Human Genome Project, https://www.genome.gov/11006943/human-genome-project-completion-frequently-asked-questions/.

[5]Harvey Lodish, Arnold Berk, S. Lawrence Zipursky, Paul Matsudaira, David Baltimore and James Darnell, Molecular Cell Biology, Fourth Edition, 2000, http://www.ncbi.nlm.nih.gov/books/NBK21475/.

[6]How Does DNA Testing Work, BBC Science, Updated February 1, 2013, http://www.bbc.co.uk/science/0/20205874.

[7]Lodish, et al. 

[8]The President’s Council of Advisors on Science and Technology (PCAST) 2016 Report, at 69, https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf

[9]PCAST at 73. 

[10]Id.  

[11]FBI, Quality Assurance Standards for Forensic DNA Testing Laboratories, (2011), www.fbi.gov/about-us/lab/biometric-analysis/codis/qas-standards-for-forensic-dna-testing-laboratories-effective-9-1-2011.

[12]PCAST at 76. 

[13]Kirchner. 

[14]Id.  

[15]Paula Reed Ward and Torsten Ove, Jury Acquits Duquesne Man In Double Homicide Case, Pittsburgh Post-Gazette, February 7, 2017, http://www.post-gazette.com/local/east/2017/02/07/Jury-gets-Duquesne-double-homicide-case/stories/201702070160.

[16]John S. Hausman, Lost Shoe Led to Landmark DNA Ruling- and Now, Nation’s 1st Guilty Verdict, Michigan Live, March 18, 2016, http://www.mlive.com/news/muskegon/index.ssf/2016/03/lost_shoe_led_to_landmark_dna.html.

[17]The People of the State of Michigan v. Elamin Muhammad (2015), http://media.mlive.com/chronicle/news_impact/other/muhammad-opinion.pdf.

[18]Petition for Review Filed by Defendant Michael Robinson, February 4, 2016, https://www.cybgen.com/information/newsroom/2016/apr/files/Petition_for_Review_of_Feb_4_2016_Order.pdf.

[19]PCAST at 3. 

[20]Logan Koepke, Should Secret Code Help Convict?, Medium, March 24, 2016, https://medium.com/equal-future/should-secret-code-help-convict-7c864baffe15.

[21]Id. and https://www.couriermail.com.au/news/queensland/queensland-authorities-confirm-miscode-affects-dna-evidence-in-criminal-cases/news-story/833c580d3f1c59039efd1a2ef55af92b?=

[22]Statement Relating to STRmix™ Miscodes, Friday, 18 March 2016, http://strmix.esr.cri.nz/assets/Uploads/Statement-relating-to-STRmix-miscodes-180316.pdf

[23] Lauren Kirchner, Where Traditional DNA Testing Fails, Algorithms Take Over, November 4, 2016, https://www.propublica.org/article/where-traditional-dna-testing-fails-algorithms-take-over

Report to the President, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods, Executive Office of the President, President’s Council of Advisors on Science and Technology, September 2016, footnote 212, https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/PCAST/pcast_forensic_science_report_final.pdf

[24]Katherine L. Moss, The Admissibility of TrueAllele: A Computerized DNA Interpretation System, 72, Washington and Lee Law Review, 1038, Spring (2015), http://scholarlycommons.law.wlu.edu/cgi/viewcontent.cgi?article=4457&context=wlulr.

[25]Frye v. United States, 293 F. 1013 (D.C. Cir. 1923).

[26]Moss at 1072. 

[27]  Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993). 

[28]Id.

[29]PCAST at 41. 

[30]Federal Rules of Evidence, Article VII, Rule 702, Testimony by Expert Witnesses, https://www.law.cornell.edu/rules/fre/rule_702.

[31]Commonwealth v. Foley, 38 A.3d at 890 (2009). 

[32]Id.

[33]Order, Ohio v. Shaw, CR-13-575691 (2014), http://www.cybgen.com/information/press-release/2014/TrueAllele-CaseworkRuled-Admissible-in-Ohio-Daubert-Challenge/admissibility.pdf.

[34]Moss at 1067-1070.

[35]Moss at 1069 – 1070. 

[36]Moss at 1070. 

[37]PCAST at 80. 

[38]PCAST at 81. 

[39]Moss at 1072. 

[40]Kirchner.

[41]PCAST at 79-80. 

[42]Rebecca Wexler, Life, Liberty, and Trade Secrets: Intellectual Property in the Criminal Justice System, Data & Society Research Institute, February 20, 2017, 19, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2920883.

[43]Wexler at 20. 

[44]Wexler at 61. 

[45]Darrel C. Ince, et al., The Case for Open Computer Programs, 482 Nature 485 (2012).

[46]Koepke.  

[47]Erin E. Murphy, Inside the Cell: The Dark Side of Forensic DNA, Nation Books, October 6, 2015. 

[48]Kirchner. 

[49]In re Winship, 397 U.S. 358, 363-364 (1970). 

[50]Petition for Review Filed by Defendant Michael Robinson.  

[51]Koepke. 

[52]https://www.cybgen.com/information/admissibility/Wakefield2019.pdf

Kristin
