One person's DNA became the centerpiece of a genetic sequence used by biologists the world over. Did he agree to that?
They numbered 20 in all: 10 men and 10 women who came to a sprawling medical campus in downtown Buffalo, New York, to volunteer for what a news report had billed as "the world's biggest science project."
It was the spring of 1997, and the Human Genome Project, an ambitious attempt to read and map a human genetic code in its entirety, was building momentum. The project's scientists had refined techniques to read out the chemical sequences (the series of As, Cs, Ts, and Gs) that encode the building blocks of life. Now, the researchers just needed suitable human DNA to work with. More exactly, they needed DNA from ordinary people willing to have their genetic information published for the world to see. The volunteers who showed up at Buffalo's Roswell Park Cancer Institute had come to answer the call.
To take part in the study was to assume risks that were hard to calculate or predict. If the volunteers were publicly outed, project scientists told them, they might be contacted by the media or by critics of genetic research, of whom there were many. If the published sequences revealed a worrisome genetic condition that could be tied back to the volunteers, they might face discrimination from potential employers or insurers. And it was impossible to know how future scientists might use or abuse genetic information. No one's genome had ever been sequenced before.
But the volunteers were also informed that measures had been put in place to protect them: They would remain anonymous, and to minimize the chances that any one of them could be identified based on their unique genetic sequence, the published genome would be a patchwork, derived not from one person but stitched together from the DNA of a large number of volunteers. "If we use the blood you donate" to prepare DNA samples, the consent form read, "we expect that no more than 10% of the eventual DNA sequence will have been obtained from your DNA."

Soon, however, those assurances began to wither. When a much-celebrated working draft of the human genome was published in 2001, the vast majority of it, nearly 75 percent, came from just one Roswell Park volunteer, an anonymous male donor known as RP11.
To this day, the story of how and why RP11 came to be the centerpiece of one of biology's crowning achievements has largely escaped public scrutiny. Even the scientists who helped orchestrate it disagree about the particulars.
To piece the story together, Undark reviewed more than 100 emails, letters, and other digital documents housed within the History of Genomics Archive at the National Human Genome Research Institute. The documents, provided to Undark through an institutional research collaboration agreement, reveal that the project's sourcing of human genetic material was more ethically fraught than official publications portrayed it to be, and included DNA harvested from a cadaver, and from one of the project's own scientists. The records, along with interviews with many of the project's central figures and with experts in law and bioethics, paint a picture in which high-ranking project officials, constrained by their own experimental protocols and accelerated timelines, veered from their guiding principles and pushed the boundaries of informed consent.
"We were panicking," recalled Aristides Patrinos, who led the Department of Energy's efforts in the Human Genome Project and, along with National Human Genome Research Institute director Francis Collins, helped steer the project to completion. "So a lot of these issues were not front and center. That's no excuse, but it was a reason. We were under a lot of pressure to make sure we finished by the time we finished."
The revelations potentially cast a stain on a project that had been extolled for its high ethical standards. "It's a big deal when researchers act deceptively, which is to say they do things that they said they weren't going to do, or don't do things that they said they were," said Paul Appelbaum, a Columbia University professor who specializes in legal and ethical issues in medicine, psychiatry, and genetics. "It has the potential to negatively impact the research enterprise in general, and the benefits that can potentially come from it."

To the extent that an injustice was done, it has propagated far and wide. The genetic sequence that emerged from the Human Genome Project continues to serve as a cornerstone resource of modern biology: a so-called reference genome, used ubiquitously by clinicians and researchers to identify genetic variants, sequence new genomes, and aid tests that determine patients' genetic risks. Although the reference genome has undergone several refinements and incorporated new genetic material over the years, RP11 remains at the center of it all, with his DNA still constituting more than 70 percent of the most recent versions.
RP11 is likely unaware that his DNA played, and continues to play, such a pivotal role in the march of genetic science. Project leaders, hamstrung, they say, by a decades-old ethics panel decision, have never attempted to inform him.
"Well, I think at this point, it probably would be a good idea to come out in the open and tell everybody what happened," said Patrinos. "And give as many specifics as possible."

The Human Genome Project is often compared to the achievement of putting humans on the moon. Launched in 1990 by the Department of Energy and the National Institutes of Health, the project took 13 years and, at the time, around $3 billion to complete. By 2000, scientists had sequenced around 85 percent of the genome, and the milestone was marked with a White House ceremony. President Bill Clinton described it as "more than just an epoch-making triumph of science and reason." U.K. Prime Minister Tony Blair, who joined by satellite, called it the kind of breakthrough that "takes humankind across a frontier and into a new era."
But in 1996, the project was at a crossroads. Francis Collins, then the director of NIH's National Center for Human Genome Research (later renamed the National Human Genome Research Institute, or NHGRI), was leading the international consortium of laboratories tasked with completing the sequence. Still in his mid-40s, the physician was a rising star. He had succeeded Nobel laureate James Watson years earlier as the center's director, and Barack Obama would later appoint him to the helm of NIH, the world's largest public funder of biomedical and behavioral research. People who worked with him described him as a brilliant mind and a great communicator, a passionate leader with legendary powers of persuasion.
Collins needed all of those qualities to manage the first sequencing of a human genome. It was a staggeringly complex operation. First, the entirety of a person's DNA (a molecular sequence of more than 3 billion pairs of nucleotide bases, typically represented as As, Cs, Ts, and Gs) had to be broken into fragments roughly 100,000 to 200,000 base pairs long. The fragments were then isolated and cloned, typically by specially preparing each one and inserting it into a bacterium, which copied the fragment as it reproduced. In this way, the team's scientists could make a physical copy of a person's full, albeit fragmented, genome, known as a clone library. Identical clone libraries could then be shipped to different laboratories around the world, allowing many research groups to read the fragments, and piece the sequences back together, in parallel. In a way, it was like distributing sets of the same, extraordinarily difficult jigsaw puzzle to a lineup of the world's best puzzle solvers: They could work on different sections of the puzzle simultaneously and, if need be, check each other's work.
By 1996, clone libraries were already being distributed to a variety of labs. But that spring, project members learned that several of the libraries had been constructed without any informed consent process and with no oversight from institutional review boards, or IRBs, bodies that, according to federal policy, should have ethical purview over research with human subjects. Rumors swirled that some of the DNA had come from scientists involved with the project, a scenario that project members speculated could raise ethical questions about consent and invite charges of elitism. Internal project correspondence and tissue bank donation records reviewed by Undark suggest that another DNA source was the cadaver of a 19-year-old who had died by suicide; the family had donated the body to science but had not specifically consented to its use in the Human Genome Project.

It bothered Collins that at least one donor's identity was known to project scientists, and that the donor was aware his DNA was being used to create a library. "It sounds as if the donor knows who he is," he wrote in an email that March, after being briefed on a clone library that had been constructed at the California Institute of Technology. "That's not the way it should have been done."
In the wake of the revelation, Collins and Patrinos consulted an array of advisers and came up with a new plan, outlined in a joint guidance. They would find new donors and make new clone libraries, under new protocols. Unlike the old libraries, the new ones would be obtained through a double-blind procedure: Scientists involved with the project would not know the identities of the donors, and donors wouldn't know for certain whether their DNA was being used in the project. According to internal correspondence and interviews, project leadership was concerned not only about the genetic privacy of the donors, but also about the possibility that a donor might trumpet their role to the media and create a spectacle.
"It seemed like it would create a major distraction from what we wanted to generate," recalled Robert Waterston, who headed one of the five centers that did the majority of the sequencing for the project.
"We wanted the human genome," he added, meaning a reference that everyone could relate to. "It's not Joe Blow's genome. It's your genome. It's my genome. It's representative of everybody's genome."

To further protect the two-way confidentiality, the completed representation of the human genome would be a mosaic, assembled from the DNA of not one but multiple donors. The thinking, among the project's inner circle, was that a mosaic would not only complicate attempts to identify donors based on the genetic sequence but also reduce the incentive for wanting to know the donors' identities to begin with. If a donor's identity did come to light, limiting their contributions might minimize their exposure to potential harms, and deter them from attempting to claim property or ownership rights over the published sequence.
In a June 1996 email that appears to be written by Melvin Simon, who led a cloning operation at Caltech, the scientist told Human Genome Project leadership, including Patrinos, that, as he understood it, no matter what waiver a volunteer is willing to sign, he or she would not lose ownership or property rights. "Thus only by a true patchwork or anonymizing approach can it be made extremely difficult to claim such rights," the email read. (Simon confirmed the sentiment behind the email in an interview with Undark.)
Simon's Caltech team and a laboratory at the Roswell Park Cancer Institute were each commissioned to create new clone libraries under the new protocols. Soon, however, the plans for a mosaic genome would veer off course, and the Human Genome Project would find itself in a consent conundrum, with one person, RP11, caught in the middle.
Pieter de Jong, who led the cloning project at the Roswell Park Cancer Institute, had been behind some of the same problematic libraries that had sparked Collins' consternation in the spring of 1996. But he had a long history with the project, and he was a foremost expert in DNA cloning. So when the Human Genome Project enacted its new plan, it commissioned him to build at least five new libraries, de Jong recalled to Undark.

This time, de Jong used a lottery-like process to select donors. On March 23, 1997, he ran an advertisement in the Buffalo News seeking 20 volunteers. The edition also featured a front-page story about the project, which de Jong said he helped arrange. In the weeks that followed, the volunteers each came in, met with a genetic counselor, signed a consent form, and donated a few tablespoons of blood.
The genetic counselor labeled each blood sample with a number, but created no records linking the samples to their donors. The 20 samples were then transferred to de Jong, who chose two at random, one male and one female, to use for clone libraries. The only personal information the facility retained was the names and signatures on the consent forms, which were sealed in envelopes and stored in a locked file cabinet. As a result, it would be virtually impossible for anyone at Roswell Park to determine who the two donors were.
A postdoctoral researcher, Kazutoyo Osoegawa, did most of the work building the first library. Osoegawa was skillful, de Jong recalled, with a knack for coaxing large fragments of DNA from a sample for cloning: The larger the fragments, the more easily scientists could map them for sequencing, and the fewer fragments overall they would have to sequence to finish the job.
By August of 1997, de Jong, Osoegawa, and their colleagues had begun distributing the first of the new Roswell Park clone libraries, RP11, and it was a good one, with enough fragments for scientists to be fairly certain that they spanned essentially the entire genome, with few gaps. A second library was in the works, with more to follow. But, before those libraries could materialize, the Human Genome Project's plans took a turn.
On the evening of Sept. 20, 1998, Francis Collins emailed NHGRI brass, including Jane Peterson, a program director involved with the sequencing effort, and Mark Guyer, the institute's assistant director for scientific coordination, about an unhappy circumstance. "I have been feeling uneasy about the RPC11 library ever since Jane uncovered the language that Pieter de Jong used for the consent form," he wrote. (The RP11 library was often referred to as RPC11 or RPCI-11 in correspondence.)
The specific language that unsettled Collins was the passage conveying that no more than 10 percent of the genetic sequence was expected to come from their DNA. And it was resurfacing at an inopportune moment.
The Human Genome Project was in the midst of what Maynard Olson, who led one of the project's sequencing labs, described in an email that September as a "de facto drift away from the concept of a genome sequence that is a mosaic of contributions from many individuals." When de Jong crafted the consent language, he was under the impression that 10 new clone libraries would be built and integrated into the completed genome. But now project leaders were lurching toward a strategy that would draw most of the final sequence, between 60 and 90 percent, from a single clone library. And RP11 was their library of choice.

In his email to his NHGRI colleagues, Collins wrote that the document of general principles he and Patrinos had shared suggested an intent to include several donors but wasn't specific about it, "nor does it put a ceiling on the amount of sequence that could come from a single person."
The 10 percent language in the consent form worried him, however. Attempting to reconsent RP11 under new terms would be complicated: RP11 could have been any of the 10 male donors, and all the researchers had to go on were the names on the consent forms. The only way he could think to do it, he wrote, would require asking every volunteer if they objected to the raising of the 10 percent restriction, "and then holding our breath that none of them do."
Technically, the word "expect" didn't forbid using RP11 for more than 10 percent of the sequence, Collins wrote, "but how far can we push this?"
The next month, Collins joined a conference call with de Jong, Roswell Park IRB chair Harold Douglass, and other Roswell Park and NHGRI staff. According to handwritten notes, Collins told them that limiting use of the clone library to 10 percent would devastate the momentum of the project and that there were concerns about recontacting all 10 male donors. The notes indicate that Douglass mentioned the IRB would ask about the benefit of fast tracking, and Collins said there was a medical reason: to "find as many genes ASAP to understand disease." (Speaking to Undark, Collins confirmed his participation in the call. He said the notes, taken by a different participant, used phrasing he wouldn't have used, but seemed correct.)
Days later, the Roswell Park IRB met and, according to a written summary that was shared with Guyer, "voted unanimously against any attempts to try to find and reconsent the ten donors." Among the IRB's stated justifications were that the expectation expressed to the donors was not a guarantee, and that attempting to reconsent the 10 male volunteers would be difficult and could jeopardize RP11's anonymity. To delay the project by not expanding the use of RP11's library, the panel added, would itself be unethical, given the number of people who stood to derive health benefits from the timely completion of the human genome. (Douglass declined to comment for this story.)
Recently, Collins spoke to Undark about RP11 and the Human Genome Project's donor sourcing strategies. He was joined by Eric Green, who was also involved with the project and currently leads the National Human Genome Research Institute.
According to Collins and Green, project leaders did initially aim to construct 10 new clone libraries for use in the completed genome. But they soon realized it would be inefficient and chaotic to work with 10 libraries at once. "There would be lots of complexities that would come out by having too much blending going on," Green said.
Collins explained that structural differences between individual genomes, such as large-scale insertions or deletions of genes, can make it difficult to stitch together an accurate sequence from two different human sources. If you go from one person to 10, he said, "and then you try to fit the whole thing together, it's going to be potentially much more error-prone."
It was primarily those technical challenges, Collins and Green said recently, that prompted the decision to derive most of the genome from a single donor. And RP11, with its well-sized fragments and comprehensive coverage of the genome, stood out from the other libraries as the ideal one to work with, they said. Also, Green added, RP11 at the time was further along than any of the other new libraries in the process of being characterized and prepared for sequencing.
But Collins' and Green's recollections diverge in key ways from those of other scientists involved in the Human Genome Project. Robert Waterston, for instance, who was among the small circle of researchers who guided project strategy, recalls that the complexities of blending clone libraries were only a minor consideration. Yes, structural differences in DNA could complicate the task of meshing one person's genetic sequence with another's, he said, but only in certain regions of the genome, such as those marked by repeat sequences that differ in number and complexity from one person to the next.
The bigger factor, said Waterston, was time. And the Human Genome Project was pressed for time, he said, thanks to a man named J. Craig Venter.
In May 1998, the scientist Venter, whose not-for-profit Institute for Genomic Research had done pilot work for the Human Genome Project, launched a venture built to rival the publicly funded initiative. That June, Venter and his colleagues pledged in a Science article that they would sequence a human genome by 2001, years ahead of the Human Genome Project's 2005 target deadline, and at a fraction of the cost. The enterprise, known as Celera Genomics Group, set up shop in Rockville, Maryland, just miles from NHGRI's Bethesda headquarters.
Correspondence from that time suggests the news lit a fire under the Human Genome Project. "Obviously there would be significant political advantages to getting something out a year earlier than Venter is proposing, provided we can defend its utility," wrote Phil Green, an investigator at the University of Washington's sequencing center, in an email that was shared with Collins shortly after word of Venter's plans began to spread.


Project members worried about the implications of a commercial enterprise owning, and possibly monetizing, the first human genome. For some of them, competition itself, and the specter of a stinging defeat, seemed to be motivation enough. In an email that September, NHGRI's Peterson described Eric Lander (who led the Whitehead/MIT Center for Genome Research, one of the five large centers that sequenced the majority of the genome) as having called her "in a very depressed mood." Lander believed Venter would have a draft of the human genome "done before next summer and will take continual pot shots at us," Peterson wrote. (Lee McGuire, chief communication officer at the Broad Institute, where Eric Lander is a member and founding director, told Undark that Lander was unavailable to be interviewed for this story.)
In a move that was widely reported in the media as being prompted by the Celera announcement, Collins announced that September that the Human Genome Project would aim to finish its genome two years earlier than planned, by 2003, and release a working draft by 2001.
"We came into this crush with Celera, and everything just had to get done as quickly as possible," recalled Waterston. The complement of libraries they'd envisioned wasn't ready yet, and it would've taken time to make and distribute them, he said. They had to work with what they had, and what they had was RP11.
"There just wasn't an alternative," Waterston recalled. "We didn't have a second library to go to."
Marco Marra and John McPherson, who along with Waterston did much of the preliminary characterization of clone libraries at Washington University, similarly remember that it was the dearth of available libraries, more than the challenge of blending them together, that led the project to focus on a single donor.
That aligns with de Jong's recollection. RP11 was a good library, he told Undark, but so were subsequent libraries he built. The problem was that there was no time to wait. (De Jong shared records with Undark indicating that his lab had not yet completed the second of its planned new libraries by September 1998, when the issues around RP11's consent language arose; it is unclear whether the Caltech laboratory had completed and distributed the first of its planned new libraries to sequencing centers by that time, but Waterston recalls they hadn't.)


Although de Jong said he was not heavily involved in discussions of sequencing strategy, he thinks it began to dawn on the scientists how much additional work, and money, would be required to prepare and sequence 10 libraries, rather than one or two. "They couldn't potentially keep up the same speed as Venter with his commercial effort if they would have stayed with the original plan," said de Jong. "So I think it was mostly because they didn't want to lose the race."
Other members of the Human Genome Project who spoke with Undark expressed similar sentiments, including one of its highest-ranking figures. "We got pretty panicky that we were going to lose this," Patrinos said of the competition with Celera. "So at that time, we had to follow paths that would get us to the conclusion as fast as possible."
Asked if he felt Celera contributed to a sense of urgency at that time, Collins told Undark he didn't recall that being a factor, saying that the rush, instead, was to get the job done to provide benefits for understanding health and disease. In a follow-up call, Collins clarified: "I think Celera's intentions to produce a for-profit human genome sequence was an issue that everybody was fully aware of, so that was in the air, if you will." But he said "it was not the driving factor at all" in the decision to move as quickly as possible to obtain a complete public sequence.
In any case, on Oct. 27, 1998 (five months after Venter launched his rival to the Human Genome Project, a month and a half after the project gave itself a new, ambitious deadline, weeks after Collins' concerned email about RP11's consent language, and days after Collins' conference call with the chair of the Roswell Park IRB), the ethics panel gave Collins and his team carte blanche to dramatically expand the use of RP11's DNA, without telling any of the Roswell Park donors about the change.
That same month Simon and collaborator Hiroaki Shizuya, having finished their first Caltech library under the new donor protection protocols, told the DOE's Marvin Frazier that although the group had genetic material in hand to begin a second library, they had been "informed that there was no longer a great deal of interest" in new libraries, and they were instead moving on to new research pursuits.
Archival correspondence suggests the turn of events didn't sit well with all of the lead scientists involved in the project. "I was deeply distressed to have the director of a major genome center already start building the case that the informed-consent form for DNA used to build RPC-11 did not really mean what it said," wrote Olson in a November 1998 email to Collins and his University of Washington colleague Phil Green. The ethical, legal, and social issues related to the library sourcing will not go away, he predicted.
Speaking to Undark, Olson said he does not recall which consent language, or which director, he was referring to in his email. But he remembers there being tension between the ethicists and technical experts involved with the project. Some of the ethicists resented the idea that technical considerations should factor into discussions, he said, and "a lot of the more technically well-informed participants in the project just actually weren't terribly interested" in the ethics issues.
Undark invited several biomedical ethicists and legal experts to review the Roswell Park consent form and the IRB's ruling on RP11. Their responses called into question many of the justifications the ethics panel gave for its decision.
"The big deal is that the 10% is not just a minor aspect of the consent form," wrote Hank Greely, a Stanford University professor who works on ethical, legal, and social issues in the biosciences, in an email to Undark. Rather, he noted, it "is a substantial part of the argument about confidentiality." Greely said that he didn't find any of the panel's justifications convincing. He doesn't think the IRB acted nefariously, but he said that he would not have so hastily dismissed the possibility of attempting to reconsent the volunteers, and that doing so wouldn't necessarily have heightened the risks to the donor. "We've got these 10 names. Let's see if they're in the phone book," he said, later adding, "let's see how locatable they are."
Jonathan Moreno, a professor of medical ethics and health policy at the University of Pennsylvania who declined the offer to review documents but was briefed by Undark on the IRB decision, agreed that the volunteers should have been reconsented.
Appelbaum, the Columbia University legal and ethics specialist, was one of several experts who took issue with the panel's interpretation of the 10 percent expectation. "I think a reasonable person would take away from that that the intent of the research team was to use no more than 10 percent of his or her genome in the project," he said. "And so playing with words in that way, I think, is really not appropriate in this context."

Appelbaum also thought it was odd for Collins, representing a sponsoring agency, to meet directly with an IRB chair on an ethical issue related to work the agency was sponsoring. There is a risk, he said, of exerting undue influence on the oversight process. Bruce Gordon, the assistant vice chancellor for regulatory affairs at the University of Nebraska Medical Center, told Undark that, generally speaking, "the best practice would be that funders shouldn't be interacting with the IRB under any circumstance," though he described it as an unspoken rule, and not a strict standard.
Collins said he agreed the conference call was an unusual step, but that the significance of the situation justified it. "I counted on the IRB to do what they always do," he said, which is "to step back and take up a purely objective view of an ethical question and render their best opinion. I do not believe I put pressure on them at all."
Although ethicists and legal experts who spoke to Undark raised questions about the rationale of the IRB's ruling, many said it was unlikely that RP11 had suffered concrete harms as a result, a point also expressed by Collins and other key figures from the Human Genome Project. Protections enacted in the U.S. since the completion of the Human Genome Project make it illegal for employers or health insurers to discriminate based on a person's genetic information. And experts say that without a matching DNA sample, it remains difficult to identify a person based solely on a genetic sequence. With a matching sample, however, it would be straightforward to identify the donor, whether their contribution was 70 percent or seven.
"I think it's fair to say RP11 was probably misled about what was going to happen," said R. Alta Charo, a professor emerita of law and bioethics at the University of Wisconsin-Madison. (Like Moreno, Charo declined the offer to review documents, but was briefed by Undark on the IRB decision.) The real question, however, said Charo, is whether the decision made him more identifiable, whether it exposed him to more risk. "I don't know how to answer that question."
Appelbaum said it may be true that RP11's risks weren't substantially heightened by the decision to expand the use of his genetic sequence. "But it seems to me that that's different from saying that the action wasn't consequential," he said, "in the sense that it can be highly consequential, I think, for the research enterprise in this country to make promises to people in signed consent forms, and then violate those promises."
Appelbaum described the episode as exemplary of a long history of deceptions that have contributed to a lack of trust in the research enterprise, especially in minoritized communities. "One of the big issues in human subjects research, which has assumed even greater salience in genomic research, has been the issue of trust," he said. "If I agree to be in your project, are you leveling with me about what's going to happen to me? And if I agree to donate blood, or some other tissue sample, are you telling me the truth about how it's going to be used?"
The June 2000 White House ceremony that marked the Human Genome Project's sequencing milestone was a joint ceremony: At the presidential lectern that day, President Clinton was flanked on one side by Francis Collins and on the other by Craig Venter, whose Celera team was also nearing the finish line.
The following winter, the two teams each published landmark genome papers, with the Human Genome Project's report on its draft genome sequence officially appearing in the Feb. 15 issue of the prestigious journal Nature, and Celera's sequencing results appearing in the rival journal Science one day later.
Celera reported that its genome had been assembled from five unnamed donors, one of whom, the majority donor, Venter later revealed was himself.
Meanwhile, the Human Genome Project was circumspect about the donors behind its published sequence. A table in the Nature paper listed eight clone libraries that were described as having contributed the bulk of the sequence. Among them was RP11, which the table noted accounted for just over 74 percent of the draft genome. The other seven each contributed between 1.6 and 4.3 percent of the total. Additional libraries, neither named nor tallied in the paper, collectively accounted for the remaining 8.4 percent of the sequence.
The paper described the libraries as originating from anonymous DNA donors, according to a lottery-like process like the one used at Roswell Park. What was left unsaid (but what consent documents, internal memos, and other records reviewed by Undark reveal) is that six of the eight named libraries were the same ones that had raised ethical concerns early in the project: the library sourced from the 19-year-old cadaver; the libraries suspected to have been built with the DNA of project scientists; the libraries whose donors were known to project researchers. Collins and Patrinos had agreed in 1996 to let scientists use those libraries, provided the donors were properly consented, protocols were cleared by IRBs, and the libraries contributed minimally to the final sequence. (Caltech’s Simon told Undark that it was a lab technician’s husband, and not a postdoc as had been rumored, who produced the sperm from which one of his early libraries was built.)
Also left unsaid was that four of the eight libraries had all been derived from the same donor.
Collins and NHGRI director Green could not confirm to Undark how many, if any, of the libraries outside of the top eight had been approved by IRBs. Collins also told Undark he did not know if the family of the 19-year-old tissue donor had been reconsented in accordance with the 1996 guidelines.
Asked if he feels the project should have been more forthright in the 2001 paper about the sourcing of DNA donors, Collins said “it’s always good in hindsight to be transparent and forthright in every way. To be honest though, I don’t think in my view, that this was such a major substantial issue that it would have required a deep debate about exactly how to put that forward.” He added, “I don’t believe that individuals were significantly put at risk by the way in which this was laid out. And I hope that doesn’t get lost.”
To Appelbaum, however, the idea that the Human Genome Project’s landmark paper may have misrepresented donor procedures is gravely concerning: the kind of transgression that can erode public trust in science more broadly. Perhaps an argument could be made to defend the project’s DNA sourcing, Appelbaum said, “but I’m not sure there’s any argument on the other side about covering up what you did when you publish your results. I think you’ve got to be open about that.”
“If you made certain decisions along the way,” he said, “you describe the decisions you made and the justification for them.”
The culmination of the Human Genome Project was, in a way, the beginning of a long scientific afterlife for RP11’s genetic sequence. A 2010 study, published in the journal Science, analyzed the reference genome and concluded that RP11 was of mixed African and European genetic ancestry, and likely identified as Black or African American.
Perhaps most consequential, however, is that the sequence that emerged from the Human Genome Project has evolved into a foundational resource of modern genetics. It has been revised and improved through the years, each new edition, or reference assembly, augmented with new annotations and fixes.
Deanna Church, who led an international collaboration that managed the reference assemblies in the years following the Human Genome Project’s completion, likens them to maps that give scientists a shared coordinate system for describing, comparing, and understanding genetic sequences. Researchers use them to interpret and identify fragments of DNA; clinicians and genetic testing companies use them as benchmarks to determine which genetic variants a person carries. The reference assembly that emerged from the Human Genome Project has become “the foundation for all genomic data and databases,” wrote the authors of a 2019 opinion piece in the journal Genome Biology.
And to this day, the most widely used reference assemblies continue to derive more than 70 percent of their sequence from a person who did not clearly consent to that level of use.



In recent years, Church and other experts have argued that it is time for a new reference model: The assemblies from the Human Genome Project do not adequately reflect the breadth of human genetic variation, they say. And although those reference assemblies are of exceptional quality by genome standards, a newer sequence, sourced from new DNA and known as the telomere-to-telomere assembly, is both more accurate and more comprehensive.
But a reference assembly’s usefulness stems in large part from the information, annotations, and standards that are built on top of it, and it will take time for scientists to duplicate that infrastructure for a new reference genome.
Leslie Biesecker, chief of the Center for Precision Health Research at the NHGRI, estimated that it will be three to five years before the community transitions to a new reference: “There are so many pieces of machinery that need to be moved forward at the same time in order for that whole system to work.”
Stanford’s Greely, a lawyer by training, said it’s conceivable that were RP11 to learn of the outsized role his DNA played in genetic science, he might seek financial compensation. “Without wanting to get into the merits of the claims, it could play out kind of the way the Henrietta Lacks story has,” said Greely, referring to a Black woman who died of cervical cancer in 1951, and whose cells were harvested for science without her consent. (Lacks’ family members were recently awarded an undisclosed settlement from Thermo Fisher Scientific, over allegations the company unjustly profited from her cells.) “If I were NIH, I would worry: hey, if this guy knows, he might sue us or make trouble for us,” Greely said.
Documents suggest the architects of the Human Genome Project worried about just such a scenario: a clause in the original consent form used at Roswell Park asserted that, by signing, a donor waived their “rights to claim any part of conceivable profits resulting from research performed on the blood and products derived from the blood you donated.” But emails sent to NHGRI leadership in July 1997 indicate that when Department of Health and Human Services officials learned of the clause, they argued it ran afoul of a federal regulation that bars consent language that could be construed as a waiver of legal rights. Although RP11 had likely already signed the original version, the waiver was removed from the consent form by that August.
These days, the trim beard Pieter de Jong wore during the days of the Human Genome Project has turned to gray. He now lives near Seattle, where he still runs a small clone library supply operation. This year, to free up space, he finally destroyed three of the five clone libraries he built for the Human Genome Project: two of which he said the project never used, and a third that was incorporated into the reference sequence only in the genome’s later revisions.
De Jong no longer knows the whereabouts of the 20 consent forms that were collected from the Roswell Park volunteers, the only known records that identify the participants by name. Although study protocols stipulated that Roswell Park staff would maintain a chain of custody for the forms, Annie Deck-Miller, director of public relations at the center, now known as the Roswell Park Comprehensive Cancer Center, told Undark in an email that the facility no longer possesses any forms related to de Jong’s study. In a subsequent emailed statement, representatives of Roswell Park indicated that documents related to the Human Genome Project were stored onsite “for a number of years, as required by federal regulations.” They declined to comment further, however, citing a lack of capacity “to engage in a review of decisions purported to have taken place in a confidential meeting conducted 26 years ago.” Collins and Green say they have never attempted to notify Roswell Park donors about the change to the sequencing plan, and that the IRB decision does not permit them to.


There is, however, one Human Genome Project donor whose whereabouts de Jong knows precisely: the person behind the four clone libraries that accounted for more than 9 percent of the draft sequence.
De Jong recalls that he and a visiting collaborator created those libraries in the summer of 1993. They did it quickly (he was in a hurry to apply for grants “and get something going”) and he said there were few ethical guardrails to guide them. De Jong felt it would be inappropriate to solicit DNA from one of his lab workers, “so my collaborator, my visitor, and me, we exchanged, we both tossed up and we gave blood samples for the project.”
One of those samples yielded clone libraries that helped spark the 1996 panic over donors: libraries whose origins project leaders worried might leak to the press, de Jong said, but that nonetheless found their way into the worldโs first human genome sequence.
“It ended up being me,” de Jong said, matter-of-factly. “The reference genome is maybe 80 percent or 75 percent RP11, and maybe 10 percent me.”
Ashley Smart is the associate director of the Knight Science Journalism Program at MIT, and a senior editor at Undark. Find Ashley on X @ashleythesmart
A version of this article was originally posted at Undark and is reposted here with permission. Any reposting should credit both the GLP and original article. Find Undark on X @UndarkMag





















