Today we released our archive of data.gov on Source Cooperative. The 16 TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov.
This is the first release in our new data vault project to preserve and authenticate vital public datasets for academic research, policymaking, and public use.
We’ve built this project on our long-standing commitment to preserving government records and making public information available to everyone. Libraries play an essential role in safeguarding the integrity of digital information. By preserving detailed metadata and establishing digital signatures for authenticity and provenance, we make it easier for researchers and the public to cite and access the information they need over time.
In addition to the data collection, we are releasing open source software and documentation for replicating our work and creating similar repositories. With these tools, we aim not only to preserve knowledge ourselves but also to empower others to save and access the data that matters to them.
For suggestions and collaboration on future releases, please contact us at publicdata@law.harvard.edu.
This project builds on our work with the Perma.cc web archiving tool used by courts, law journals, and law firms; the Caselaw Access Project, sharing all precedential cases of the United States; and our research on Century Scale Storage. This work is made possible with support from the Filecoin Foundation for the Decentralized Web and the Rockefeller Brothers Fund.
activism
activists
advocacy
advocate
advocates
barrier
barriers
biased
biased toward
biases
biases towards
bipoc
black and latinx
community diversity
community equity
cultural differences
cultural heritage
culturally responsive
disabilities
disability
discriminated
discrimination
discriminatory
diverse backgrounds
diverse communities
diverse community
diverse group
diverse groups
diversified
diversify
diversifying
diversity and inclusion
diversity equity
enhance the diversity
enhancing diversity
equal opportunity
equality
equitable
equity
ethnicity
excluded
female
females
fostering inclusivity
gender
gender diversity
genders
hate speech
hispanic minority
historically
implicit bias
implicit biases
inclusion
inclusive
inclusiveness
inclusivity
increase diversity
increase the diversity
indigenous community
inequalities
inequality
inequitable
inequities
institutional
lgbt
marginalize
marginalized
minorities
minority
multicultural
polarization
political
prejudice
privileges
promoting diversity
race and ethnicity
racial
racial diversity
racial inequality
racial justice
racially
racism
sense of belonging
sexual preferences
social justice
sociocultural
socioeconomic
status
stereotypes
systemic
trauma
under appreciated
under represented
under served
underrepresentation
underrepresented
underserved
undervalued
victim
women
women and underrepresented
After a federal judge ruled that the Food and Drug Administration, the Centers for Disease Control, and the Department of Health and Human Services must reinstate web pages containing vital information about trans people and other marginalized groups, the Trump Administration capitulated. However, officials added a highly misleading disclaimer to the restored pages.
Today, the National Security Agency (NSA) is planning a "Big Delete" of websites and internal network content that contain any of 27 banned words, including "privilege," "bias," and "inclusion." The "Big Delete," according to an NSA source and internal correspondence reviewed by Popular Information, is creating unintended consequences. Although the websites and other content are purportedly being deleted to comply with President Trump's executive orders targeting diversity, equity, and inclusion, or "DEI," the dragnet is taking down "mission-related" work. According to the NSA source, who spoke on the condition of anonymity because they are not authorized to speak to the media, the process is "very chaotic," but is plowing ahead anyway.
A memo distributed by NSA leadership to its staff says that on February 10, all NSA websites and internal network pages that contain banned words will be deleted. This is the list of 27 banned words distributed to NSA staff:
Anti-Racism
Racism
Allyship
Bias
DEI
Diversity
Diverse
Confirmation Bias
Equity
Equitableness
Feminism
Gender
Gender Identity
Inclusion
Inclusive
All-Inclusive
Inclusivity
Injustice
Intersectionality
Prejudice
Privilege
Racial Identity
Sexuality
Stereotypes
Pronouns
Transgender
Equality
The memo acknowledges that the list includes many terms that are used by the NSA in contexts that have nothing to do with DEI. For example, the term "privilege" is used by the NSA in the context of "privilege escalation." In the intelligence world, privilege escalation refers to "techniques that adversaries use to gain higher-level permissions on a system or network."
The purge extends beyond public-facing websites to pages on the NSA's internal network, including project management software like Jira and Confluence.
The NSA is trying to identify mission-related sites before the "Big Delete" is executed but appears to lack the personnel to do so. The NSA's internal network has existed since the 1990s, and a manual review of the content is impractical. Instead, the NSA is working with "Data Science Development Program interns" to "understand the false-positive use cases" and "help generate query options that can better minimize false-positives." Nevertheless, the NSA is anticipating "unintended downtime" of "mission-related" websites.
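To illustrate why a keyword sweep catches mission-related pages, here is a minimal Python sketch of the false-positive problem. It is not the NSA's tooling, which the memo does not describe. The function names, the sample page text, and the idea of an allowlist are assumptions made for illustration; only the banned terms and the "privilege escalation" example come from the memo as reported.

```python
import re

# A few terms from the list quoted above (not the full list of 27).
BANNED_TERMS = ["privilege", "bias", "inclusion"]

# Hypothetical allowlist of technical phrases that should not count as hits.
# "privilege escalation" is the example the memo itself acknowledges.
ALLOWED_PHRASES = ["privilege escalation"]


def find_terms(lowered_text):
    """Return the banned terms that occur as whole words in lowercased text."""
    return [
        term
        for term in BANNED_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered_text)
    ]


def naive_matches(text):
    """Flag a page if any banned term appears, regardless of context."""
    return find_terms(text.lower())


def filtered_matches(text):
    """Blank out allowlisted phrases first, then look for banned terms."""
    lowered = text.lower()
    for phrase in ALLOWED_PHRASES:
        lowered = lowered.replace(phrase, " ")
    return find_terms(lowered)


# Hypothetical page text, modeled on the definition quoted in the article.
page = "Adversaries use privilege escalation to gain higher-level permissions."

print(naive_matches(page))     # the naive sweep flags this security page
print(filtered_matches(page))  # the allowlist suppresses the false positive
```

An allowlist like this only helps with phrases someone has already thought of, which is consistent with the memo's expectation of "unintended downtime" even after the interns' review.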
While Trump's executive order claims to target "illegal and immoral discrimination programs," the NSA's banned-word list demonstrates that the implementation is far broader. The Trump administration is attempting to prohibit any acknowledgment that racism, stereotypes, and bias exist. The ban is so sweeping that "confirmation bias" — the tendency of people "to accept or notice information if it appears to support what they already believe or expect" — is included, even though it has nothing to do with race or gender.
Since Trump took office, thousands of web pages across various federal agencies have been altered or removed entirely. Federal agencies have taken down or edited resources about HIV, contraceptives, LGBTQ+ health, abortion, and climate change. Some web pages have later come back online “without clarity on what had been changed or removed.”
An analysis by the Washington Post of 8,000 federal web pages “found 662 examples of deletions and additions” since Trump took office. The analysis found that words like diversity, equity, and inclusion were removed at least 231 times from the websites of federal agencies, including the Department of Labor, the Department of Education, the Department of Health and Human Services, and the Department of Transportation.
One example was a job listing page for the Department of Homeland Security that removed language about maintaining an “inclusive environment.” The Post also found words removed in contexts that had nothing to do with DEI: a page on the Department of the Interior’s website that boasted of its museums' “diverse collections” no longer includes the word “diverse.”
Following Trump’s executive orders targeting transgender individuals, multiple federal websites have removed transgender and intersex people from the acronym “LGBTQI,” NBC News reported. On the State Department website, a web page that used to provide resources for “LGBTQI Travelers” now addresses “LGB Travelers.” The Social Security Administration has made similar changes, with a page heading now reading “Social Security for LGBQ People.” Some agencies, including the Department of Education, have removed web pages with LGBTQ resources altogether.
On X, Elon Musk's United States DOGE Service is celebrating the deletions.
Federal agencies have also been scrubbing websites for mentions of climate change, which Trump has called a “hoax.” The Department of Agriculture’s Office of Communications issued a directive to “archive or unpublish any landing pages focused on climate change,” the Guardian reported. Resources on the Forest Service website, including the Climate Change Resource Center and the Climate Action Tracker, appear to still be unavailable. The Department of Transportation website replaced the phrase “climate change” with “climate resilience.”
Among the agencies with the most deleted web pages is the Centers for Disease Control and Prevention (CDC), which took down more than “3,000 pages,” according to the New York Times. In one example, data from the CDC’s Youth Risk Behavior Survey, which tracks important health metrics, was temporarily unavailable, only to come back online later with “at least one of the gender columns missing and its data documentation removed.” A banner at the top of the CDC website states it is “being modified to comply with President Trump’s Executive Orders.”
Last week, Doctors for America, a physicians' group, sued the Trump administration for removing health resources and data from government websites, arguing that the removal “deprived clinicians and researchers of tools necessary to treat patients.”