Library of Congress Web Archive Minerva
The Online Library of Liberty website has been selected by the Library of Congress to be archived as part of its Minerva project, which collects and preserves material "of historical importance to the Congress and to the American people." The OLL website will join a select group of websites that will be cataloged and preserved for future generations of researchers. It will be part of the "Single Sites" collection of thematic or event-based websites; presumably our theme of "liberty" is what caught their attention.
This is the email we received outlining some of the aims of the LOC Web Archiving Project:
To Whom It May Concern:
The United States Library of Congress has selected your Web site for inclusion in its historic collections of Internet materials. The Library's traditional functions, acquiring, cataloging, preserving and serving collection materials of historical importance to the Congress and to the American people to foster education and scholarship, extend to digital materials, including Web sites. The following URL has been selected for archiving:
We request your permission to collect your web site and add it to the Library's research collections. In order to properly archive this URL, and potentially other URLs of interest on your site, we would appreciate your permission to archive both this URL and other portions of your site. With your permission, the Library of Congress or its agent will engage in the collection of content from your Web site at regular intervals over time and make this collection available to researchers both onsite at Library facilities and through the Library's public Web site http://www.loc.gov/webarchiving/.
By special arrangement, the Library may also make this collection available to scholarly research institutions for web archive research. The Library hopes that you share its vision of preserving Internet materials and permitting researchers from across the world to access them...
Our Web Archives are important because they contribute to the historical record, capturing information that could otherwise be lost. With the growing role of the Web as an influential medium, records of historic events could be considered incomplete without materials that were "born digital" and never printed on paper. For more information about these Web Archive collections, please visit our Web site (http://www.loc.gov/webarchiving/).
Additional information about the Minerva Project can be found here:
The Library of Congress Web Archives (LCWA) is composed of collections of archived web sites selected by subject specialists to represent web-based information on a designated topic. It is part of a continuing effort by the Library to evaluate, select, collect, catalog, provide access to, and preserve digital materials for future generations of researchers. The early development project for Web archives was called MINERVA.
Web Archives Available:
- Crisis in Darfur, Sudan, Web Archive, 2006
- Iraq War 2003 Web Archive
- Law Library Legal Blawgs Web Archive
- Library of Congress Manuscript Division Archive of Organizational Web Sites
- Papal Transition 2005 Web Archive
- September 11, 2001 Web Archive
- Single Sites Web Archive
- United States 107th Congress Web Archive
- United States 108th Congress Web Archive
- United States Election 2000 Web Archive
- United States Election 2002 Web Archive
- United States Election 2004 Web Archive
- United States Election 2006 Web Archive
- Visual Image Web Sites Archive
For the LOC's policy on web archiving, see this page:
The Library of Congress preserves the nation's cultural artifacts and provides enduring access to them. The Library's traditional functions of acquiring, cataloging, preserving and serving collection materials of historical importance to the Congress and the American people to foster education and scholarship extend to digital materials, including Web sites.
In 2000, the Library of Congress established a pilot project to collect and preserve these primary source materials. A multidisciplinary team of Library staff representing cataloging, legal, public services, and technology services studied methods to evaluate, select, collect, catalog, provide access to, and preserve these materials for future generations of researchers. The Library developed thematic Web archives on such topics as the United States National Elections of 2000, 2002, and 2004, the Iraq War, and the events of September 11. More about these collections plus many other available collections can be found at the Library of Congress Web Archives Web site.
In July 2003, the Library and the national libraries of Australia, Canada, Denmark, Finland, France, Iceland, Italy, Norway, Sweden, the British Library (UK), and the Internet Archive (USA) acknowledged the importance of international collaboration for preserving Internet content for future generations and formed the International Internet Preservation Consortium. The goals of the Consortium include collecting a rich body of Internet content from around the world and fostering the development and use of common tools, techniques and standards that enable the creation of international archives.
In 2004, the Library’s Office of Strategic Initiatives created a Web Archiving team to support the goal of managing and sustaining at-risk digital content. The team is charged with building a Library-wide understanding and technical infrastructure for capturing Web content. The team, in collaboration with a variety of Library staff, and national and international partners, is identifying policy issues, establishing best practices and building tools to collect and preserve Web content.
The team has completed several Web archive collections and continues to work on new projects for building Web archives.
Scope: The Single Sites Web Archive contains sites covering a diverse array of topics selected by recommending librarians from the Library of Congress. This growing archive currently focuses on military history (Civil War, World War II, etc.) and African-American history and culture. Other topics currently include numismatics, Hungary, immigration, charitable organizations, and nanotechnology.
Included in the web archive are blogs, individual web pages, educational sites (including virtual exhibitions), and organizational sites.
This collection is part of a continuing effort by the Library of Congress to evaluate, select, collect, catalog, provide access to, and preserve digital materials for future generations of researchers.
There is a FAQ section that explains a few more details about the project here:
About Web Archiving Activities at the Library of Congress
1. Why is the Library of Congress collecting and creating an archive of Web sites?
The Library of Congress and libraries and archives around the world are interested in collecting and preserving the Web because an ever-increasing amount of the world’s cultural and intellectual output is created in digital formats and does not exist in any physical form. Creating an archive of Web sites supports the goals of the Library’s Digital Strategic Plan, announced in March 2003, which focuses on the collection and management of digital content...
3. How large is the Library’s archive?
As of February 2010, the Library has collected almost 160 terabytes of data.
4. What kinds of Web sites does the Library archive?
Library of Congress recommending officers, or curators, select a variety of Web sites to archive, depending on the theme of the collection activity. The Library’s MINERVA project was the initial pilot project to archive web sites. Event-based or thematic collections publicly available through the Library of Congress Web Archives Web site include Election 2002, September 11, Election 2004, and the 107th Congress Web archive.
Categories of sites archived include, but are not limited to: United States government (federal, state, district, local), foreign government, candidates for political office, political commentary, political party, media, religious organizations, support groups, tributes and memorials, advocacy groups, educational and research institutions, creative expressions (cartoons, poetry, etc.), and blogs....
How Web Archiving Works
3. How much of a Web site is collected?
The Library’s goal is to create an archival copy (essentially a snapshot) of how the site appeared at a particular point in time. Depending on the collection, the Library archives as much of the site as possible, including HTML pages, images, Flash, PDFs, audio, and video files, to provide context for future researchers. The Heritrix crawler is currently unable to archive streaming media, "deep web" or database content requiring user input, and content requiring payment or a subscription for access. In addition, there will always be some Web sites that take advantage of emerging or unusual technologies that the crawler cannot anticipate.
4. Do you archive all identifying site documentation, including URL, trademark, copyright statement, ownership, publication date, etc.?
We attempt to completely reproduce a site for archival purposes.
Last modified April 13, 2016