Making Archives More Explorable: The Need for Additional Descriptive Metadata

Abstract

Through the digital archival exploration of cookbooks, I examine the metadata tagging practices of the Wellcome Collection and the Folger Shakespeare Library LUNA Collection. I argue that current practices in applying metadata to archival texts continue to obscure the same underrepresented voices that critics focused on the writing of women and people of color are trying to uplift. Thus, in order to advance this ethical project and the pursuit of making archival materials more generally accessible, archives also need to attend to the study of metadata writing and website design. The cookbooks I use as the cornerstones of my exploration are MS.8903, “Manuscript recipe book of Grace Carteret, 1st Countess Granville,” from the Wellcome Collection and V.a.8, “Cookbook of L. Cromwell,” from the Folger Shakespeare Library LUNA Collection. In this exploration, I reference the Wellcome Collection’s stated goals for their metadata and its searchability. Additionally, I reference guidelines that relevant metadata and archival studies organizations have already developed or are currently developing in order to describe where the field stands and where it is going in addressing this larger issue. Ultimately, I hope to encourage the revision of metadata practices to improve digital archive usability and to prevent texts from being obscured by poor metadata practices.

Introduction

One of the first impediments to research that begins in a digital archive is the inability to identify the “correct” term or terms to enter in the archive’s search bar for the item sought. When I first began exploring digitized 16th-18th century cookbooks, I often had trouble identifying a library’s total collection of cookbooks. As a graduate student in a seminar at the Pennsylvania State University, I had access to Dr. Marcy North, an archival scholar familiar with old cookbooks, and I quickly learned from her that a big part of why I could not find these texts is that they were not all called cookbooks at their times of writing; rather, they were called receipt or recipe books. Critically, the way the texts were tagged in their metadata reflected this terminological distinction: many of the archives that the class interacted with online did not tag “receipt books” with the more contemporary colloquial term “cookbooks,” or vice versa. To be clear, I am referring to metadata as described by the American Library Association: “Metadata is data about data. It is descriptive information about a particular data set, object, or resource, including how it is formatted, and when and by whom it was collected.”1 Hopefully I would have eventually figured out this distinction on my own without the instruction of a scholar with some know-how. However, not everyone who wants to explore an archive for research or out of personal curiosity will have the vocabulary necessary to find the texts they are searching for, and correcting the terminology side of this issue would require more work and education on the researcher-user’s end. While this work and education would obviously be worthwhile, it is not always accessible or feasible, and a well-versed scholar is not always easy to find to answer clarifying questions. Therefore, there is a need for a change in how items in a digital archive are tagged in order to make them truly accessible to the variety of people who may want to explore and study these items.

Further, the ethical issues connected to the obscuring of texts have been actively discussed in archival studies, with the aim of making underrepresented voices, like those of women and people of color, more visible in the archives. Inaccessible metadata tagging practices can impede this ethical project, alongside better-explored impediments such as what gets excluded in processes like canonization, preservation practices, and collection-building. For instance, 16th-18th century cookbook manuscripts were commonly used by women as well as written, or dictated, by them. If what a 21st century person would call a cookbook manuscript is obscured in the archive because the manuscript is only tagged as a receipt book, that 21st century person cannot get a clear understanding of the manuscripts that are available, and thus women’s writing and manuscripts relevant to the lives of 16th-18th century women are obscured. While the cookbook-receipt book conundrum may be a single example, I suspect that it is not an isolated case of this sort of terminological obscuring, as language, genres, and labels are constantly evolving. Succinctly, in order to thoughtfully accommodate the evolution of language that does and will continue to create discrepancies between what items were once called and what those same items would be called now, archives should attempt to design their metadata in ways that label archival items with the multitude of terms the items could be called in order to make the archives more explorable.

As the Folger Shakespeare Library LUNA Collection and the Wellcome Collection house many digitized and non-digitized 16th-18th century cookbooks, and their metadata practices are exemplary of the obscuring that I have problematized, they serve as relevant examples to this discussion. I examine V.a.8 “Cookbook of L. Cromwell [manuscript]” from the Folger LUNA Collection and MS.8903 “Manuscript recipe book of Grace Carteret, 1st Countess Granville (1654-1744)” from the Wellcome Collection to describe how these two separate archives design their items’ metadata and their websites’ search functions. Further, the Wellcome Collection has documented their approach to making their collections more searchable and thus accessible, primarily focusing on the design of their website’s search bar functionality. The work that the Wellcome Collection has done is definitely worthwhile and an important consideration in making their materials more searchable and discoverable. However, because the problems that instigated my line of inquiry still exist after several years of development, I argue that archives like the Wellcome Collection and the much less transparent Folger LUNA should also approach this issue via their metadata tagging practices. The metadata tagging practices of digital archives are currently in flux due to developing scholarship about how to best design metadata, search functions, and archival websites as a whole. Consequently, I address the current understanding of the problems with archival metadata tagging and proposed potential solutions to these issues in addition to my own proposed potential solution. Altogether, I aim to add to this much broader conversation by intervening in the particular issue of archival items being obscured by variation in terms across time and the uneven application of metadata categories and subcategories.

The Folger Shakespeare Library’s LUNA Collection and the Wellcome Collection’s Metadata

Folger Shakespeare Library’s LUNA Collection

To begin, the Folger Shakespeare Library’s LUNA Collection’s search capabilities indicate that there are 26 possible metadata categories in addition to a text search for text documents; additionally, each item can be tagged with which of the eight collections in the broader LUNA collection it falls under. 17 of these possible metadata categories are associated with Hamnet, the name of the entire Folger Shakespeare Library’s catalog, and there is some overlap between the Hamnet metadata categories and non-Hamnet metadata categories. The list of the 26 possible metadata categories is as follows: Associated Name (Hamnet), Call Number (Hamnet), Citations (Hamnet), Creator (Hamnet), Date of Creation or Publ. (Hamnet), Digital Image File Name, Digital Image File Type, Edition (Hamnet), Event Title, Folger Holding Notes (Hamnet), Hamnet Bib ID, Hamnet Holdings ID, Image Details, Notes (Hamnet), Physical Description, Physical Description (Hamnet), Place of Creation or Publ. (Hamnet), Provenance (Hamnet), Publisher (Hamnet), Source Call Number, Source Created or Published, Source Creator, Source Title, Subject (Hamnet), Title (Hamnet), and Uniform Title (Hamnet). 

To look at a specific example of the metadata tagging of a digitized item in the Folger LUNA Collection, LUNA’s V.a.8 is tagged with the title “Cookbook of L. Cromwell [manuscript].” V.a.8’s digitized images of the folios are tagged unevenly with 14 of the aforementioned metadata categories. The nine metadata tags that are associated with every page in the manuscript are Source Call Number (V.a.8), Source Creator (“Cromwell, L., 17th cent.”), Source Title (“Cookbook of L. Cromwell [manuscript]”), Source Created or Published (“17th century”), Digital Image Type (“FSL collection”), Creator (Hamnet) (“Cromwell, L., 17th century.”), Physical Description (“1 v.”), Subject (Hamnet) (“Cookbooks – 17th century – Manuscripts”), and Call Number (Hamnet) (V.a.8). V.a.8 demonstrates some of the overlap of metadata tagging in the Folger Shakespeare Library’s practices, since the pairs Source Creator – Creator (Hamnet) and Source Call Number – Call Number (Hamnet) are tagged with the same or nearly the same phrasing.

The list of metadata tag categories that an item could be tagged with in the Folger LUNA Collection indicates the possibility of thoroughly describing an item with metadata and therefore making it searchable and accessible via that in-depth metadata. However, as evidenced by the incomplete application of metadata to V.a.8, there is clearly a discrepancy between metadata tagging possibilities and metadata tagging applications. Most importantly for my argument, V.a.8 does not appear in a search for “recipe book”2 or “receipt book.”3 Thus, anyone familiar with only those terms would not find V.a.8 without also searching with the term “cookbook.”

Wellcome Collection

In terms of search capabilities, the Wellcome Collection has a variety of filters or metadata categories to utilize for a researcher’s specific needs once they get past the initial general catalog search bar. These filters include broad categories of “Stories,” “Images,” and “Catalogue” to select from the variety of content that the library both creates and houses. When narrowing down a search to look for something more specific in the catalog, metadata filters include formats (with 17 categories such as “Books” and “Archives and manuscripts”), dates in a range, locations (“Open shelves”, “Online”, and “Closed stores”), subjects (which are wide-ranging and narrow as other filters are applied), types/techniques (with at least 20 categories such as “Ephemera” and “Poems”), contributors (rather than exclusively authors), and languages. These filters can be applied either individually or altogether within the Wellcome Collection’s “All filters” menu. Additionally, this online search engine allows someone to search for both digitized and undigitized items within the Wellcome Collection’s collections. Altogether, the Wellcome Collection has eight metadata categories, if the broad categories of Stories/Images/Catalogue are included in those categories.

For an item in the collection, the Wellcome Collection includes additional descriptive information beyond the tagging that their system does for their search interface. As an illustration of the Wellcome Collection’s metadata practices, MS.8903, the “Manuscript recipe book of Grace Carteret, 1st Countess Granville (1654-1744),” is a relatively well tagged and described archival item. At the top of the item screen, the interface includes the given title of the work (“Manuscript recipe book of Grace Carteret, 1st Countess Granville (1654-1744)”), the identified date (“1662-mid 18th century”), and the reference number (MS.8903). Additionally, the top of the page identifies the format tag (“Archives and manuscripts”) and, if applicable, the online tag, which MS.8903 carries since this manuscript is completely digitized. Next comes a section entitled “About this work,” which includes a description that can include, at least in this item’s case, background information on the context of the item’s creation, a description of the manuscript’s hand, and an arrangement of the recipes included in the manuscript. Notably, this item is well described due to the cited efforts of Gwenneth Heyking’s transcription in addition to the uncited efforts of archivists who outlined the manuscript’s relation to other manuscripts, so its description is more in-depth than those of other manuscript cookbooks in the Wellcome Collection. However, this description is not translated into metadata and thus does not appear in searches. Other components of this section include the publication/creation date (“1662-mid 18th century”), a physical description of the material object (“1 volume The manuscript is a 4to volume (20 x 32 cm) comprising 107 folios. The original page numbering runs from p.43-p.278, with 13 unnumbered pages at the back, several pages lacking (pp.77-78, 157-190, 213-214, 223, 277) and two consecutive pages numbered 227”), acquisition notes (“Purchased from the Herb Society, June 2013”), related material (“MS.7113 Lady Ann Fanshawe’s manuscript recipe book”), locations of duplicates (“A digitised copy is held by Wellcome Collection”), and ownership notes (“The volume acquired a mythical status among food historians following a 1979 article by Elizabeth David (‘Hunt the Ice Cream’, Petits Propos Culinaires 1 (1979): 83), but subsequently disappeared from public access and was thought to have been lost or destroyed”). Again, much of this information, which would be helpful to search on, is not searchable because it is not reflected in metadata. Additionally, the site includes a list of the subject tags for more searchable metadata (“Recipe book”, “Recipe books”, “Ice cream”, “Weights and measures”, “Butter”, “Tonics (Medicinal preparations)”, “Cheese”, “Dairy products”, “Dried fruit”, “Fruit drinks”, “Jam”, “Wine”, “Alcohol”, “Fish”, “Meat”, “Cooking (Poultry)”, “Game and birds”, “Pickles”, “Ointments”, “Cake”, “Biscuits”, “Alopecia”, “Pies”). The last few sections of the item’s webpage include information on where to find it both in the physical library and for online access (“Location: Closed stores”, “Status: By appointment”, “Access: Manual request”), a permanent URL (https://wellcomecollection.org/works/jm3g84uj), and identifiers (“Accession number: 1991”).4

While MS.8903 is thoroughly described, its only metadata tags are format, location, contributors, and subjects; the other descriptive information is not searchable. Further, and most importantly for my primary critique of the metadata tagging practices of the Wellcome Collection and other digital archives, while MS.8903 is expansively tagged with various subject tags, neither “cookbook” nor “receipt book” is among them, thus confounding searches by users who would refer to this item by either of those terms. If one searches for “cookbook” in the Wellcome Collection’s search bar, MS.8903 does not appear in the four pages of results because it is not tagged with the contemporary word for a recipe book, though the manuscript certainly should appear in that search.
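The search behavior described in this section can be approximated with a minimal sketch. It assumes a toy search that matches a query only against an item's title and subject tags; the record layout and the function name naive_search are hypothetical, and neither archive's actual search engine is claimed to work this way. The two archives are also separate catalogs, placed in one list here only for comparison. The titles and subject tags are copied from the catalog entries discussed above, with the MS.8903 subject list shortened.

```python
# Toy records: titles and subject tags are taken from the catalog entries
# discussed in this section; the dictionary layout itself is an assumption.
RECORDS = [
    {
        "id": "V.a.8",
        "title": "Cookbook of L. Cromwell [manuscript]",
        "subjects": ["Cookbooks – 17th century – Manuscripts"],
    },
    {
        "id": "MS.8903",
        "title": "Manuscript recipe book of Grace Carteret, "
                 "1st Countess Granville (1654-1744)",
        # Shortened list; the full entry carries many more subject tags.
        "subjects": ["Recipe book", "Recipe books", "Ice cream", "Butter"],
    },
]


def naive_search(query, records):
    """Return ids of records whose title or subject tags contain the query.

    Matching is a case-insensitive substring test, a deliberate
    simplification of whatever the real search engines do.
    """
    needle = query.lower()
    hits = []
    for record in records:
        searchable = [record["title"], *record["subjects"]]
        if any(needle in text.lower() for text in searchable):
            hits.append(record["id"])
    return hits


for term in ("cookbook", "recipe book", "receipt book"):
    print(term, "->", naive_search(term, RECORDS))
```

Under these assumptions, “cookbook” returns only V.a.8, “recipe book” returns only MS.8903, and “receipt book” returns nothing, even though a researcher could reasonably expect all three terms to surface both manuscripts.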

Metadata Tagging Suggested Best Practices

Because of the ever-increasing interest in digitized archival library catalogs and their contents, scholarship on metadata best practices for this purpose and others has puzzled over ways to make complicated-to-categorize archival items accessible and archive catalogs generally more usable. Striking an appropriate balance between thoroughly describing archival items with metadata, in order to make them searchable according to users’ needs, and avoiding cluttering users’ searches with irrelevant hits is one way that archivists can address the evolving nature of digital archives. In a survey of the metadata practices of various American university libraries conducted by the Digital Library Federation, one of the group’s initial conclusions was that surveyed librarians showed a clear gap in knowledge and practice when it came to evaluating metadata quality and thus creating metadata, with the critical caveat that this gap is likely heavily related to a significant lack of the staff and time necessary to improve the situation. Further, the study also concluded that many metadata schemas are being applied to digital archives and thus there is no single commonly recognized way to design and apply metadata.5 The importance of bolstering staff and staff capabilities cannot be stressed enough when it comes to making digital archives more accessible and, more practically and broadly, to ensuring that they continue to exist at all. However, if a collective understanding of how best to design archival metadata could be established, perhaps the consequences of poorly applied metadata could be at least somewhat circumvented even among underfunded, understaffed, and undertrained archives and libraries.

Relatedly, one of the primary initiatives to create a common understanding of the design of archival databases and digital archives is the Text Encoding Initiative (TEI). TEI defines itself as “a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics.”6 TEI has made sets of guidelines for specific types of archival materials, including the particularly relevant manuscript description guidelines. The guidelines are designed to be flexible because of the diversity of archival items that would fall under the category of “manuscript.”7 TEI provides guidelines for recording, as metadata, in-depth information about the physical and material qualities of archival manuscripts as well as the history of the item and certain categorical information like title and author. However, missing from these guidelines are instructions about how to tag descriptive information like the genre or style of the content in the item. TEI explains its approach as follows: “The P5 revision is even more radically modular; it retains the notion of a core module with essential common elements, and considers all further tagsets as additional modules which can be combined, modified, and trimmed to suit the user’s needs,” emphasizing modular customizability rather than specific, universal best practices.8 Consequently, as is evidenced by the massive discrepancies in the amount and quality of metadata across different archives and even within a single archive, archives that use TEI as their primary guide for appropriate metadata design would not necessarily have instructions on how best to catalog archival items for digital searchability and thus would tend towards scant descriptive metadata, which would create more problems than the specific one I address here. Perhaps, to further address the variety of issues with subject metadata, TEI should consider another revision with best practices; alternatively, whichever metadata schema is determined to be most relevant to archives’ needs should fill the gap left by TEI’s lack of suggestions.

That is not to say that archival studies more broadly has not attempted to develop best practices for addressing the obscuring of incompletely tagged archival items, as it has been an issue for as long as digital archives have existed. Lise Jaillant addresses many aspects of obstacles to access in both digitized and born-digital collections, many of them far outside the scope of this paper, in her 2022 study “How can we make born-digital and digitised archives more accessible? Identifying obstacles and solutions.”9 To address a lack of metadata, as is generally the case with hard-to-find archival items, Jaillant proposes the use of Artificial Intelligence and Machine Learning (AI/ML) software to generate metadata categories that could help describe archival items and make them more searchable. Of course, in order to appropriately apply this software, a librarian or archivist would have to understand how that software works, or a computer scientist would have to understand how archival materials can and should be described and cataloged. Consequently, Jaillant advocates for collaboration across disciplines in order to address this issue of metadata design and the other issues she describes, a stance that I wholeheartedly second.10 Ultimately, someone who only understands how to design metadata and someone who only understands how to design descriptions of archival materials cannot solve these issues alone, but they could in conjunction with each other.

More specifically, the Wellcome Collection clearly has considered this problem of difficult to find archival materials, though they have chosen to address the problem via the development of their website’s search function. The development of the website’s search function was initiated because of problems with the Wellcome Collection’s former system of having multiple separate search tools for separate types of searches across separate systems. In their article on their reasoning for designing a single new search function, “Building Our New Unified Collections Search,” Alex Chan, a software engineer for Wellcome, describes how multiple host systems housed different components of the total digital collection and catalog, and thus different search functions were created to more appropriately search each of those different collections since the different systems were cataloged with different types of metadata. In the piece, Chan explains that in the process of making a unified collections search, the data was copied over as “read-only” without making any adjustments to the metadata because, as they state, “doing so would be massively disruptive, and add very little to this process.” Chan does state that Wellcome is, as of 2021, trying to actively transform how items in the collection are described at the metadata level in order to better integrate items from their various original systems with separate metadata tagging strategies into their single unified system.11 Thus, perhaps my critique of how the Wellcome Collection is going about making their system more searchable is simply early since the work may still be in progress. 

However, as detailed a month later by Harrison Pim’s “Making Wellcome Collection Searchable,” it seems as if the Wellcome Collection is addressing the issue at the system design level rather than at the descriptive cataloging and metadata tagging level. In Pim’s discussion of the user feedback that guides the Wellcome Collection’s fine-tuning of their unified search function, the Wellcome Collection team decided to address this issue with what they dub “intentions,” which are “all of our ‘I searched for x, expecting to see y’ examples” that are then grouped into common themes.12 In sum, they have treated the accessibility and searchability issues that this paper has identified as a product of problems with the design of the website’s search function rather than a product of how their archivists and other staff have gone about tagging the items in their collection. However, after a couple of years of attempting to address this issue, their catalog still does not account for both jargon-informed and colloquial terms for the same type of item, as previously discussed with the cookbook/receipt book/recipe book conundrum. Thus, the Wellcome Collection should address how items in their catalog and, consequently, their unified system are described by their catalogers and other staff in order to better inform the metadata that each item is tagged with. A revised approach to cataloging in conjunction with the Wellcome Collection’s thoughtful unified search bar design could effectively address the accessibility issues with finding items in their catalog discussed in this paper, and thus the Wellcome Collection could serve as an example of how to best approach these issues from multiple angles.
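One way to picture how “I searched for x, expecting to see y” examples could be recorded and checked is sketched below. This is an illustration only: the source describes intentions simply as grouped examples, so the Intention class, the unmet_intentions function, and the stubbed search results are all hypothetical and do not reflect the Wellcome Collection’s actual tooling. The stubbed results encode only the behavior this paper observed or would predict for MS.8903.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Intention:
    """One 'I searched for x, expecting to see y' example (hypothetical shape)."""
    query: str        # x: what the user typed
    expected_id: str  # y: the item the user expected to find
    theme: str        # the common theme the example is grouped under


def unmet_intentions(intentions: Iterable[Intention],
                     search: Callable[[str], Set[str]]) -> List[Intention]:
    """Return the intentions whose expected item is missing from the results."""
    return [i for i in intentions if i.expected_id not in search(i.query)]


# Stub standing in for a real search engine; the result sets are assumptions
# made for illustration, not live query results.
STUB_RESULTS = {"recipe book": {"MS.8903"}}


def stub_search(query: str) -> Set[str]:
    return STUB_RESULTS.get(query.lower(), set())


INTENTIONS = [
    Intention("recipe book", "MS.8903", "genre synonyms"),
    Intention("cookbook", "MS.8903", "genre synonyms"),
    Intention("receipt book", "MS.8903", "genre synonyms"),
]

for intention in unmet_intentions(INTENTIONS, stub_search):
    print(f"searched {intention.query!r}, expected {intention.expected_id}")
```

Framed this way, the cookbook/receipt book/recipe book conundrum becomes a small set of failing expectations, which could be met either by changing the search function or by changing the metadata that the search function reads.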

Conclusion

Altogether, the problems with Folger LUNA and the Wellcome Collection, amongst other archives, are still very much in flux because there has not been a concrete consensus reached about the best way to approach the many identified problems with archival descriptive metadata. It is clear that archival studies, in conjunction with other disciplines with knowledge about metadata and digital website and database design more broadly, are actively working towards these solutions. While these issues will likely require more than just better informed archivists, there is a clear attention to the need for partnerships between archivists and computer scientists writ large and/or computer science-educated archivists in order to develop these solutions. One such small solution to help address a larger problem of poor archive searchability is the addition of interchangeable terms to the descriptive metadata tags.
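The addition of interchangeable terms can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not a description of any archive’s cataloging system: the list EQUIVALENT_TERMS, the functions add_interchangeable_terms and tag_search, and the substring matching are all hypothetical, and a real implementation would rely on a vocabulary curated by catalogers. The subject tags are taken from the two catalog entries discussed above, with the MS.8903 list shortened.

```python
from typing import Iterable, List, Set

# Hypothetical synonym ring: terms a cataloger treats as interchangeable.
EQUIVALENT_TERMS: List[Set[str]] = [
    {"cookbook", "recipe book", "receipt book"},
]


def add_interchangeable_terms(subjects: Iterable[str],
                              rings: Iterable[Set[str]] = EQUIVALENT_TERMS
                              ) -> List[str]:
    """Return the subject tags plus any missing terms from a matching ring.

    A ring applies when one of its terms appears (case-insensitively)
    inside an existing tag; the original tags are kept unchanged.
    """
    original = list(subjects)
    enriched = list(original)
    present = {tag.lower() for tag in original}
    for ring in rings:
        applies = any(term in tag.lower() for tag in original for term in ring)
        if not applies:
            continue
        for term in sorted(ring):
            if term not in present:
                enriched.append(term)
                present.add(term)
    return enriched


def tag_search(query: str, subjects: Iterable[str]) -> bool:
    """Case-insensitive substring match of a query against subject tags."""
    needle = query.lower()
    return any(needle in tag.lower() for tag in subjects)


folger_subjects = ["Cookbooks – 17th century – Manuscripts"]
wellcome_subjects = ["Recipe book", "Recipe books", "Ice cream"]

folger_enriched = add_interchangeable_terms(folger_subjects)
wellcome_enriched = add_interchangeable_terms(wellcome_subjects)

print(tag_search("receipt book", folger_subjects))     # before enrichment
print(tag_search("receipt book", folger_enriched))     # after enrichment
print(tag_search("cookbook", wellcome_enriched))       # after enrichment
```

Because the ring is applied only when an item already carries one of its terms, unrelated items gain no new tags, which speaks to the concern raised earlier about cluttering searches with irrelevant hits. Deciding which terms are truly interchangeable remains a cataloging judgment that the sketch does not attempt to automate.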


  1. American Library Association, “Metadata.”
  2. Folger LUNA, “Recipe book” search: https://luna.folger.edu/luna/servlet/view/search?search=SUBMIT&cat=0&q=recipe+book&dateRangeStart=&dateRangeEnd=&sort=call_number%2Cmpsortorder1%2Ccd_title%2Cimprint&QuickSearchA=QuickSearchA
  3. Folger LUNA, “Receipt book” search: https://luna.folger.edu/luna/servlet/view/search?showAll=where&q=receipt+book&sort=call_number%2Cmpsortorder1%2Ccd_title%2Cimprint.
  4. All of the specific information about MS.8903 can be found on the Wellcome Collection’s interface at: https://wellcomecollection.org/works/jm3g84uj.
  5. Metadata Quality Benchmarks Sub-Group, “Survey of Benchmarks in Metadata Quality: Initial Findings [White Paper],” 25-26.
  6. “Text Encoding Initiative.”
  7. Text Encoding Initiative, “10 Manuscript Description.”
  8. Text Encoding Initiative, “Introducing the Guidelines.”
  9. See Jaillant’s discussion of complications associated with privacy, copyright, and technical issues in the development of digital archives for more relevant topics in this broader topic.
  10. Jaillant, “How can we make born-digital and digitised archives more accessible?,” 433-434.
  11. Chan, “Building Our New Unified Collections Search.”
  12. Pim, “Making Wellcome Collection Searchable.”

Bibliography

American Library Association. “Metadata.” Accessed November 28, 2023. https://www.ala.org/tools/atoz/metadata/metadata#:~:text=What%20is%20metadata%3F,either%20physical%20or%20electronic%20resources

Carteret, Grace. Manuscript recipe book of Grace Carteret, 1st Countess Granville (1654-1744). Wellcome Collection, London, United Kingdom.

Chan, Alex. “Building Our New Unified Collections Search.” Medium, Stacks: Developments at Wellcome Collection. March 29, 2021. https://stacks.wellcomecollection.org/building-our-new-unified-collections-search-ed399c412b01

Cromwell, L. Cookbook of L. Cromwell [manuscript]. LUNA: Folger Digital Image Collection. Folger Shakespeare Library, Washington, D.C.

Jaillant, Lise. “How can we make born-digital and digitised archives more accessible? Identifying obstacles and solutions.” Archival Science 22 (March 2022): 417-436. https://doi.org/10.1007/s10502-022-09390-7

Metadata Quality Benchmarks Sub-Group. “Survey of Benchmarks in Metadata Quality: Initial Findings [White Paper].” Digital Library Federation (2020). https://bluesyemre.files.wordpress.com/2020/05/survey-of-benchmarks-in-metadata-quality-initial-findings.pdf

Pim, Harrison. “Making Wellcome Collection Searchable.” Medium, Stacks: Developments at Wellcome Collection. April 29, 2021. https://stacks.wellcomecollection.org/making-wellcome-collection-searchable-c1c8c293a542

Text Encoding Initiative. “10 Manuscript Description.” TEI: Guidelines for Electronic Text Encoding and Interchange. Last modified November 16, 2023. https://www.tei-c.org/release/doc/tei-p5-doc/en/html/MS.html

Text Encoding Initiative. “Introducing the Guidelines.” Accessed November 26, 2023. https://tei-c.org/support/learn/introducing-the-guidelines/

Text Encoding Initiative. “Text Encoding Initiative.” Accessed November 26, 2023. https://tei-c.org/