Black Metropolis Research Consortium Search Portal

I worked with a project manager and a designer to produce a search portal for the Black Metropolis Research Consortium (BMRC). The portal searches over 1300 guides to archival collections, or finding aids, at 40 different archives. The project offered a few different challenges:

Because the BMRC is a consortium, there are a large number of project stakeholders. We produced several design artifacts to help guide stakeholder discussions.
The file format for archival descriptions is flexible and expressive, but this creates extra challenges when regularizing pages for search and display on the web. As we debugged issues, we made decisions about where we should fix different problems in our processing pipeline—whether that is in the original data, at database load time, or at display time.
Rather than using preexisting software, we implemented the project from scratch using a native XML database as a backend. Because this project generated code to be re-used in other projects, design considerations for the BMRC will be applied to other projects by default.

Examples of archival collections
What is a finding aid?
What is the BMRC?
The project team
Making design decisions
Design principle
Reference interfaces
Wireframes
Implementation notes
Artifacts

Examples of archival collections

42 archives; 1,382 collections; 4,370 browsable topics; 6,760 people mentioned.
Some stats about this project.

The Shorefront Legacy Center is an archive and cultural center in Evanston, Illinois, just north of Chicago. It is a repository for collections that document the history of African Americans on the North Shore. Its collections include the records of the Evanston chapter of the National Association for the Advancement of Colored People, the papers of Lorraine Morton, who was the first African American mayor of Evanston, and a near-complete run of the Evanston Newsette, a weekly newspaper that covered the African American experience of people living on the North Shore and the lives of former Evanston residents living outside of Illinois. The Shorefront Legacy Center makes all of this unique material available for public use, so that researchers, students, or people who are generally interested can use the material and learn from it.

What is a finding aid?

To help users discover, search, and navigate collections like these, archivists create documents called finding aids. The finding aids in this project were encoded as XML documents using the Encoded Archival Description (EAD) format. The goal for any standardized format like this is to create more opportunities for code re-use, interoperability, and sharing between institutions and finding aid authors.

What is the BMRC?

The BMRC is a Chicago-based membership association of libraries, universities, museums, and community and arts organizations. Their mission is to connect all who seek to document, share, understand and preserve Black experiences. BMRC member institutions hold many different archival collections, they produce finding aids to describe those collections, and they look for different ways to make those finding aids discoverable. Adding finding aids to a search interface like the one we developed for the BMRC is one way to do that. Because the search includes materials from many Chicago-area institutions, people have chances to discover collections at places they hadn’t thought to look.

The project team

A researcher searches and browses the BMRC Search Portal. That website returns web pages to the researcher and enables the discovery of archival collections at many institutions. The Search Portal itself is a frontend to a Wagtail/Django-based server, which communicates with a MarkLogic XML database. Physical archival collections are each represented by individual finding aids. Each of those documents is edited by consortium staff, who deliver them to a project archivist and developer, who regularize them and load them into the XML database.
The search portal, users, institutions, archival collections, finding aids, and staff.

I was the technical lead for the project, while the project archivist interacted with staff from member institutions and other project stakeholders.

To implement the portal I worked closely with the project archivist, who served a dual role. As an archivist, she was in charge of managing finding aid submissions to the site. When we worked together to debug search and display issues, she used her archival skills and I looked at problems from the perspective of our tech stack.

She also served as the project manager for the portal, facilitating discussions about design decisions with consortium members.

Finally, the original designer for the BMRC website produced a set of front end templates we could use to implement the new search portal. She helped ensure that the portal used the same visual language as the rest of the BMRC site.

Making design decisions

We used several design methods during this project, including a design principle we could continue to refer to during the course of our work, a set of representative interfaces to borrow ideas from, and interface wireframes. These served several different functions. They helped me as a developer understand how to implement the portal. They helped the project archivist communicate to project stakeholders about the site, to elicit their feedback, and to make sure the stakeholders and I were in alignment about the direction of the project. Because the project archivist was communicating with up to 40 individual institutions during project meetings, having tools to guide those discussions was important.

Design principle

On this project we were also able to be explicit about a design principle that helped guide our work. In this case, our principle was that the interface should be usable by both expert and non-expert researchers. Although BMRC member collections are used by many different people, including grade school or high school students, professional researchers, and community members, it was important to the BMRC that we optimized the site for use by people without any specific training in how to use archival collections. This meant that in many cases we were evaluating the site for things like jargon and other language that could be off-putting to people who were just generally curious about the material.

Principles like these help me interpret things like wireframes and mockups, and they help me look at a sample site more critically. For example, when we reviewed interface labels, we found places where the labels used archivist jargon. An expert researcher might have been able to understand such a label because they have taken the time to learn that jargon, but we didn’t want to require that kind of knowledge from non-experts. Having explicit design principles let us say things like “we like this interface, but it’s optimized for advanced researchers. We’re going to borrow some elements, but in this specific case we’ll make a change.”

Reference interfaces

The project archivist and I found a set of about ten good interfaces for similar projects online that we could borrow ideas from. This became a collection of sites that we could return to over the course of the project to help with different decisions: in many cases we would find ourselves with a very granular design decision, like what label to use for something. In cases like this we could look at our sample interfaces and see what label others have used. This made it quicker to make those small decisions, but it also had a few other effects. First, it nudged our decisions towards interfaces that other people understood—because, as Jakob Nielsen says, users spend most of their time on other people’s websites. And second, when talking with stakeholders about different decisions, it provided some structure for the conversation—the project manager could say things like “most similar sites use X as a label, which is why we chose to use X too.”

Wireframes

The project manager produced a set of wireframes that she would use to guide any discussion that might involve the search interface. This was to deal with two potential problems. First, I wanted to be sure that stakeholders were aligned on design decisions, and that discussions about the interface didn’t accidentally make mutually exclusive choices. Second, I wanted to be sure that we wouldn’t end up “designing in code” for the project and implementing the site in a freeform way, without an overall plan.

The wireframes ended up getting quite complicated so that they could describe the behavior of facets in the faceted browse. We needed to make decisions about how the different facets would be joined together, as set intersections or set unions, and we needed to iron out how facet behavior should work. There needed to be a standard way to add facets, to see what facets are currently active, and to remove them. Because some facets could contain hundreds of values, there also needed to be a way to “view all” facets for a document, in a display that would probably take up more space than the list of the most relevant facets for a particular search.

Implementation notes

A researcher interacts with a faceted search interface. That interface is the frontend to a Django/Wagtail-based middle layer, which interacts with an XML database that itself provides access to a set of XML documents managed by backend staff.
Backend diagram.

I implemented this project using the XML database features of MarkLogic. The existing BMRC website was implemented in the Wagtail content management system, which is itself built on top of Django and written in Python. The search portal includes several new views, including a search portal homepage, curated topics to introduce people to some of the materials in the portal, and the search interface itself.

I wrote several Django management commands to help manage the site, especially for validating and loading data. A regularization command ensures that all finding aids use the EAD 2002 namespace to simplify further processing, and it regularizes some EAD elements by doing things like formatting lists and headings in specific ways.
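The namespace step can be sketched roughly as follows. This is an illustrative example, not the project's actual command: the function name is hypothetical, and it assumes the incoming finding aids are either un-namespaced (DTD-style) EAD or already in the EAD 2002 schema namespace, `urn:isbn:1-931666-22-9`.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the EAD 2002 XML schema.
EAD_NS = "urn:isbn:1-931666-22-9"


def to_ead2002_namespace(root: ET.Element) -> ET.Element:
    """Move every un-namespaced element into the EAD 2002 namespace.

    Elements that already carry a namespace are left untouched, so
    running this on an already-regularized document changes nothing.
    """
    for element in root.iter():
        # Namespaced tags look like "{uri}local"; skip those.
        if isinstance(element.tag, str) and not element.tag.startswith("{"):
            element.tag = f"{{{EAD_NS}}}{element.tag}"
    return root


if __name__ == "__main__":
    # Keep the serialized output free of generated "ns0:" prefixes.
    ET.register_namespace("", EAD_NS)
    sample = ET.fromstring("<ead><eadheader/><archdesc level='collection'/></ead>")
    print(ET.tostring(to_ead2002_namespace(sample), encoding="unicode"))
```

Once every document uses the same namespace, later steps such as queries and display transforms only need to handle one form of each element name.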

Other commands clear finding aids from the MarkLogic database, and output the results of browses to the terminal to help with debugging.

When a site user searches the site, their search query, along with any facets that have been selected, is processed and embedded into XQuery. The Python code submits that XQuery to the MarkLogic backend, where it is processed by the MarkLogic REST API. Setting the system up this way means that setup on MarkLogic is minimal, so we really only have to worry about maintaining one server that has been customized. Here our MarkLogic server uses a very minimal, stock configuration.
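Embedding user input into a query string needs careful escaping. The sketch below shows one way the Python side might do that. It is an assumption-laden illustration rather than the portal's actual query: the function names are hypothetical, and modeling each facet as a MarkLogic collection is a guess made only for the example. The `cts:` functions it emits are standard MarkLogic search built-ins.

```python
def xquery_string(value: str) -> str:
    """Quote a Python string as an XQuery string literal.

    In XQuery, "&" begins an entity reference and a double quote inside
    a double-quoted literal is written twice, so both are escaped.
    """
    escaped = value.replace("&", "&amp;").replace('"', '""')
    return f'"{escaped}"'


def build_search_xquery(query: str, facets: list[str]) -> str:
    """Combine the search terms and the selected facets into one XQuery expression.

    Assumption for this sketch: each facet is a MarkLogic collection
    name, and all clauses are joined with cts:and-query.
    """
    clauses = [f"cts:word-query({xquery_string(query)})"]
    clauses += [f"cts:collection-query({xquery_string(facet)})" for facet in facets]
    return f"cts:search(fn:doc(), cts:and-query(({', '.join(clauses)})))"


if __name__ == "__main__":
    print(build_search_xquery('Evanston "Newsette"', ["topic:Newspapers"]))
```

The resulting string would then be sent to MarkLogic over HTTP. Which REST endpoint the portal uses, and how it authenticates, is not stated here, so the sketch leaves that step out.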

MarkLogic responds to search queries by delivering data to the front end that can be formatted to produce a search result page. It responds to requests for specific finding aids by returning an HTML transform of a finding aid itself.

Most of the work of this project goes into managing finding aids: figuring out whether a particular problem is an issue with a finding aid itself or with the interface, debugging those inconsistencies, and working with the organizations who maintain the finding aids. In other cases finding aids reveal places in the code that can be improved. Figuring out when to treat a problem as a data problem or a coding problem is an ongoing concern here.

In general, this project fits a pattern from a lot of library web development projects—implementing the site itself is a relatively small part of the project compared to managing the data for the project.

Artifacts

A thumbnail image showing the search portal web interface.
Visit the Black Metropolis Research Consortium Search Portal (external link)

John Jung