NCSU Libraries has recently collected more than 1.2 million tweets from more than 380,000 Twitter accounts as part of its "New Voices and Fresh Perspectives: Collecting Social Media" initiative.
This type of archiving can supplement traditional data collection methods. Under standard practices, a historian might draw on personal notes, correspondence or intellectual papers to build a historical perspective on a period's events. By focusing on social media, the research team hopes to supplement these traditional sources with data that may be more relevant to today's society.
“People aren’t writing formal letters anymore,” said Jason Casden, interim associate head of Digital Library Initiatives.
The initiative started with archiving social media data for the My #HuntLibrary project, which was a way of crowdsourcing people's interactions. That project is why the team initially built its harvesting tool, according to Brian Dietz, digital program librarian for Special Collections.
Library staff developed the harvesting tool, Lentil, and are now using it for the new archiving initiative. Lentil harvests tweets based on a targeted subject or hashtag, then categorizes them and filters out irrelevant ones.
The team received a $25,000 North Carolina State Library EZ Innovation grant to expand the project and create a toolkit so other institutions can begin their own archiving projects.
On a national level, Dietz said, there is a lot of interest in using this tool to document movements like Black Lives Matter and Occupy Wall Street. On a local level, the data can be used to track and document the university's progress and give students more exposure to NC State's history than the standard sources currently in use provide.
To understand the published works of a movement or individual, and how those movements came about, researchers need access to primary materials. Dietz and Casden said social media should count as a type of primary evidence because it provides useful context.
The project also raises privacy concerns because, although these materials were posted publicly, they weren't posted with archiving in mind. Stances on the issue range from "every user must be contacted directly to ensure consent" to "it was posted publicly, this implies consent."
Josh Gunn, a sophomore studying political science, said he is fine with the latter.
“It seems fair,” Gunn said. “If it’s only being used for archival data, then I think it serves as a relevant deviation from traditional practices.”
Some of the institutions with the greatest need for this kind of technology, such as nonprofit collectives and public libraries, lack the IT infrastructure to use it in its current form, according to Casden. That is why Lentil has been released as open source virtual software, in the hope that, through combined efforts, the program can be repackaged in a way that makes it more accessible to smaller institutions.