Identifying Related and Same-Work Relationships in Large Digital Libraries

July 30, 2020 4:00 PM - 5:00 PM (EDT)


The rapid growth of scanned-work digital libraries presents a new opportunity for learning more about our collections. With digital access to the text inside a collection's books, content-based text mining methods can reveal the relationships between works, helping to correct inaccurate metadata, suggest classification information, recommend similar works, and label the nature of links between works.

This talk will introduce the Similarities and Duplication in Digital Libraries project (SaDDL), which is identifying same-work relationships among the 17 million works in the HathiTrust Digital Library. SaDDL is identifying exact duplicates as well as traditionally difficult-to-identify relationships such as derivatives, different editions, abridgments, and whole-part relationships. We present the challenges of the problem, our project's approach to meeting them, and a new dataset that lets cataloguers and scholars apply our results.


United States

Contact Information

Association for Information Science and Technology | ASIS&T
Name: Cathy Nash
Phone: 301-495-0900
