Identifying Related and Same-Work Relationships in Large Digital Libraries

July 30, 2020 4:00 PM - 5:00 PM (EDT)

Description

The rapid growth of scanned-work digital libraries presents a new opportunity for learning more about our collections. With digital access to text inside the books of a collection, content-based text mining methods can be leveraged to learn more about the relationships between works, helping correct inaccurate metadata, suggest classification information, recommend similar works, and label the nature of links between works.

This talk will introduce the Similarities and Duplication in Digital Libraries (SaDDL) project, which identifies same-work relationships among the 17 million works in the HathiTrust Digital Library. SaDDL detects exact duplicates as well as traditionally difficult-to-identify relationships such as derivatives, different editions, abridgments, and whole-part relationships. We present the challenges of the problem, our project's approach to meeting them, and a new dataset that lets cataloguers and scholars apply our outcomes.

Location

United States

Contact Information

Association for Information Science and Technology | ASIS&T
Name: Cathy Nash
Phone: 301-495-0900
Email: webinars@asist.org
