JSTOR @ DHUG 2018
Join Ron Snyder, Director of Research at JSTOR Labs, and Sharon Garewal, Senior Metadata Librarian, Taxonomy Manager, as they present a panel titled "Building an LDA topic model using Wikipedia."
They’ll discuss how they created training data for JSTOR’s new Text Analyzer, a tool that lets users upload a document, have it automatically analyzed, and find relevant content on JSTOR. Working from the JSTOR Thesaurus hierarchy of 48,000 terms, the team used a custom curation tool to identify and review Wikipedia articles to serve as training data for a topic model. The result was a topic model covering the most significant terms from the JSTOR Thesaurus (approx. 18,000), trained on the curated Wikipedia articles. In this presentation, Sharon and Ron will discuss the process used, share initial findings and areas for future work (including multilingual topic inferencing), and provide a short demo of the curation tool and the Text Analyzer app.
Location: Hotel Andaluz, Barcelona Ballroom on the Mezzanine Level