Today, the dissemination of human knowledge has rapidly expanded beyond the constraints of physical printing into a deluge of digital objects, ranging from digitized historical collections to born-digital web materials, and from single-media content to diverse multimedia resources. The resulting data explosion calls for advanced natural language processing (NLP) techniques to support intelligent information processing. Despite the rapid development of NLP, data heterogeneity and the limited adaptability of current technologies still challenge the reliability of NLP in supporting real-world digital knowledge dissemination.
This talk will discuss these issues, focusing on data-centric challenges in three stages of human knowledge dissemination: (1) digitization noise in exploring cultural heritage knowledge; (2) inconsistent data annotations in organizing scientific knowledge; and (3) disparities across multimodal data in disseminating multimedia knowledge. Building on existing progress, I will further outline future directions and share research plans, such as promoting reliable semantic understanding of digitized text, with the aim of enabling more trustworthy and inclusive NLP support for a knowledge society.