TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora
Published at
NAACL
| Mexico City, Mexico
2021
Abstract
Embeddings of words and concepts capture syntactic and semantic regularities of
language; however, they have seen limited use as tools to study characteristics
of different corpora and how they relate to one another. We introduce
TextEssence, an interactive system designed to enable comparative analysis of
corpora using embeddings. TextEssence includes visual, neighbor-based, and
similarity-based modes of embedding analysis in a lightweight, web-based
interface. We further propose a new measure of embedding confidence based on
nearest neighborhood overlap, to assist in identifying high-quality embeddings
for corpus analysis. A case study on COVID-19 scientific literature illustrates
the utility of the system. TextEssence is available from
this https URL.