Structured Topic Modeling: Leveraging Sparsity and Graphs for Improved Inference
Abstract: Classical topic modeling approaches, such as Latent Dirichlet Allocation (LDA) and probabilistic Latent Semantic Indexing (pLSI), decompose a document-term matrix so that each document is represented as a mixture of topics. This makes them a powerful tool for uncovering latent thematic structure in document corpora and, more broadly, in compositional data. However, these methods generally assume that documents are independent. They therefore overlook relationships between documents and other structural information that could improve inference, especially when documents are short or the vocabulary is large. In this talk, we will consider two new structured approaches…
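To make the classical "decompose a document-term matrix into a mixture of topics" step concrete, here is a minimal sketch of the unstructured baseline. It is not the talk's structured method. It swaps in nonnegative matrix factorization with Kullback-Leibler multiplicative updates, which is known to share its normalized fixed points with pLSI. The function name `plsi_nmf`, the toy `counts` matrix, and the iteration count are hypothetical choices made for illustration. The sketch also assumes that every document contains at least one word. Like the classical methods the abstract describes, it treats documents independently.

```python
import numpy as np

def plsi_nmf(counts, n_topics, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical sketch: factor a document-term count matrix X ~ W @ H
    with KL-divergence NMF (Lee & Seung multiplicative updates), then
    rescale into pLSI-style (doc_topic, topic_word) distributions.
    Assumes every document (row) has at least one nonzero count."""
    rng = np.random.default_rng(seed)
    X = np.asarray(counts, dtype=float)
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + eps   # document-topic weights
    H = rng.random((n_topics, n_words)) + eps  # topic-word weights
    for _ in range(n_iter):
        # Update H: H *= W^T (X / WH) / column sums of W.
        R = X / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        # Update W: W *= (X / WH) H^T / row sums of H.
        R = X / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
    # Move each topic's scale into W so that each topic is a word distribution.
    scale = H.sum(axis=1)
    topic_word = H / scale[:, None]
    A = W * scale[None, :]
    doc_topic = A / A.sum(axis=1, keepdims=True)  # each document's topic mixture
    return doc_topic, topic_word

# Toy corpus: 4 documents over a 4-word vocabulary with two visible themes.
counts = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 0, 6, 5],
    [0, 1, 5, 6],
])
theta, beta = plsi_nmf(counts, n_topics=2)
print(np.round(theta, 2))  # per-document topic proportions
print(np.round(beta, 2))   # per-topic word distributions
```

Each row of `theta` is one document's mixture over topics, and each row of `beta` is one topic's distribution over the vocabulary. Nothing in the factorization links one document to another. That independence is the gap the abstract points to for short documents and large vocabularies.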