Topic modeling improvement

donnY954

I need to create a topic model. I have used the LDA algorithm, and it works, but the topics it finds are not the kind I want. I need topics that represent specific issues covered by the world's popular media (for example, the coronavirus pandemic, the trade war, Brexit). Instead, my model mostly produces broad topics such as politics, economy, sport, and culture. I am not interested in those, although I understand that they are valid topics, just with a different focus from what I need. Is there a way to get the kind of results I am interested in?

Answers

spyKY

The Latent Dirichlet Allocation (LDA) algorithm is one of the most popular methods for topic modelling, but there are others, for example LSA (Latent Semantic Analysis) and NMF (Non-negative Matrix Factorization). Have you tried them? They might give better results in your case.
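
For comparison, here is a minimal NumPy sketch of the two alternatives on a hypothetical toy corpus: NMF fitted with the Lee-Seung multiplicative updates, and LSA as a truncated SVD of the document-term matrix. The corpus and the helper names (nmf, lsa, top_words) are made up for illustration; in practice you would more likely use a library such as scikit-learn (its NMF and TruncatedSVD classes) and TF-IDF weighting rather than raw counts.

```python
import numpy as np

# Hypothetical toy corpus of headlines, for illustration only.
docs = [
    "coronavirus pandemic lockdown cases vaccine",
    "pandemic vaccine coronavirus hospital cases",
    "trade war tariffs china exports",
    "tariffs trade war exports china markets",
    "brexit deal eu parliament vote",
    "eu brexit vote deal negotiations",
]

# Build a vocabulary and a document-term count matrix (rows = documents).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(docs), len(vocab)))
for r, d in enumerate(docs):
    for w in d.split():
        X[r, index[w]] += 1.0


def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factor X ~= W @ H with W, H >= 0 using Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase ||X - WH||.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def lsa(X, k):
    """Truncated SVD: the top-k right singular vectors act as LSA 'topics'."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]


def top_words(topic_matrix, n=3):
    """Return the n highest-weighted vocabulary words for each topic row."""
    return [[vocab[i] for i in np.argsort(-row)[:n]] for row in topic_matrix]


W, H = nmf(X, k=3)
for t, words in enumerate(top_words(H)):
    print(f"NMF topic {t}: {words}")

# LSA loadings can be negative and have an arbitrary sign, so rank by magnitude.
for t, words in enumerate(top_words(np.abs(lsa(X, 3)))):
    print(f"LSA component {t}: {words}")
```

NMF's non-negativity constraint is one reason its topics are often easier to read than LSA components, but neither method is told which themes you care about, so both can still settle on broad topics.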

donnY954

No, I haven't tried them. I thought LDA was the best method.

Ifful

I don't agree with @spyKY. It is unlikely that those algorithms will do what @donnY954 wants. Instead, I recommend looking into the GuidedLDA model. It is a modification of standard LDA that lets you seed topics with some words, so that topics form around those words. To some extent, this makes it a semi-supervised algorithm, but you still don't need a labelled dataset.
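
To illustrate the seeding idea, here is a small collapsed Gibbs sampler for LDA in NumPy. It is not the GuidedLDA package's implementation: in this sketch each seed word is seeded by boosting its Dirichlet prior in the topic it is assigned to (one simple way to seed LDA), and GuidedLDA's own mechanism may differ. The corpus, the seed words, and the names seeded_lda, seed_topics, and seed_boost are hypothetical.

```python
import numpy as np


def seeded_lda(docs, n_topics, seed_topics, n_iter=200, alpha=0.1, beta=0.01,
               seed_boost=1.0, seed=0):
    """Collapsed Gibbs sampling for LDA with seed words (illustrative sketch).

    docs: list of token lists.
    seed_topics: dict {word: topic index} (hypothetical format for this sketch).
    Seed words get `seed_boost` added to their prior in their seed topic,
    so they keep pulling that topic toward themselves during sampling.
    """
    rng = np.random.default_rng(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Asymmetric topic-word prior: `beta` everywhere, boosted for seed words.
    B = np.full((n_topics, V), beta)
    for word, k in seed_topics.items():
        B[k, wid[word]] += seed_boost
    B_sum = B.sum(axis=1)

    # Count tables plus a random initial topic for every token.
    n_dk = np.zeros((len(docs), n_topics))
    n_kw = np.zeros((n_topics, V))
    n_k = np.zeros(n_topics)
    z = []
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1
            n_kw[k, wid[w]] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = k | all other assignments), unnormalized.
                p = (n_dk[d] + alpha) * (n_kw[:, v] + B[:, v]) / (n_k + B_sum)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, v] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each topic's word distribution.
    phi = (n_kw + B) / (n_k + B_sum)[:, None]
    return vocab, phi


corpus = [
    "coronavirus pandemic lockdown cases vaccine",
    "pandemic vaccine coronavirus hospital cases",
    "trade war tariffs china exports",
    "tariffs trade war exports china markets",
    "brexit deal eu parliament vote",
    "eu brexit vote deal negotiations",
]
docs = [line.split() for line in corpus]
seed_topics = {"coronavirus": 0, "pandemic": 0, "trade": 1, "tariffs": 1,
               "brexit": 2, "eu": 2}

vocab, phi = seeded_lda(docs, n_topics=3, seed_topics=seed_topics)
for k, row in enumerate(phi):
    print(f"topic {k}:", [vocab[i] for i in np.argsort(-row)[:4]])
```

Only the seed words are constrained; every other word is still assigned freely, which is what lets a topic grow around its seeds. A larger seed_boost pulls the seeded topics more strongly toward the chosen themes.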

donnY954

GuidedLDA is probably what I need. But I am not a professional NLP specialist, and I don't have the skills to implement this approach on my own. Is there a library for GuidedLDA?

Ifful

I know of one open-source Python package: https://github.com/vi3k6i5/GuidedLDA (there may be other implementations, which you could search for if needed).

donnY954

Thanks a lot! It seems that GuidedLDA is exactly what I need!
