It is known that topic modeling does not benefit from stemming (ref). I propose a workflow to investigate whether stemming is appropriate as a method for data reduction.
- Take all the tokens and apply the stemming algorithm you would like to test
- Construct a list of groups of words that become equal under stemming, i.e. words that share the same stem
- Apply a topic model to your original data
- Predict the topic for each word created in …
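
The first two steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `toy_stem` is a hypothetical stand-in that only strips a few English suffixes, and `group_by_stem` is a helper name introduced here. In practice you would pass the stemmer you actually want to test, for example the `stem` method of NLTK's `PorterStemmer`. The topic-model steps are not covered by this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(
    tokens: Iterable[str], stem: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each stem to the distinct surface forms that collapse onto it."""
    groups = defaultdict(set)
    for token in tokens:
        word = token.lower()
        groups[stem(word)].add(word)
    # Only stems shared by two or more distinct words are relevant here.
    return {s: sorted(words) for s, words in groups.items() if len(words) > 1}


# Toy token list, made up for illustration only.
tokens = ["connect", "connected", "connecting", "models", "model", "topic"]
merged = group_by_stem(tokens, toy_stem)
print(merged)
```

Each value in `merged` is a group of words that the stemmer treats as identical. Keeping the stemmer as a plain function argument makes it easy to run the same comparison for several stemming algorithms.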