
It is known that topic modeling does not benefit from stemming (ref). I propose a workflow for investigating whether stemming is appropriate as a data-reduction method.

  1. Take all the tokens and apply the stemming algorithm you would like to test
  2. Construct a list of words that should be equal under stemming
  3. Apply a topic model to your original data
  4. Predict the topic for each word created in …
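
As a rough illustration of steps 1 and 2 only, here is a minimal sketch in Python. The `toy_stem` function is a hypothetical stand-in for whichever stemming algorithm you want to test (it simply strips a few common English suffixes), and `group_by_stem` is an illustrative helper name, not part of any library. It reads step 2 as "collect the words that the stemmer collapses into the same stem", which is an assumption about the intended meaning.

```python
from collections import defaultdict


def toy_stem(token: str) -> str:
    """Hypothetical stand-in stemmer: strips a few common English suffixes.

    Swap this for the stemming algorithm you actually want to evaluate.
    """
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so very short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def group_by_stem(tokens, stem=toy_stem):
    """Step 1: stem every token. Step 2: group words that share a stem."""
    groups = defaultdict(set)
    for token in tokens:
        groups[stem(token)].add(token)
    # Keep only stems that merge two or more distinct words,
    # since those are the cases where stemming changes the data.
    return {s: sorted(words) for s, words in groups.items() if len(words) > 1}


tokens = ["model", "models", "modeling", "topic", "topics", "stemming", "data"]
merged = group_by_stem(tokens)
print(merged)
```

Each value in `merged` is a set of words that the stemmer treats as one. These groups are the candidates to compare once a topic model has been fitted to the original, unstemmed data.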