Explaining LDA using a dog pedigree model
Machine learning algorithms are now so accessible that even my non-technical wife asks questions like: “Isn’t that something ChatGPT can do?”
That makes it all the more important for data scientists to stay sharp on the why and how behind machine learning algorithms.
This two-part blog post explains how Latent Dirichlet Allocation (LDA), a staple in every data scientist’s arsenal for tasks like topic modeling and recommendations, works, with the help of dog bloodlines. It follows the actual explanation I gave my wife. By the end of this series, you should be able to answer the following questions:
Part 1:
- How does LDA work?
- How do I explain LDA to non-technical people?
Part 2:
- How does LDA converge?
- When should you use LDA, and when should you not?
- What are the alternatives to and variants of LDA (other than LLMs)?
Let’s start.
Imagine you have the best job in the world: estimating the breed mix of each dog from lots of adorable dog photos.
Easy enough!
Short legs = Corgi or Dachshund.
Long body = Dachshund.
Chocolate chip muffin face = Chihuahua.
However, each dog has a unique combination of characteristics. A dog may have short legs like a Corgi, but it may also have the face of a Chihuahua. So we are not just identifying a single breed per dog; we are modeling each dog as a mosaic of traits that comes from a mix of breeds.
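To make the "mosaic of traits" idea concrete, here is a minimal Python sketch of the generative story that LDA assumes, told in dog terms: each dog gets a random breed mix, and each observed trait is produced by first picking a breed from that mix and then picking a trait typical of that breed. The breed-to-trait probabilities, the extra "pointy ears" trait, the `alpha` value, and the function names are all made up for illustration; this is not a fitted model.

```python
import random

# Hypothetical breeds (the "topics") and how likely each breed is to show
# each trait (the "words"). All probabilities are invented for illustration.
BREED_TRAITS = {
    "Corgi":     {"short legs": 0.60, "long body": 0.10, "muffin face": 0.05, "pointy ears": 0.25},
    "Dachshund": {"short legs": 0.40, "long body": 0.50, "muffin face": 0.05, "pointy ears": 0.05},
    "Chihuahua": {"short legs": 0.10, "long body": 0.05, "muffin face": 0.60, "pointy ears": 0.25},
}


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_dog(n_traits, alpha, rng):
    """Generate one dog: a breed mix plus n_traits observed traits."""
    breeds = list(BREED_TRAITS)
    # Step 1: the dog's breed mix (its "pedigree"), drawn from a Dirichlet.
    mix = sample_dirichlet([alpha] * len(breeds), rng)
    traits = []
    for _ in range(n_traits):
        # Step 2: pick which breed is responsible for this trait.
        breed = rng.choices(breeds, weights=mix)[0]
        # Step 3: pick a trait according to that breed's trait probabilities.
        options = BREED_TRAITS[breed]
        trait = rng.choices(list(options), weights=list(options.values()))[0]
        traits.append(trait)
    return dict(zip(breeds, mix)), traits


rng = random.Random(0)  # fixed seed so the run is repeatable
mix, traits = generate_dog(n_traits=8, alpha=0.5, rng=rng)
print(mix)     # breed proportions that sum to 1
print(traits)  # the traits we would "see" in the photo
```

In practice we only see the traits, never the breed mix; LDA runs this story in reverse to estimate the hidden mix for every dog.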
Number of topics and corpora
Although we don’t sort the dog photos by breed up front, it helps to list the physical traits that can be observed across all the images and to note how common each one is.
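As a rough illustration of that bookkeeping, the sketch below treats the photo collection as the corpus and the set of observable traits as the vocabulary, then counts which traits show up in which photo. The photo names and trait annotations are hypothetical, invented only to show the shape of the data.

```python
from collections import Counter

# Hypothetical trait annotations for three dog photos (invented for illustration).
photos = {
    "dog_1": ["short legs", "long body"],
    "dog_2": ["short legs", "muffin face"],
    "dog_3": ["long body", "muffin face", "short legs"],
}

# The vocabulary is every distinct trait seen anywhere in the collection.
vocabulary = sorted({trait for traits in photos.values() for trait in traits})

# One row of counts per photo, with columns in vocabulary order.
count_matrix = {
    name: [Counter(traits)[trait] for trait in vocabulary]
    for name, traits in photos.items()
}

# How often each trait appears across the whole collection.
overall = Counter(trait for traits in photos.values() for trait in traits)

print(vocabulary)
for name, row in count_matrix.items():
    print(name, row)
print(overall.most_common())
```

A table like this, one row per photo and one column per trait, is the kind of input a topic model works from.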