You made it. The last tutorial in the WordCloud series. You’ve witnessed web-crawling in Part 1 and Grumpy Cat’s epic fail with the OpenCV fix in Part 2. Now, all that’s left is creating the final WordCloud.
- Part 1 – How to Crawl Your Website and Extract Key Words
- Part 2 – Working with Images using OpenCV – “Binarize”
- Part 3 – Creating the Word Cloud
All the Hard Work is Done
If you haven’t checked out Part 1 and Part 2 of this series, I recommend you take a look before moving ahead. This last function is pretty straightforward. As you can tell from the code below, we call getGrayImage(imageFname), the function created in Part 2, to generate the binary image, then load the image data into an np.array. The stopwords for Python’s wordcloud are found here. The key to generating the word cloud is WordCloud(background_color="white", max_words=20000, mask=img_mask, stopwords=stopwords). The max_words argument caps how many words the cloud will display. Since most of my blog posts don’t contain anywhere near 20,000 distinct words, we are good to go. The mask argument is where we pass the image generated by getGrayImage.
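To make the WordCloud call concrete, here is a minimal sketch of how the pieces could fit together. It is not the code from this series: make_circle_mask is a hypothetical stand-in for the binary image that getGrayImage produces in Part 2, and build_word_cloud and extra_stopwords are names made up for illustration. The sketch assumes the numpy and wordcloud packages are installed.

```python
import numpy as np


def make_circle_mask(size=400):
    """Hypothetical stand-in for the Part 2 output: a binary uint8 image.

    wordcloud treats pure white (255) as "masked out" and draws words on
    everything else, so the circle is black (0) on a white background.
    """
    y, x = np.ogrid[:size, :size]
    center = size / 2
    inside = (x - center) ** 2 + (y - center) ** 2 <= (size * 0.45) ** 2
    mask = np.full((size, size), 255, dtype=np.uint8)
    mask[inside] = 0
    return mask


def build_word_cloud(text, img_mask, extra_stopwords=()):
    """Generate a word cloud from text, drawn only inside the mask shape."""
    # Imported here so the mask helper can be used without wordcloud installed.
    from wordcloud import STOPWORDS, WordCloud

    # Start from the library's built-in stopwords and add our own (assumption:
    # the original post may build its stopword set differently).
    stopwords = set(STOPWORDS) | set(extra_stopwords)
    wc = WordCloud(
        background_color="white",
        max_words=20000,
        mask=img_mask,
        stopwords=stopwords,
    )
    return wc.generate(text)
```

With this in place, build_word_cloud(text, make_circle_mask()).to_file("wordcloud.png") would write the finished cloud to disk. In the real pipeline, the array built from the getGrayImage output would take the place of make_circle_mask().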