Part 3 – Creating the WordCloud

You made it. The last tutorial in the WordCloud series. You’ve witnessed web-crawling in Part 1 and Grumpy Cat’s epic fail with the OpenCV fix in Part 2. Now, all that’s left is creating the final WordCloud.

All the Hard Work is Done

If you haven’t checked out Part 1 and Part 2 of this series, I recommend you take a look before moving ahead. This last function is pretty straightforward. As you can tell from the code below, we call getGrayImage(imageFname), the function created in Part 2, to generate the binary image, then load that image into a np.array. The default stopwords come from the wordcloud package's STOPWORDS set, and we add a few of our own on top. The key to generating the wordcloud is WordCloud(background_color="white", max_words=20000, mask=img_mask, stopwords=stopwords).

As you can tell from max_words, we can cap how many words the cloud will draw. Since most of my blog posts don’t contain 20,000 words, we are good to go. The mask argument is where we send the image generated by getGrayImage: wordcloud treats pure white pixels in the mask as off-limits and draws words everywhere else.
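To get a feel for what stopwords and max_words do together, here is a minimal sketch using only the standard library. It is not how the wordcloud package works internally; top_words and the sample text are made up for illustration, and the tokenizing is deliberately simplistic.

```python
import re
from collections import Counter


def top_words(text, stopwords, max_words):
    """Hypothetical helper: count words, drop stopwords, keep the top max_words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(max_words)


# Made-up sample text; "said" and "the" play the role of stopwords.
sample = "cloud cloud cloud cat cat said said said said the words"
extra_stopwords = {"said", "the"}

result = top_words(sample, extra_stopwords, max_words=2)
print(result)
```

Even though "said" is the most frequent word in the sample, it never shows up in the result, and only the two most common remaining words survive the max_words cap.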

import os
from os import path

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

# getGrayImage(imageFname) is the function from Part 2;
# it needs to be defined or imported before calling getWordCloud.


def getWordCloud(textInput, imageFname):
    """this ingests a word csv and image file name to create a wordcloud"""
    d = os.getcwd()
    # Read the whole text (the with-block closes the file for us).
    with open(path.join(d, textInput)) as f:
        text = f.read()
    # make the mask image
    getGrayImage(imageFname)
    # taken from
    img_mask = np.array(Image.open(path.join(d, 'imgs', 'bw_' + imageFname)))
    stopwords = set(STOPWORDS)
    stopwords.add("said")  # add any extra stopwords
    stopwords.add("repositories")
    # inputs for wordcloud
    wc = WordCloud(background_color="white", max_words=20000, mask=img_mask,
                   stopwords=stopwords)
    # generate word cloud
    wc.generate(text)
    # store to file
    wc.to_file(path.join(d, 'imgs', 'wc_' + imageFname))
    # show the word cloud, then the mask in a second figure
    plt.imshow(wc)
    plt.axis("off")
    plt.figure()
    plt.imshow(img_mask, cmap=plt.cm.gray)
    plt.axis("off")
    plt.show()
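One thing worth noticing in the function above is the file layout it relies on: an imgs folder under the working directory, a bw_ prefix for the mask, and a wc_ prefix for the finished cloud. The sketch below spells that convention out so you can check the paths before running the real thing. The helper name cloud_paths, the file name cloud.png, and the folder name project are all made up for this example.

```python
import os
from os import path


def cloud_paths(image_fname, base_dir=None):
    """Hypothetical helper: return (mask_path, output_path) using the bw_/wc_ naming."""
    base_dir = base_dir or os.getcwd()
    img_dir = path.join(base_dir, "imgs")
    mask_path = path.join(img_dir, "bw_" + image_fname)
    output_path = path.join(img_dir, "wc_" + image_fname)
    return mask_path, output_path


# "cloud.png" and "project" are placeholder names, not files from this series.
mask_path, out_path = cloud_paths("cloud.png", base_dir="project")
print(mask_path)
print(out_path)

if not path.exists(mask_path):
    print("mask not found yet; getGrayImage would need to create it first")
```

If the imgs folder is missing, both the Image.open call and wc.to_file in getWordCloud would fail, so a quick check like this can save a confusing traceback.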

 

[Image: wcCloudFinal]

The Final WordCloud — in a cloud 🙂

[Image: wcFinal]
