I have found a process improvement for my How to Schedule Your Twitter Account Posts code. I thought about it briefly when I was originally coding but wanted to push my work to production first. What I found lacking was that every time I publish a new post to my blog, I have to turn off my code, update the database, and then reload. I wish there was a way I could check for a new blog post, update my database, and post to Twitter auto-magically. Oh wait, there is!
The plan is to crawl my website’s RSS feed every day and check whether there is anything new. I’ve already written one blog post on crawling, Part 1 – How to Crawl Your Website and Extract Key Words. Taking that post and merging it with my How to Schedule Your Twitter Account Posts code should work. We also have the Schedule package, which makes it easy to run a new thread for deploying the spider. See, this is where Automating My Life gets fun.
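To make the "check the RSS feed for links" step concrete, here is a minimal sketch using only the standard library. It is not the spider from my earlier post. The sample feed, the `example.com` URLs, and the `extract_links` name are all made up for illustration, and a real run would download the feed XML before parsing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real run would fetch the XML from the blog first.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second-post</link>
    </item>
  </channel>
</rss>"""


def extract_links(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # skip items that carry no link at all
            posts.append((title, link))
    return posts


if __name__ == "__main__":
    for title, link in extract_links(SAMPLE_RSS):
        print(title, link)
```

The list of pairs it returns is the kind of input the nightly update needs: something that can be compared against the links already stored in the database.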
Most programmers don’t rewrite code, especially code that took them forever to write the first time. I will only talk about the new functions in this blog post, which are most likely the most fun parts anyway.
The Shiny New Code
updateSchedDB() is the new function that runs every night. It calls deploySpider(), which pings my RSS feed and then does a Pandas “merge” to separate the new links from the old ones. We also have the reschedule() function, which randomly selects a new posting time that is not already taken. You should probably put this into a while loop so that it keeps rescheduling whenever there is a match, but there are so many possible combinations that I’ll leave it alone for now. My crowning achievement is the pollHashTags() function.
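Here is a rough sketch of those two ideas: the Pandas merge that keeps only the new links, and the while loop that retries until it finds a free posting time. This is not the code behind updateSchedDB() or reschedule(). The names `find_new_links` and `pick_free_time`, the `link` column, and the "HH:MM" time format are assumptions made for the example.

```python
import random

import pandas as pd


def find_new_links(stored, crawled):
    """Return the rows of `crawled` whose link is not already in `stored`.

    Assumes both DataFrames have a 'link' column (hypothetical schema).
    """
    known = stored[["link"]].drop_duplicates()
    # indicator=True adds a '_merge' column saying where each row came from.
    merged = crawled.merge(known, on="link", how="left", indicator=True)
    new_rows = merged[merged["_merge"] == "left_only"]
    return new_rows.drop(columns="_merge").reset_index(drop=True)


def pick_free_time(taken_times, rng=random):
    """Pick a random 'HH:MM' slot, retrying until it is not in `taken_times`."""
    if len(set(taken_times)) >= 24 * 60:
        raise ValueError("every minute of the day is already taken")
    while True:
        candidate = f"{rng.randrange(24):02d}:{rng.randrange(60):02d}"
        if candidate not in taken_times:
            return candidate


if __name__ == "__main__":
    stored = pd.DataFrame({"link": ["https://example.com/a"]})
    crawled = pd.DataFrame({
        "title": ["A", "B"],
        "link": ["https://example.com/a", "https://example.com/b"],
    })
    print(find_new_links(stored, crawled))
    print(pick_free_time({"09:00", "12:30"}))
```

With 1,440 minutes in a day a collision is rare, but the loop costs almost nothing and guarantees the returned slot is free.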
Have you ever wanted to know which #hashtags are the best to use on Twitter? There are a few things you could do, but some of them are outside the scope of this post, so I settled for following Top Twitter Leaders and extracting an index of all their “#” signs using buildHashtagIndex().
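The extraction step could look something like the sketch below. It assumes the tweets have already been downloaded as plain strings, so no Twitter API calls are shown. The `build_hashtag_index` name, the lowercasing, and the `limit` parameter are illustrative choices, not the actual buildHashtagIndex() implementation.

```python
import re

# A '#' followed by one or more word characters (letters, digits, underscore).
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def build_hashtag_index(tweets, limit=100):
    """Collect unique hashtags, lowercased, in first-seen order, up to `limit`."""
    index = []
    seen = set()
    for text in tweets:
        for tag in HASHTAG_PATTERN.findall(text):
            tag = "#" + tag.lower()
            if tag in seen:
                continue  # already indexed, possibly with different casing
            seen.add(tag)
            index.append(tag)
            if len(index) == limit:
                return index
    return index


if __name__ == "__main__":
    sample = ["Loving #Python and #DataScience", "More #python tips #AI"]
    print(build_hashtag_index(sample))
```

Lowercasing before deduplicating means #Python and #python count as the same tag, which keeps the index from filling up with near-duplicates.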
Sprinkle In Some Data Science
Not only do I get the new links, I also get 3 random hashtags from an index of 100 possible hashtags. Right now they are completely random, but I could also do some metrics on the topLeaders and their Twitter posts. We could do term frequencies for every hashtag for the last 10 pages and rank them largest to smallest, or even extract features from their text that “describe why” they may have chosen that hashtag. Using those features, we could build a model to “recommend to ourselves” a better hashtag that may reach more people.
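As a small sketch of both approaches, the snippet below picks 3 random hashtags from an index and also ranks hashtags by how often they appear in a batch of tweets. The function names and the sample tweets are made up for illustration, and the feature extraction and recommendation model are left out entirely.

```python
import random
import re
from collections import Counter


def pick_hashtags(index, k=3, rng=random):
    """Return up to `k` distinct hashtags chosen at random from `index`."""
    return rng.sample(index, min(k, len(index)))


def rank_hashtags(tweets):
    """Count each hashtag (case-insensitive) and rank from most to least used."""
    counts = Counter(
        tag.lower()
        for text in tweets
        for tag in re.findall(r"#\w+", text)
    )
    return counts.most_common()


if __name__ == "__main__":
    tweets = ["#ai #python", "#Python rocks", "#python #ai #ml"]
    ranked = rank_hashtags(tweets)
    print(ranked)
    print(pick_hashtags([tag for tag, _ in ranked]))
```

Swapping the random pick for the top of the ranked list would be the simplest first step toward choosing hashtags by popularity instead of by chance.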
I love sprinkles…