Twitter Auto-Scheduler Reloaded

I have found a process improvement for my How to Schedule Your Twitter Account Posts code. I thought about it briefly when I was originally coding but wanted to push my work to production first. What I found lacking was that every time I publish a new post to my blog, I have to shut off my code, update the database, and then reload. I wished there were a way to check for a new blog post, update my database, and post to Twitter auto-magically. Oh wait, there is!

Deploy Spider!

The plan is to crawl my website's RSS feed every day and check whether there is anything new. I've already written one blog post on Part 1 – How to Crawl Your Website and Extract Key Words. Taking that post and merging it with my How to Schedule Your Twitter Account Posts should work. We also have the Schedule package, which makes it easy to run a new thread for deploying the spider. See, this is where Automating My Life gets fun.

Most programmers don't rewrite code, especially code that took them forever to write the first time. I will only talk about the new functions in this blog post, which are most likely the most fun parts anyway.

The Shiny New Code

updateSchedDB() is the new function that runs every night. It calls deploySpider(), which pings my RSS feed, then does a Pandas merge to separate the old links from the new ones. We also have the reschedule() function to randomly select a new posting time that does not already exist in the schedule. You should probably put this in a while loop so it keeps rescheduling whenever there is a collision, but there are so many possible combinations that I'll leave it alone for now. My crowning achievement is the pollHashtags() function.

import pandas as pd


def updateSchedDB(uid):
    """Update the schedule DB - uid is not needed, just a placeholder."""
    # deploySpider, reschedule, pollHashtags, and twitSchedDB come from
    # the earlier posts in this series.
    deploySpider()  # writes spiderRez.csv: link, descr, imagelink
    df = pd.read_csv('spiderRez.csv')
    df = df.drop(columns=['descr'])  # encoding sucks
    oldDB = pd.read_csv(twitSchedDB)
    newDF = oldDB.merge(df, how='outer', on=['Links'], indicator=True)
    print(newDF)
    for index, row in newDF.iterrows():
        newTime = reschedule()  # get a new time
        # make sure the time is not duplicated
        if newTime not in newDF['whatInterval'].values:
            if row['EndText'] == "NeedHashTag":
                newDF.loc[newDF['Links'] == row['Links'], 'EndText'] = pollHashtags()
            if row['whatInterval'] == "whatTime":
                newDF.loc[newDF['Links'] == row['Links'], 'whatInterval'] = newTime
    # get rid of the previous index column left over from the last save
    newDF = newDF.drop(columns=['index'])
    newDF = newDF.drop(columns=['_merge'])  # we don't need this anymore
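reschedule() is called above but never shown. Here is a minimal sketch of what it might look like, under my own assumptions: that posting times are stored as "HH:MM" strings and drawn from waking hours. The real reschedule() from the earlier post may differ.

```python
import random


def reschedule():
    """Hypothetical sketch: pick a random posting time as an 'HH:MM' string."""
    hour = random.randint(8, 22)            # post during waking hours
    minute = random.choice([0, 15, 30, 45])  # quarter-hour slots
    return "%02d:%02d" % (hour, minute)
```

The while loop mentioned earlier would wrap it like `while newTime in existing: newTime = reschedule()`, retrying until the slot is free.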


Hashtagging Randomly

Have you ever wanted to know which #hashtags are the best to use on Twitter? There are a few approaches, but most are out of scope for this post, so I settled for following Top Twitter Leaders and extracting an index of all their "#" tags using buildHashtagIndex().

def buildHashtagIndex(topLeaders):
    """Builds a hashtag index from my viewers' interests or top Twitter leaders."""
    # Cursor and api come from Tweepy, set up earlier in this series.
    tags = []
    for page in Cursor(api.user_timeline, screen_name=topLeaders,
                       count=50, include_rts=True).pages(10):
        for status in page:
            txt = status.text
            if "#" in txt:
                allHashTags = [x for x in txt.split() if "#" in x]
                for tag in allHashTags:
                    if tag not in tags:  # keep only uniques
                        tags.append(tag)
        if len(tags) > 100:
            break  # 100 hashtags is plenty for the index
    return tags
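pollHashtags() is the piece that turns that index into something tweetable. A minimal sketch of the idea: grab 3 random hashtags and join them into a string. Note this version takes the tag list as an argument for clarity; the version called in updateSchedDB() takes no arguments and presumably reads a module-level index.

```python
import random


def pollHashtags(tags):
    """Hypothetical sketch: pick 3 random hashtags from the index
    and join them into a string to append to a tweet."""
    return " ".join(random.sample(tags, 3))
```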

Sprinkle In Some Data Science

Not only do I get the new links, I also get 3 random hashtags from an index of 100 possible hashtags. Right now they are completely random, but I could also do some metrics on the topLeaders and their Twitter posts. We could do term frequencies for every hashtag for the last 10 pages and rank them largest to smallest, or even extract features from their text that “describe why” they may have chosen that hashtag. Using those features, we could build a model to “recommend to ourselves” a better hashtag that may reach more people.
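The term-frequency idea above can be sketched with the standard library's `collections.Counter`. This is a toy version under my own naming (`rankHashtags` is not from the original code): count every hashtag across a batch of tweet texts and rank them largest to smallest.

```python
from collections import Counter


def rankHashtags(statuses):
    """Count hashtag frequencies across tweet texts, ranked largest to smallest."""
    counts = Counter()
    for txt in statuses:
        counts.update(w.lower() for w in txt.split() if w.startswith("#"))
    return counts.most_common()
```

Feeding it the pages pulled by buildHashtagIndex() would give a ranked list instead of a purely random draw.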

I love sprinkles…
