In the previous program, I read the tweets into a list of tuples. The first step is to convert this into a list of lists, which can be done with a simple list comprehension:
tweets = [list(t) for t in taggedtweets]
Now that we have a list of tweets, we can iterate through them and append the classifier's prediction to each one.
for t in tweets:
    t.append(classifier.classify(feature_extractor(t[0])))
Now each entry holds the tweet, the hand-labeled classification, and the predicted classification. We can then count how many of those two classifications agree and divide by the number of tweets. Since I like to view the results in the shell, I added a print statement that formats the accuracy as a percentage.
accuracy = sum(t[1]==t[2] for t in tweets)/float(len(tweets))
print "Classifier accuracy is {:.0%}".format(accuracy)
Using the training dataset I provided, this comes out to an accuracy of 46%. Not exactly the most accurate classifier. Ideally I would evaluate it against a separate dataset, but I don't have one available at the moment. Looking at the results raises another question: is this the best metric to use? The classify method chooses the classification with the highest probability. Is it worth taking the full probability distribution calculated by the classifier into account? For example, a tweet could have this probability distribution:
P(sentiment==negative)=0.620941
P(sentiment==neutral)=0.379059
In this example, the tweet was hand-labeled as neutral, but the classifier picks negative, so under the metric above it is marked as inaccurate. However, shouldn't the probability the model assigned to neutral count in some way? I think it should get partial credit. Instead of scoring it as a non-match (score of 0), I use the probability the model assigned to the hand-labeled category (a score of 0.379059 here). Although this increases the credit given for non-matches, it also decreases the credit given for matches, since a correct prediction now scores its probability rather than a full 1. This calculation changes the code slightly:
tweets = [list(t) for t in taggedtweets]
for t in tweets:
    features = feature_extractor(t[0])
    t.append(classifier.classify(features))  # predicted classification
    pc = classifier.prob_classify(features)
    t.append(pc.prob(t[1]))  # probability assigned to the hand-labeled classification
accuracy = sum(t[1]==t[2] for t in tweets)/float(len(tweets))
weighted_accuracy = sum(t[3] for t in tweets)/float(len(tweets))
print "Classifier accuracy is {:.0%}".format(accuracy)
print "Classifier weighted accuracy is {:.0%}".format(weighted_accuracy)
The weighted accuracy isn't quite what I expected: it also comes out to roughly 46%, about the same as the simple accuracy. Even so, I think this weighted accuracy will be a better measurement. Using this metric, we can now work on improving the model and determining how large a training set it needs.
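One way to explore the training-set-size question would be to retrain on progressively larger slices of the labeled tweets and watch how the weighted accuracy changes. This is only a rough sketch, assuming taggedtweets is a list of (text, label) pairs, the classifier is NLTK's NaiveBayesClassifier, and the slice sizes are arbitrary:
import nltk

# Sketch: weighted accuracy as a function of training set size.
for n in (50, 100, 200, 400):
    trainset = [(feature_extractor(text), label) for text, label in taggedtweets[:n]]
    clf = nltk.NaiveBayesClassifier.train(trainset)
    scores = [clf.prob_classify(feature_extractor(text)).prob(label)
              for text, label in taggedtweets]
    print "n={}: weighted accuracy is {:.0%}".format(n, sum(scores) / float(len(scores)))
Note that this scores each model against the same labeled tweets it was trained on, just as above; a held-out set would give a more honest curve.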