Is the average of individual sentiment analysis of 5000 comments the same as sentiment analysis of concatenation of 5000 comments?

I'm trying to do sentiment analysis on a reddit thread. The issue I'm facing is that the free tiers of some cloud NLP APIs (Google Natural Language, Azure Text Analytics, etc.) only allow 5000 calls per month. I'm trying to see whether I can concatenate comments, up to the maximum number of characters allowed per call, so that more comments get analyzed within the free tier.

  • Is this a flawed approach?
  • Will doing sentiment analysis on a concatenated string of comments lead to a wrong sentiment score?
  • Should I be doing sentiment analysis on individual comments and then average all the individual scores to get the overall thread score?
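
For reference, the concatenation idea described above could be sketched roughly like this. It is only a minimal sketch: `pack_comments`, `max_chars`, and the newline separator are hypothetical names and choices, the real per-call character limit has to come from the API's documentation, and truncating over-long comments is an assumption rather than a recommendation.

```python
from typing import Iterable, List


def pack_comments(comments: Iterable[str], max_chars: int, sep: str = "\n") -> List[str]:
    """Greedily pack comments into batches of at most max_chars characters."""
    batches: List[str] = []
    current = ""
    for comment in comments:
        comment = comment.strip()
        if not comment:
            continue  # skip empty or whitespace-only comments
        if len(comment) > max_chars:
            # Assumption: a comment longer than the limit is simply truncated.
            comment = comment[:max_chars]
        candidate = comment if not current else current + sep + comment
        if len(candidate) <= max_chars:
            current = candidate  # still fits in the current batch
        else:
            batches.append(current)  # close the full batch, start a new one
            current = comment
    if current:
        batches.append(current)
    return batches


# Each returned string would then be sent as one API call.
batches = pack_comments(["aaaa", "bbbb", "cc"], max_chars=9)
```

Here the first two comments fit into one batch and the third starts a new one, so three comments cost two calls instead of three.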

Interesting question. If the comments were independent and not related at all, then both concatenation and averaging would probably lead you to a neutral score, much as the average outcome of a series of coin tosses is 0.5 and not 1 or 0. This would not be very useful.

However, assuming you are doing sentiment analysis of a reddit thread around one post (and not analyzing the threads of multiple posts within a subreddit), you will likely get the same result with concatenation as with averaging. Comments in a reddit thread are generally related to the post and either positive or negative (or completely unrelated). So, in your use case, you should be able to pick up the sentiment with your proposed concatenation approach.

My theory (not backed by data yet) is that either approach, averaging or concatenation, will tend to cluster your sentiment scores around neutral, so you will not see strong positives or negatives.
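
To make the averaging part of that theory concrete, here is a small sketch with made-up numbers. The scores are invented for illustration, not output from any API, and the -1 to 1 scale and the `thread_score` helper are assumptions.

```python
from statistics import mean
from typing import Sequence


def thread_score(scores: Sequence[float]) -> float:
    """Overall thread score as the plain average of per-comment scores."""
    return mean(scores)


# Hypothetical per-comment scores on an assumed -1 (negative) to 1 (positive) scale.
mixed = [0.9, -0.8, 0.7, -0.9]   # strongly opinionated, but split
agree = [0.9, 0.8, 0.7, 0.6]     # strongly opinionated, same direction

mixed_avg = thread_score(mixed)  # close to 0 (about -0.025)
agree_avg = thread_score(agree)  # about 0.75
```

Every comment in `mixed` is strongly positive or negative, yet the thread-level average lands near neutral, while the thread where commenters agree keeps a clearly positive average. Whether a concatenated call behaves the same way depends on the API's model and would need to be checked on real data.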
