
Running sentiment analysis on Facebook data in JSON format

I would first like to say that I am very new to coding and know the very basics at best.

I was tasked with scraping data from Facebook and running a sentiment analysis on it. I got the data using scraping-bot.io, and I have it in a JSON file with the following format:

{
    "owner_url": "https://www.facebook.com/########",
    "url": "https://www.facebook.com/post",
    "name": "Page name",
    "date": "date",
    "post_text": "Post title",
    "media_url": "media url attached",
    "likes": ###,
    "shares": ###,
    "num_comments": ###,
    "scrape_time": "date",
    "comments": [
      {
        "author_name": "Name",
        "text": "Comment text",
        "created": "Date"
      },

The posts are in Spanish, so I looked for a library to run the analysis with. I settled on https://pypi.org/project/sentiment-analysis-spanish/ (I'm not sure it's the best one, so I'm open to suggestions on that front as well).

Ideally, I would like to open the JSON file, run the sentiment analysis on "text", and then save the results into the same file or a new one so I can visualize them in another program.

This is what I have so far:

import json
from sentiment_analysis_spanish import sentiment_analysis
sentiment = sentiment_analysis.SentimentAnalysisSpanish()
 
f = open('C:/Users/vnare/Documents/WebScraping.json', encoding='utf-8-sig')
 
data = json.load(f)
 
for i in range(len('text')):
    print(sentiment.sentiment(i))

Currently it gives me the following error: AttributeError: 'int' object has no attribute 'lower'. But I'm sure there's far more that I'm doing wrong there. I appreciate any help provided.

AttributeError: 'int' object has no attribute 'lower' means that an integer cannot be lowercased: somewhere in your code, the string method lower() is being called on an integer.
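You can reproduce the same error in isolation. This minimal sketch calls lower() on a string and then on an integer; only the integer call fails, with exactly the message from the question.

```python
# A str has a lower() method, so this works
print("Hola MUNDO".lower())  # prints: hola mundo

# An int has no lower() method, so this raises the error from the question
try:
    (0).lower()
except AttributeError as err:
    print(err)  # prints: 'int' object has no attribute 'lower'
```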

If you look at the documentation for the library you linked, you will see that print(sentiment.sentiment("something")) evaluates the sentiment of "something" and returns a score between 0 and 1.

My guess is that sentiment.sentiment("some text") calls lower() internally to convert the text it receives to lowercase. That works for a string, but you are currently passing an integer!

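By using for i in range(...), you are asking for a sequence of integers from 0 up to, but not including, the end number, so i will always be an integer! On top of that, len('text') is just the length of the four-character string 'text', which is 4, so your loop calls sentiment.sentiment(0), sentiment.sentiment(1), and so on, without ever touching your JSON data.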
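A quick check in the interpreter shows what the original loop actually iterated over. Note that len('text') measures the string literal 'text' itself; it does not look at the JSON data at all.

```python
# 'text' here is just a 4-character string, not a key in the JSON data
print(len('text'))  # prints: 4

# range(4) produces the integers 0, 1, 2, 3
for i in range(len('text')):
    print(i, type(i).__name__)  # each line shows an int, e.g. "0 int"
```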

Instead, you need to loop through your JSON data and access its key/value pairs. Writing 'text' on its own, as you've done above, does not reach the "text" key; that key only exists inside the parsed JSON data. See https://www.geeksforgeeks.org/json-with-python/ for an introduction to working with JSON in Python.
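As a minimal sketch of looping through JSON data, the snippet below parses a trimmed-down copy of the post with json.loads (the inline string is a stand-in for the real file, for illustration only) and prints each top-level key together with the type of its value.

```python
import json

# A trimmed-down stand-in for the scraped file, for illustration only
raw = '''
{
  "name": "Page name",
  "likes": 234,
  "comments": [
    {"author_name": "Name", "text": "Comment text", "created": "Date"}
  ]
}
'''

data = json.loads(raw)  # a JSON object becomes a Python dict

# .items() yields each key together with its value
for key, value in data.items():
    print(key, "->", type(value).__name__)
# name -> str
# likes -> int
# comments -> list
```

Seeing that "comments" maps to a list is the clue that you have to go one level deeper to find "text".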

The important thing to look at is the format of the JSON data that you are trying to access. First, you need to access a dictionary key named "comments". However, what is inside of 'comments'?

[{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]

It's actually a dictionary of key/value pairs inside a list. List indices start at 0, and there is only one list element (the dictionary) in your example, so next we use index 0 to get that dictionary. From there, we can look up the key 'text', as you were trying to do initially.
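Here is the same path, one step at a time, on a small inline sample (a stand-in for the real file): each key or index peels off one layer, and printing type() at each step confirms where you are.

```python
import json

# Stand-in for the scraped file, for illustration only
data = json.loads('{"comments": [{"author_name": "Name", "text": "Comment text", "created": "Date"}]}')

comments = data["comments"]  # the value under "comments" is a list
first = comments[0]          # index 0 gives the first (here, only) dictionary
text = first["text"]         # the "text" key gives the comment string

print(type(comments).__name__, type(first).__name__, type(text).__name__)  # list dict str
print(text)  # Comment text

# The three steps can also be chained into one expression
assert data["comments"][0]["text"] == text
```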

When you are learning Python, I highly recommend using a lot of print statements when debugging. They let you see what your program sees, so you know where the errors are.

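import json
from sentiment_analysis_spanish import sentiment_analysis

sentiment = sentiment_analysis.SentimentAnalysisSpanish()

# A with-block closes the file automatically once it has been read
with open('WebScraping.json', encoding='utf-8-sig') as f:
    data = json.load(f)
print(data)  # the whole post, as a dictionary

comments = data['comments']
print(comments)  # a list of dictionaries, one per comment

text = comments[0]['text']
print(text)  # the text of the first comment

sentimentScore = sentiment.sentiment(text)
print(sentimentScore)  # a score between 0 and 1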

When you run this, the output will show you what is inside 'data', what is inside 'comments', what is inside 'text', and what the sentiment score is.

{'owner_url': 'https://www.facebook.com/########', 'url': 'https://www.facebook.com/post', 'name': 'Page name', 'date': 'date', 'post_text': 'Post title', 'media_url': 'media url attached', 'likes': 234, 'shares': 500, 'num_comments': 100, 'scrape_time': 'date', 'comments': [{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]}

[{'author_name': 'Name', 'text': 'Comment text', 'created': 'Date'}]
Comment text

0.49789225920557484

This is what showed me that 'comments' holds a dictionary inside a list.

Now that you understand how it works, here is a more concise version without all the extra prints. It uses the for loop you tried earlier, since a real post will usually have multiple comments.

import json
from sentiment_analysis_spanish import sentiment_analysis

sentiment = sentiment_analysis.SentimentAnalysisSpanish()

with open('WebScraping.json', encoding='utf-8-sig') as f:
    data = json.load(f)

comments = data['comments']

# Score every comment, not just the first one
for i in range(len(comments)):
    comment = comments[i]['text']
    sentimentScore = sentiment.sentiment(comment)
    print(f"The sentiment score of this comment is {sentimentScore}.")
    print(f"The comment was: '{comment}'.")

This results in the following output.

The sentiment score of this comment is 0.49789225920557484.
The comment was: 'Comment 1'.
The sentiment score of this comment is 0.49789225920557484.
The comment was: 'Comment 2'.

This is the file that I used for reference.

{
    "owner_url": "https://www.facebook.com/########",
    "url": "https://www.facebook.com/post",
    "name": "Page name",
    "date": "date",
    "post_text": "Post title",
    "media_url": "media url attached",
    "likes": 234,
    "shares": 500,
    "num_comments": 100,
    "scrape_time": "date",
    "comments": [
      {
        "author_name": "Name",
        "text": "Comment 1",
        "created": "Date"
      },
      {
        "author_name": "Name",
        "text": "Comment 2",
        "created": "Date"
      }
    ]
}
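The question also asked about saving the results for use in another program. One possible approach, sketched here under the assumption that writing new files is preferred over overwriting the original, adds a "sentiment" key to each comment and writes both a JSON copy and a flat CSV, which most visualization tools can import. The helper save_scores and the output file names are hypothetical; the scorer is passed in, so with the library you would use sentiment.sentiment.

```python
import csv
import json
import os
import tempfile
from typing import Callable


def save_scores(data: dict, score: Callable[[str], float],
                json_path: str, csv_path: str) -> None:
    """Hypothetical helper: annotate each comment with a score, then save as JSON and CSV."""
    for comment in data.get("comments", []):
        text = comment.get("text")
        # Store the score next to the comment; None marks entries without text
        comment["sentiment"] = score(text) if isinstance(text, str) else None

    # ensure_ascii=False keeps Spanish characters such as "ñ" readable in the file
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)

    # One CSV row per comment; newline="" is how the csv module expects files to be opened
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["author_name", "created", "text", "sentiment"],
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(data.get("comments", []))


# Demo in a temporary folder with a stand-in scorer; in practice you would call
# save_scores(data, sentiment.sentiment, "WebScraping_scored.json", "WebScraping_scored.csv")
post = {"comments": [{"author_name": "Name", "text": "Comment 1", "created": "Date"}]}
with tempfile.TemporaryDirectory() as folder:
    csv_out = os.path.join(folder, "WebScraping_scored.csv")
    save_scores(post, lambda t: 0.5, os.path.join(folder, "WebScraping_scored.json"), csv_out)
    with open(csv_out, encoding="utf-8") as f:
        print(f.read())
```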

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 