
Using CSV file for Azure Sentiment Analysis

I am trying to run sentiment analysis on newspaper articles using Azure Cognitive Services (explained here).

It works for individual sentences; however, I am struggling to make it work for a CSV file containing a list of quotes. I believe I am doing something wrong when I assign the file, i.e. the part marked with **:

def sentiment_analysis_example(client):
    **documents = ["I had the best day of my life. I wish you were there with me."]**
    response = client.analyze_sentiment(documents=documents)[0]
    print("Document Sentiment: {}".format(response.sentiment))
    print("Overall scores: positive={0:.2f}; neutral={1:.2f}; negative={2:.2f} \n".format(
        response.confidence_scores.positive,
        response.confidence_scores.neutral,
        response.confidence_scores.negative,
    ))
    for idx, sentence in enumerate(response.sentences):
        print("Sentence: {}".format(sentence.text))
        print("Sentence {} sentiment: {}".format(idx+1, sentence.sentiment))
        print("Sentence score:\nPositive={0:.2f}\nNeutral={1:.2f}\nNegative={2:.2f}\n".format(
            sentence.confidence_scores.positive,
            sentence.confidence_scores.neutral,
            sentence.confidence_scores.negative,
        ))

sentiment_analysis_example(client)

Instead of copy-pasting individual sentences into the "documents" part, I would like to try a more efficient approach, so I tried creating a pandas data frame:

import pandas as pd

df = pd.read_csv('/Users/../Desktop/trial-sun.csv', sep=';')

However, when I reference this in documents = [], I get the error message:

"TypeError: Mixing string and dictionary/object document input unsupported."

My guess is that what I am passing in needs to be in a different format, but I am not sure how to go about it.
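For context on what that TypeError means: `analyze_sentiment` accepts either a list of plain strings or a list of dicts with `id`/`language`/`text` keys, but not pandas objects. Referencing the DataFrame (or a Series) inside `documents = []` mixes object types into the list, which triggers the error. A minimal sketch of the conversion, assuming a hypothetical column named `quote`:

```python
import pandas as pd

# Hypothetical CSV layout: one quote per row in a column named "quote".
# With a real file this would be pd.read_csv(..., sep=';').
df = pd.DataFrame({"quote": [
    "I had the best day of my life.",
    "The weather was terrible.",
]})

# Convert the column to a plain list of str -- the shape
# analyze_sentiment accepts -- instead of passing df or df["quote"].
documents = df["quote"].astype(str).tolist()

print(documents)
```

Passing `documents` built this way keeps every element a `str`, so the client no longer sees mixed string and object input.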

The following code works for me:

import pandas as pd
import requests

subscription_key = "<>"
headers = {"Ocp-Apim-Subscription-Key": subscription_key}
endpoint = "https://<>.cognitiveservices.azure.com/"
sentiment_url = endpoint + "/text/analytics/v3.0/sentiment"


def comment_sentiment(comment=None, cid=None):
    """
    Take a single comment in string and analyze the sentiment

    Args:
        comment --  The text content to analyze.
        cid -- The numeric id of the comment analyzed.
    """
    language = "en"
    try:
        document = {"id": cid, "language": language, "text": comment}
        body = {"documents": [document]}
        res = requests.post(sentiment_url,  headers=headers, json=body)
        data = res.json()
        # Extract key phrases
        return data
    except Exception as e:
        # A generic Exception has no errno/strerror attributes,
        # so print the exception itself.
        print("Error: {}".format(e))


def comment_summary(sentimentResult):
    """
        Take a single response data from comment_sentiment function and summarizes the result

        Args:
            sentimentResult --  The text response data to summarize.
    """

    summary = {"Id": 0, "Sentiment": "",
               "Positive": 0, "Neutral": 0, "Negative": 0}
    for document in sentimentResult['documents']:
        summary["Sentiment"] = document['sentiment'].capitalize()
        summary["Id"] = document['id']
        for each in document['sentences']:
            sentimentscore = each['sentiment']
            if sentimentscore == 'positive':
                summary["Positive"] += 1
            elif sentimentscore == 'negative':
                summary["Negative"] += 1
            else:
                summary["Neutral"] += 1
    return summary


def main(comment_df):
    """
    Take the data frame, get the sentiments and save the result to a CSV file

    Args:
        comment_df -- Data frame containing the text to analyze.
    Returns:
         A data frame consisting of the relevant columns
         'id','sentiment', 'positive','negative','neutral'.
    """
    df2 = comment_df
    # Drop any existing index and use a new one
    df2.reset_index(drop=True, inplace=True)
    print(u"Processing records in data frame....")
    for i, row in df2.iterrows():
        # print(u"Processing Record... #{}".format(i+1))
        text_data = df2.loc[i, "comment"].encode(
            "utf-8").decode("ascii", "ignore")
        sentimentResult = comment_sentiment(text_data, i+1)
        sentimentSummary = comment_summary(sentimentResult)
        # Add result to data frame
        df2.loc[i, "id"] = i+1
        df2.loc[i, "sentiment"] = sentimentSummary['Sentiment']
        df2.loc[i, "positive"] = sentimentSummary['Positive']
        df2.loc[i, "negative"] = sentimentSummary['Negative']
        df2.loc[i, "neutral"] = sentimentSummary['Neutral']
    # Keep only the relevant columns (once, after the loop)
    dfx = df2[['id', 'sentiment', 'positive', 'negative', 'neutral']]
    print(u"Processing completed....")
    # Ensure that numbers are represented as integers and not float
    convert_dict = {'id': int,
                    'positive': int,
                    'negative': int,
                    'neutral': int,
                    'sentiment': str
                    }

    dfx = dfx.astype(convert_dict)
    return dfx


if __name__ == "__main__":
    # read comment data from csv
    commentData = pd.read_csv(
        "https://raw.githubusercontent.com/JimXu199545/data/main/comment.csv", header=0, names=["comment"])
    commentData['nwords'] = commentData.comment.apply(lambda x: len(x.split()))
    commentData['hashed'] = commentData.comment.apply(
        lambda x: hash("".join(x.split())))

    # Remove duplicated records but keep the first occurrence of each record
    commentData.drop_duplicates(keep='first', inplace=True)
    # Reindex the data frame to prevent gaps in the indexes
    commentData.reset_index(drop=True, inplace=True)
    df = main(commentData)
    df.to_csv('d:\\result.csv', index=False, header=True)
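The code above sends one request per comment. The Text Analytics v3.0 sentiment endpoint also accepts multiple documents per request body (the documented limit for sentiment analysis was 10 documents per request; worth verifying against current service limits), so the number of HTTP calls can be reduced by batching rows. A sketch of just the batching step, using stand-in data; the POST itself would be unchanged from the answer:

```python
def make_batches(comments, batch_size=10):
    """Group comments into Text Analytics request bodies,
    at most batch_size documents per body."""
    bodies = []
    for start in range(0, len(comments), batch_size):
        chunk = comments[start:start + batch_size]
        documents = [
            # ids must be unique strings within a request
            {"id": str(start + i + 1), "language": "en", "text": text}
            for i, text in enumerate(chunk)
        ]
        bodies.append({"documents": documents})
    return bodies

comments = ["comment {}".format(n) for n in range(23)]
bodies = make_batches(comments)
print(len(bodies))  # 3 request bodies: 10 + 10 + 3 documents
```

Each element of `bodies` can then be passed as the `json=` argument of `requests.post`, and the response's `documents` array matched back to rows by `id`.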

Also, refer to the following links for more information:
Reference 1, Reference 2

