
I am trying to parse a website and generate positive, neutral, or negative sentiment analysis

I am trying to get a very basic sentiment analysis from the CNBC website. I put this code together, and it works fine.

from bs4 import BeautifulSoup
import urllib.request
from pandas import DataFrame

resp = urllib.request.urlopen("https://www.cnbc.com/finance/")
# Name the parser explicitly and decode with the charset from the HTTP headers
soup = BeautifulSoup(resp, "html.parser", from_encoding=resp.info().get_param('charset'))

substring = 'https://www.cnbc.com/'

# Keep only absolute links that point back to cnbc.com.
# Start from an empty list so every row is a real link.
links = []
for link in soup.find_all('a', href=True):
    if link['href'].startswith(substring):
        links.append(link['href'])

# Convert the list to a data frame with a single 'review' column
df = DataFrame(links, columns=['review'])

# Clean up: remove digits (regex=True is required since pandas 2.0)
df['review'] = df['review'].str.replace(r'\d+', '', regex=True)

# Get rid of special characters
df['review'] = df['review'].str.replace(r'[^\w\s]+', '', regex=True)


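import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # VADER needs its lexicon downloaded once
sid = SentimentIntensityAnalyzer()

# polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
df['sentiment'] = df['review'].apply(sid.polarity_scores)

def convert(x):
    """Map a VADER compound score to a sentiment label."""
    if x < 0:
        return "negative"
    elif x > .2:
        return "positive"
    else:
        return "neutral"

df['result'] = df['sentiment'].apply(lambda x: convert(x['compound']))
df['result']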
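The `convert` function above uses asymmetric cut-offs: negative below 0, positive above 0.2. For comparison, the VADER project's README suggests ±0.05 on the compound score. A minimal sketch that makes the thresholds explicit parameters so both schemes can be compared; the name `label_compound` and its defaults are hypothetical, not part of NLTK. It uses strict comparisons, so it differs from the README convention only at scores of exactly ±0.05.

```python
def label_compound(score, neg_below=-0.05, pos_above=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    Hypothetical helper: defaults approximate the VADER README cut-offs;
    neg_below=0, pos_above=0.2 reproduces the question's convert().
    """
    if score < neg_below:
        return "negative"
    if score > pos_above:
        return "positive"
    return "neutral"


print(label_compound(0.1))                              # positive
print(label_compound(0.1, neg_below=0, pos_above=0.2))  # neutral
```

A score of 0.0, which is what VADER returns when it finds no known words, is neutral under either scheme.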

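When I run the code above, I get positive and negative labels, but they are not mapped back to the original 'review'. How can I show each sentiment in the data frame next to the text of each link? Thanks!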
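One way to keep each label attached to its link is to collect the URL and its visible anchor text together as one record while parsing, so later columns line up by construction. Below is a stdlib-only sketch that swaps in `html.parser.HTMLParser` for BeautifulSoup and runs on a small made-up HTML string instead of the live CNBC page; the class name `LinkCollector` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs for links that start with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.records = []   # finished (href, text) pairs
        self._href = None   # href of the matching <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(self.prefix):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text
            text = " ".join("".join(self._text).split())
            self.records.append((self._href, text))
            self._href = None


# Made-up markup: one relative link (skipped) and one with nested tags
html = """
<a href="https://www.cnbc.com/business/">Business</a>
<a href="/video/">Video</a>
<a href="https://www.cnbc.com/2020/09/15/example/">Stocks <b>rally</b> on strong earnings</a>
"""
collector = LinkCollector("https://www.cnbc.com/")
collector.feed(html)
for href, text in collector.records:
    print(href, "->", text)
```

The anchor text, rather than the stripped URL, could then be passed to `polarity_scores`, and each score is computed on the same row as the link it came from.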

Oh man, I completely blanked on this! It's just a simple merge!!

import pandas as pd

# Both columns share df's index, so join them on the index
df_final = pd.merge(df['review'], df['result'], left_index=True, right_index=True)
df_final
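Because `apply` returns a Series with the same index as `df`, `df['review']` and `df['result']` already line up row for row, so the merge on `left_index`/`right_index` gives the same frame as plain column selection. A small check with made-up rows (assumes pandas is installed):

```python
import pandas as pd

# Made-up rows standing in for the scraped links and their labels
df = pd.DataFrame({"review": ["markets rally", "stocks slump", "central banks"]})
df["result"] = ["positive", "negative", "neutral"]

# The merge from the answer: join two named Series on their shared index
df_final = pd.merge(df["review"], df["result"], left_index=True, right_index=True)

# Simpler equivalent: select both columns from the same frame
same = df[["review", "result"]]

print(df_final)
print(df_final.equals(same))  # True
```

Selecting the columns avoids the join entirely; the merge only becomes necessary when the two pieces live in different frames.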

Result:

                                               review   result
0                                              review  neutral
1                      https://www.cnbc.com/business/  neutral
2   https://www.cnbc.com/2020/09/15/stocks-making-...  neutral
3   https://www.cnbc.com/2020/09/15/stocks-making-...  neutral
4             https://www.cnbc.com/maggie-fitzgerald/  neutral
..                                                ...      ...
90                      https://www.cnbc.com/finance/  neutral
91  https://www.cnbc.com/2020/09/10/citi-ceo-micha...  neutral
92                https://www.cnbc.com/central-banks/  neutral
93  https://www.cnbc.com/2020/09/10/watch-ecb-pres...  neutral
94               https://www.cnbc.com/finance/?page=2  neutral
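Every row in the output is labeled neutral. A likely reason is the cleanup step: removing digits and then every non-word character glues a URL into one long token that VADER will not find in its lexicon, so the compound score is 0. The sketch below mirrors the two `str.replace` patterns with Python's `re` module on one URL taken from the output; the `url_to_words` helper, which splits the URL path on separators instead, is a hypothetical illustration and not part of the original code.

```python
import re
from urllib.parse import urlparse

url = "https://www.cnbc.com/central-banks/"

# The question's cleanup: drop digits, then drop every run of non-word characters
cleaned = re.sub(r"\d+", "", url)
cleaned = re.sub(r"[^\w\s]+", "", cleaned)
print(cleaned)  # httpswwwcnbccomcentralbanks


def url_to_words(u):
    """Hypothetical helper: turn a URL path into space-separated words."""
    path = urlparse(u).path  # query strings such as ?page=2 are excluded
    words = re.split(r"[\W\d_]+", path)
    return " ".join(w for w in words if w)


print(url_to_words(url))  # central banks
```

Splitting on separators at least gives VADER real words to look up; whether a URL slug carries any sentiment at all is a separate question.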


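Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM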