Give scores to dataframe based on id
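I have a dataframe indexed by date. I am trying to give each account ID a score per category whenever that category value exists on the indexed date. The result should look like this: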
accountid category Smooth Hard Sharp Narrow
timestamp
2018-03-29 101 Smooth 1 NaN NaN NaN
2018-03-29 102 Hard NaN 1 NaN NaN
2018-03-30 103 Narrow NaN NaN NaN 1
2018-04-30 104 Sharp NaN NaN 1 NaN
2018-04-21 105 Narrow NaN NaN NaN 1
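What is the best way to go through the dataframe per account ID and assign a score to each category once it is unstacked into its own column?
Here is the script that creates the dataframe: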
import numpy as np
import pandas as pd

idx = pd.date_range('2018-02-28', '2018-04-29')  # date range (not used below)
# Score columns start empty; np.nan is a real missing value, unlike the string 'NaN'
df = pd.DataFrame(
    [['101', '2018-03-29', 'Smooth', np.nan, np.nan, np.nan, np.nan],
     ['102', '2018-03-29', 'Hard', np.nan, np.nan, np.nan, np.nan],
     ['103', '2018-03-30', 'Narrow', np.nan, np.nan, np.nan, np.nan],
     ['104', '2018-04-30', 'Sharp', np.nan, np.nan, np.nan, np.nan],
     ['105', '2018-04-21', 'Narrow', np.nan, np.nan, np.nan, np.nan]],
    columns=['accountid', 'timestamp', 'category', 'Smooth', 'Hard', 'Sharp', 'Narrow'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
print(df)
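No row-by-row iteration is needed to get the "unstacked" layout. Below is a minimal alternative sketch that uses unstack directly. It assumes every row scores 1 and that each (timestamp, accountid, category) combination occurs only once; unstack raises a ValueError on duplicate entries. The names `score` and `wide` are illustrative only.

```python
import pandas as pd

# Same sample data as the question (score columns omitted; they are rebuilt below)
df = pd.DataFrame(
    {
        "accountid": ["101", "102", "103", "104", "105"],
        "timestamp": pd.to_datetime(
            ["2018-03-29", "2018-03-29", "2018-03-30", "2018-04-30", "2018-04-21"]
        ),
        "category": ["Smooth", "Hard", "Narrow", "Sharp", "Narrow"],
    }
).set_index("timestamp")

# Assumption: every row gets a score of 1. Move accountid and category into the
# index, then unstack 'category' so each category value becomes its own column.
# Categories missing for a (timestamp, accountid) pair come out as NaN.
wide = (
    df.assign(score=1)
    .set_index(["accountid", "category"], append=True)["score"]
    .unstack("category")
    .reset_index("accountid")  # bring accountid back as an ordinary column
)
print(wide)
```

Note that unstack sorts the resulting rows and columns by their labels, so the row order can differ from the input.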
You can use the str accessor together with get_dummies:
df[['accountid','category']].assign(**df['category'].str.get_dummies())
Output:
accountid category Hard Narrow Sharp Smooth
timestamp
2018-03-29 101 Smooth 0 0 0 1
2018-03-29 102 Hard 1 0 0 0
2018-03-30 103 Narrow 0 1 0 0
2018-04-30 104 Sharp 0 0 1 0
2018-04-21 105 Narrow 0 1 0 0
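The one-liner works because a DataFrame behaves like a mapping from column name to Series. As a result, `**dummies` expands into keyword arguments such as `Hard=<Series>` for `assign`. The small sketch below (a reduced three-row sample, assumed for illustration) checks that the unpacked form matches passing each column explicitly:

```python
import pandas as pd

df = pd.DataFrame(
    {"accountid": ["101", "102", "103"], "category": ["Smooth", "Hard", "Smooth"]},
    index=pd.to_datetime(["2018-03-29", "2018-03-29", "2018-03-30"]),
)

# str.get_dummies() builds one 0/1 column per distinct category value
# (columns come out sorted by name: Hard, Smooth).
dummies = df["category"].str.get_dummies()

# **dummies unpacks the frame as keyword arguments: Hard=..., Smooth=...
result = df[["accountid", "category"]].assign(**dummies)

# The same thing spelled out column by column.
explicit = df[["accountid", "category"]].assign(
    Hard=dummies["Hard"], Smooth=dummies["Smooth"]
)
print(result)
```

Because `dummies` is derived from `df["category"]`, both frames share the same index. The columns therefore line up row for row, even though the date 2018-03-29 appears twice.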
To replace the 0s with NaN:
import numpy as np

df[['accountid','category']].assign(**df['category'].str.get_dummies())\
    .replace(0, np.nan)
Output:
accountid category Hard Narrow Sharp Smooth
timestamp
2018-03-29 101 Smooth NaN NaN NaN 1.0
2018-03-29 102 Hard 1.0 NaN NaN NaN
2018-03-30 103 Narrow NaN 1.0 NaN NaN
2018-04-30 104 Sharp NaN NaN 1.0 NaN
2018-04-21 105 Narrow NaN 1.0 NaN NaN
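If each category should receive a score other than 1, one option is to weight the dummy columns before assigning them. The weights below are hypothetical, since the question does not say what each category is worth. This sketch also limits the NaN replacement to the dummy columns, using `where` instead of a frame-wide `replace(0, np.nan)`:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "accountid": ["101", "102", "103", "104", "105"],
        "category": ["Smooth", "Hard", "Narrow", "Sharp", "Narrow"],
    },
    index=pd.DatetimeIndex(
        ["2018-03-29", "2018-03-29", "2018-03-30", "2018-04-30", "2018-04-21"],
        name="timestamp",
    ),
)

dummies = df["category"].str.get_dummies()

# Hypothetical per-category weights (not from the question).
weights = pd.Series({"Smooth": 5, "Hard": 3, "Sharp": 2, "Narrow": 1})

# mul() aligns the weights on the column names; where() keeps a value only where
# the dummy is 1 and leaves NaN elsewhere, so accountid/category are untouched.
scores = dummies.mul(weights).where(dummies.eq(1))

# assign(**scores) rather than join(): the index has a duplicate date, and a join
# on a non-unique index would multiply those rows.
result = df[["accountid", "category"]].assign(**scores)
print(result)
```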