How to turn the dictionary keys in one column of a Pandas dataframe into columns?
I have a dataframe in which one column contains a stringified list of dictionaries. How can I create new columns from these dictionary keys?
I am looking for a solution that uses pandas methods such as apply, stack, etc., and avoids for loops as far as possible.
Here is the problem:
import ast
import pandas as pd

speakers = ['Einstein', 'Newton']
views = [1000,2000]
ratings0 = ("[{'id': 7, 'name': 'Funny', 'count': 100}, {'id': 1, 'name': 'Sad', "
"'count': 110}, {'id': 9, 'name': 'Happy', 'count': 120}]")
ratings1 = ("[{'id': 7, 'name': 'Happy', 'count': 200}, {'id': 3, 'name': 'Funny', "
"'count': 210}, {'id': 2, 'name': 'Sad', 'count': 220}]")
ratings = [ratings0, ratings1]
df = pd.DataFrame({'speaker': speakers, 'ratings': ratings,'views':views})
print(df)
speaker ratings views
0 Einstein [{'id': 7, 'name': 'Funny', 'count': 100}, {'i... 1000
1 Newton [{'id': 7, 'name': 'Happy', 'count': 200}, {'i... 2000
My attempt so far:
# new dataframe only for ratings
dfr = df['ratings'].apply(ast.literal_eval)
dfr = dfr.apply(pd.DataFrame)
dfr = dfr.apply(lambda x: x.sort_values(by='name'))
dfr = dfr.apply(pd.DataFrame.stack)
print(dfr)
0 1 2
count id name count id name count id name
0 100 7 Funny 110 1 Sad 120 9 Happy
1 200 7 Happy 210 3 Funny 220 2 Sad
This gives a multi-index dataframe. I tried sorting the dictionaries by name, but the result is still not sorted, and the name columns do not hold the same values in every row. I am also unsure how to use the values of the name column as the labels for the count values and how to remove the other unwanted columns.
The output I want is:
speaker views Funny Sad Happy
Einstein 1000 100 110 120
Newton 2000 210 220 200
I am using Pandas 0.20, where the .explode() method is not available, and I am not permitted to update Pandas at my workplace.
For pandas >= 0.25.0, you can use ast.literal_eval + explode + pivot:
ii = df.set_index('speaker')['ratings'].apply(ast.literal_eval).explode()
u = pd.DataFrame(ii.tolist(), index=ii.index).reset_index()
u.pivot(index='speaker', columns='name', values='count')
name Funny Happy Sad
speaker
Einstein 100 120 110
Newton 210 200 220
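The result above drops the views column that the question asks for. A minimal sketch of one way to keep it, assuming the same sample data as in the question and pandas >= 0.25.0: carry views in the index next to speaker, then reshape with set_index and unstack instead of pivot. The names ii, u and result are only illustrative.

```python
import ast
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'speaker': ['Einstein', 'Newton'],
    'ratings': [
        "[{'id': 7, 'name': 'Funny', 'count': 100}, {'id': 1, 'name': 'Sad', "
        "'count': 110}, {'id': 9, 'name': 'Happy', 'count': 120}]",
        "[{'id': 7, 'name': 'Happy', 'count': 200}, {'id': 3, 'name': 'Funny', "
        "'count': 210}, {'id': 2, 'name': 'Sad', 'count': 220}]",
    ],
    'views': [1000, 2000],
})

# Keep both 'speaker' and 'views' in the index so they survive the reshape.
ii = (df.set_index(['speaker', 'views'])['ratings']
        .apply(ast.literal_eval)
        .explode())

# One row per rating dictionary, still labelled by (speaker, views).
u = pd.DataFrame(ii.tolist(), index=ii.index)

# Move 'name' into the index, then spread it across the columns.
result = (u.set_index('name', append=True)['count']
            .unstack()
            .reset_index()
            .rename_axis(None, axis=1))
print(result)
```

This assumes each (speaker, views) pair has at most one rating per name; unstack raises an error on duplicate entries.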
For older versions of pandas:
a = df['speaker']
b = df['ratings']
# One flat dictionary per rating, tagged with the speaker it belongs to
ii = [
    {**{'speaker': name}, **row}
    for name, element in zip(a, b) for row in ast.literal_eval(element)
]
pd.DataFrame(ii).pivot(index='speaker', columns='name', values='count')
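The list-comprehension version also leaves out views. A minimal sketch of one way to add it back without explode, assuming the question's sample data and that every speaker appears only once: pivot as above, then join the result to the untouched columns on speaker. The names rows, wide and result are only illustrative.

```python
import ast
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'speaker': ['Einstein', 'Newton'],
    'ratings': [
        "[{'id': 7, 'name': 'Funny', 'count': 100}, {'id': 1, 'name': 'Sad', "
        "'count': 110}, {'id': 9, 'name': 'Happy', 'count': 120}]",
        "[{'id': 7, 'name': 'Happy', 'count': 200}, {'id': 3, 'name': 'Funny', "
        "'count': 210}, {'id': 2, 'name': 'Sad', 'count': 220}]",
    ],
    'views': [1000, 2000],
})

# Long format: one dictionary per rating, tagged with its speaker.
rows = [
    {'speaker': speaker, **rating}
    for speaker, raw in zip(df['speaker'], df['ratings'])
    for rating in ast.literal_eval(raw)
]

# Wide format: one column per rating name, indexed by speaker.
wide = (pd.DataFrame(rows)
          .pivot(index='speaker', columns='name', values='count')
          .rename_axis(None, axis=1))

# Attach the counts to the remaining columns, matching on the speaker name.
result = df[['speaker', 'views']].join(wide, on='speaker')
print(result)
```

If a speaker could occur in several rows, joining on speaker alone would be ambiguous, and a key that identifies each row would be needed instead.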
You may use sum and index.repeat to construct a new dataframe, join it with df[['speaker', 'views']], and assign the result to df1. Next, chain set_index, unstack, and reset_index:
df['ratings'] = df['ratings'].apply(ast.literal_eval)
# sum() concatenates the per-row lists; index.repeat keeps each dict tied to its source row
df1 = (pd.DataFrame(df.ratings.sum(), index=df.index.repeat(df.ratings.str.len()))
         .drop('id', axis=1).join(df[['speaker', 'views']]))
df1.set_index(['speaker', 'views', 'name'])['count'].unstack().reset_index()
Out[213]:
name speaker views Funny Happy Sad
0 Einstein 1000 100 120 110
1 Newton 2000 210 200 220
Note: name in the final output is the label of the columns axis. If you don't want to see it, chain an additional rename_axis as follows:
df1.set_index(['speaker', 'views', 'name'])['count'].unstack().reset_index() \
.rename_axis([None], axis=1)
Out[214]:
speaker views Funny Happy Sad
0 Einstein 1000 100 120 110
1 Newton 2000 210 200 220
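unstack sorts the new column labels, so the rating columns come out as Funny, Happy, Sad, whereas the question lists them as Funny, Sad, Happy. If the order matters, the columns can be selected explicitly afterwards. A minimal sketch, where wide is a hand-built stand-in for the result shown above and wanted is an assumed column order:

```python
import pandas as pd

# Stand-in for the reshaped result shown above.
wide = pd.DataFrame({
    'speaker': ['Einstein', 'Newton'],
    'views': [1000, 2000],
    'Funny': [100, 210],
    'Happy': [120, 200],
    'Sad': [110, 220],
})

# Column order as listed in the question.
wanted = ['speaker', 'views', 'Funny', 'Sad', 'Happy']
result = wide[wanted]
print(result)
```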
For loops are not always bad. You can give it a try:
# Parse the column once, then build one small frame per row.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
frames = []
for i, ratings in enumerate(df['ratings'].apply(ast.literal_eval)):
    x = pd.DataFrame(ratings)
    x.index = [i] * len(x)
    frames.append(x)
dfr = pd.concat(frames).reset_index()
dfr = (dfr.drop('id', axis=1)
          .pivot_table(index=['index'], columns='name',
                       values='count', aggfunc='sum')
          .rename_axis(None, axis=1).reset_index())
df_final = df.join(dfr)
df_final.drop(['index', 'ratings'], axis=1, inplace=True)
df_final
Gives:
speaker views Funny Happy Sad
0 Einstein 1000 100 120 110
1 Newton 2000 210 200 220