How to turn dictionary keys in one column of a Pandas dataframe into columns?

I have a dataframe in which one column contains stringified lists of dictionaries. I was wondering how I can make new columns from these dictionary keys.

I am looking for a solution that uses pandas methods such as apply and stack, and that avoids for loops as far as possible.

Here is the problem:

import ast
import pandas as pd

speakers = ['Einstein','Newton']
views = [1000,2000]
ratings0 = ("[{'id': 7, 'name': 'Funny', 'count': 100}, {'id': 1, 'name': 'Sad', "
 "'count': 110}, {'id': 9, 'name': 'Happy', 'count': 120}]")

ratings1 = ("[{'id': 7, 'name': 'Happy', 'count': 200}, {'id': 3, 'name': 'Funny', "
 "'count': 210}, {'id': 2, 'name': 'Sad', 'count': 220}]")


ratings = [ratings0, ratings1]
df = pd.DataFrame({'speaker': speakers, 'ratings': ratings,'views':views})

print(df)
    speaker                                            ratings  views
0  Einstein  [{'id': 7, 'name': 'Funny', 'count': 100}, {'i...   1000
1    Newton  [{'id': 7, 'name': 'Happy', 'count': 200}, {'i...   2000

My attempt so far:

# new dataframe only for ratings
dfr = df['ratings'].apply(ast.literal_eval)
dfr = dfr.apply(pd.DataFrame)
dfr = dfr.apply(lambda x: x.sort_values(by='name'))
dfr = dfr.apply(pd.DataFrame.stack)

print(dfr)

 0               1               2          
  count id   name count id   name count id   name
0   100  7  Funny   110  1    Sad   120  9  Happy
1   200  7  Happy   210  3  Funny   220  2    Sad

This gives a multi-index dataframe. I tried sorting the dictionaries by name, but the result is still not sorted, and the name columns do not hold the same values in every row. Also, I am unsure how to move the values of the column name so that they replace the column count, and how to remove the other unwanted columns.

Desired output

speaker   views  Funny  Sad  Happy
Einstein   1000    100  110    120
Newton     2000    210  220    200

Update

I am using Pandas 0.20 at my workplace, where the method .explode() is not available, and I am not permitted to update Pandas.

For pandas >= 0.25.0 you can use ast.literal_eval + explode + pivot:

ii = df.set_index('speaker')['ratings'].apply(ast.literal_eval).explode()

u = pd.DataFrame(ii.tolist(), index=ii.index).reset_index()

# keyword arguments: pandas >= 2.0 no longer accepts them positionally
u.pivot(index='speaker', columns='name', values='count')

name      Funny  Happy  Sad
speaker
Einstein    100    120  110
Newton      210    200  220
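
The pivot above returns only the rating columns, while the desired output also contains views. A minimal sketch of one way to carry views along, assuming the df from the question and pandas >= 0.25: put both identifying columns in the index before exploding, then unstack on name. The variable name result is only illustrative.

```python
import ast
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'speaker': ['Einstein', 'Newton'],
    'ratings': [
        "[{'id': 7, 'name': 'Funny', 'count': 100}, {'id': 1, 'name': 'Sad', "
        "'count': 110}, {'id': 9, 'name': 'Happy', 'count': 120}]",
        "[{'id': 7, 'name': 'Happy', 'count': 200}, {'id': 3, 'name': 'Funny', "
        "'count': 210}, {'id': 2, 'name': 'Sad', 'count': 220}]",
    ],
    'views': [1000, 2000],
})

# Keep both identifying columns in the index so they survive the explode.
ii = (df.set_index(['speaker', 'views'])['ratings']
        .apply(ast.literal_eval)
        .explode())

# One row per rating dictionary, still labelled by (speaker, views).
u = pd.DataFrame(ii.tolist(), index=ii.index)

# Move 'name' into the index, then unstack it into columns.
result = (u.set_index('name', append=True)['count']
           .unstack()
           .rename_axis(None, axis=1)
           .reset_index())

print(result)
```

This assumes each (speaker, views, name) combination occurs only once; unstack raises an error on duplicate index entries.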

For older versions of pandas:

a = df['speaker']
b = df['ratings']

ii = [
  {**{'speaker': name}, **row}
  for name, element in zip(a, b) for row in ast.literal_eval(element)
]

pd.DataFrame(ii).pivot(index='speaker', columns='name', values='count')
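
The comprehension above also leaves out views. As a rough sketch that does not rely on explode, the pivoted frame can be merged back onto the speaker and views columns. It assumes the df from the question and that speaker values are unique; records, wide and result are illustrative names.

```python
import ast
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'speaker': ['Einstein', 'Newton'],
    'ratings': [
        "[{'id': 7, 'name': 'Funny', 'count': 100}, {'id': 1, 'name': 'Sad', "
        "'count': 110}, {'id': 9, 'name': 'Happy', 'count': 120}]",
        "[{'id': 7, 'name': 'Happy', 'count': 200}, {'id': 3, 'name': 'Funny', "
        "'count': 210}, {'id': 2, 'name': 'Sad', 'count': 220}]",
    ],
    'views': [1000, 2000],
})

# Flatten to one record per rating, tagging each with its speaker.
records = [
    {**{'speaker': name}, **row}
    for name, element in zip(df['speaker'], df['ratings'])
    for row in ast.literal_eval(element)
]

# Long to wide: one column per rating name.
wide = (pd.DataFrame(records)
          .pivot(index='speaker', columns='name', values='count')
          .rename_axis(None, axis=1)
          .reset_index())

# Attach views again; assumes each speaker appears once in df.
result = df[['speaker', 'views']].merge(wide, on='speaker')

print(result)
```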

You can use sum and index.repeat to construct a new dataframe, join it with df[['speaker', 'views']], and assign the result to df1. Next, apply set_index, unstack, and reset_index:

df['ratings'] = df['ratings'].apply(ast.literal_eval)
df1 = (pd.DataFrame(df.ratings.sum(), index=df.index.repeat(df.ratings.str.len()))
                   .drop('id', axis=1).join(df[['speaker', 'views']]))
df1.set_index(['speaker', 'views', 'name'])['count'].unstack().reset_index()

Out[213]:
name   speaker  views  Funny  Happy  Sad
0     Einstein  1000   100    120    110
1     Newton    2000   210    200    220
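
To see why sum and index.repeat fit together here, the following small sketch (on a toy Series, with hypothetical names parsed, flat and idx) shows that summing a Series of lists concatenates them, and that repeating each index label by its list length yields one label per flattened element.

```python
import pandas as pd

# A toy Series of already-parsed lists; row 0 has two dicts, row 1 has one.
parsed = pd.Series([
    [{'name': 'Funny', 'count': 100}, {'name': 'Sad', 'count': 110}],
    [{'name': 'Happy', 'count': 200}],
])

# Summing lists uses +, so the result is a single flat list of dicts.
flat = parsed.sum()

# Repeat each index label once per element of its list.
idx = parsed.index.repeat(parsed.str.len())

# flat and idx now have the same length, so they can build a frame
# in which every rating row remembers which original row it came from.
long_frame = pd.DataFrame(flat, index=idx)

print(long_frame)
```

Note that summing lists copies the accumulated list at every step, so this idiom may become slow on large frames.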

Note: name in the final output is the label of the columns axis. If you don't want to see it, chain an additional rename_axis as follows:

df1.set_index(['speaker', 'views', 'name'])['count'].unstack().reset_index() \
                                                    .rename_axis([None], axis=1)

Out[214]:
    speaker  views  Funny  Happy  Sad
0  Einstein  1000   100    120    110
1  Newton    2000   210    200    220

For loops are not always bad. You can give it a try:

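# parse the strings once, outside the loop
parsed = df['ratings'].apply(ast.literal_eval)

frames = []
for i in range(len(df)):
    x = pd.DataFrame(parsed[i])
    x.index = [i]*len(x)      # remember which row of df this came from
    frames.append(x)

# DataFrame.append was removed in pandas 2.0; pd.concat also works on old versions
dfr = pd.concat(frames)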


dfr = dfr.reset_index()   
dfr = (dfr.drop('id',axis=1)
         .pivot_table(index=['index'], columns='name',
                      values='count',aggfunc='sum')
         .rename_axis(None, axis=1).reset_index())

df_final = df.join(dfr)
df_final.drop(['index','ratings'],axis=1,inplace=True)

df_final

Gives:

    speaker  views  Funny  Happy  Sad
0  Einstein   1000    100    120  110
1    Newton   2000    210    200  220
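
For comparison, here is a sketch of another route that is not taken from the answers above: turn each stringified list into a single {name: count} dictionary with apply, then let the DataFrame constructor spread those dictionaries into columns. It needs neither explode nor an explicit loop over rows. The helper ratings_to_counts is a hypothetical name, and the df is the one from the question.

```python
import ast
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'speaker': ['Einstein', 'Newton'],
    'ratings': [
        "[{'id': 7, 'name': 'Funny', 'count': 100}, {'id': 1, 'name': 'Sad', "
        "'count': 110}, {'id': 9, 'name': 'Happy', 'count': 120}]",
        "[{'id': 7, 'name': 'Happy', 'count': 200}, {'id': 3, 'name': 'Funny', "
        "'count': 210}, {'id': 2, 'name': 'Sad', 'count': 220}]",
    ],
    'views': [1000, 2000],
})


def ratings_to_counts(raw):
    """Parse one stringified list and map each rating name to its count."""
    return {d['name']: d['count'] for d in ast.literal_eval(raw)}


# One dictionary per row; the constructor aligns them by key into columns.
counts = pd.DataFrame(df['ratings'].apply(ratings_to_counts).tolist(),
                      index=df.index)

result = df[['speaker', 'views']].join(counts)

print(result)
```

If a rating name is missing from some row, the corresponding cell would be NaN, and the column order of counts may differ between pandas versions.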


