How to create a new dataframe from existing dataframes?
I have the following 2 dataframes:
df1
product_ID tags
100 chocolate, sprinkles
101 chocolate, filled
102 glazed
df2
customer product_ID
A 100
A 101
B 101
C 100
C 102
B 101
A 100
C 102
I should be able to create a new dataframe like this:
| customer | chocolate | sprinkles | filled | glazed |
|----------|-----------|-----------|--------|--------|
| A | ? | ? | ? | ? |
| B | ? | ? | ? | ? |
| C | ? | ? | ? | ? |
where the contents of each cell represent the number of times the product attribute occurs for that customer.
I've used merge and got the following result:
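import pandas as pd
df3 = pd.merge(df2, df1)
# the join column is named product_ID; drop returns a new frame, so assign it back
df3 = df3.drop(['product_ID'], axis=1)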
customer tags
A chocolate, sprinkles
C chocolate, sprinkles
A chocolate, sprinkles
A chocolate, filled
B chocolate, filled
B chocolate, filled
C glazed
C glazed
How do we get to the final result from here? Thanks in advance!
Using get_dummies:
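# split on ', ' so tag names carry no leading space; sum(level=0) was removed in pandas 2.0
df.set_index('customer')['tags'].str.get_dummies(sep=', ').groupby(level=0, sort=False).sum()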
Out[593]:
chocolate filled glazed sprinkles
customer
A 3 1 0 2
C 1 0 2 1
B 2 2 0 0
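For a self-contained check, the sketch below rebuilds df1 and df2 from the question, merges them, and applies the get_dummies approach. It assumes the tags are separated by a comma followed by a space, as displayed in the question; the names merged and counts are illustrative and not part of the original answer. It uses the default sorted groupby, so rows come out in alphabetical customer order.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({
    'product_ID': [100, 101, 102],
    'tags': ['chocolate, sprinkles', 'chocolate, filled', 'glazed'],
})
df2 = pd.DataFrame({
    'customer': ['A', 'A', 'B', 'C', 'C', 'B', 'A', 'C'],
    'product_ID': [100, 101, 101, 100, 102, 101, 100, 102],
})

# Join on the shared product_ID column, then keep only customer and tags
merged = pd.merge(df2, df1, on='product_ID').drop(columns='product_ID')

# One indicator column per tag, summed per customer
counts = (merged.set_index('customer')['tags']
                .str.get_dummies(sep=', ')
                .groupby(level=0)
                .sum())
print(counts)
```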
You can do this in 2 steps:
1. Split the comma-separated tags so that each tag gets its own row.
2. Use pandas.crosstab to tabulate your counts.
Here's an example assuming you have performed your merge and the result is df:
import numpy as np
import pandas as pd
from itertools import chain
# split on ', ' (comma plus space) to form a series of lists without stray spaces
tag_split = df['tags'].str.split(', ')
# create expanded dataframe with one row per (customer, tag) pair;
# .values gives df_full a fresh 0..n-1 index instead of repeated labels
df_full = pd.DataFrame({'customer': np.repeat(df['customer'].values, tag_split.map(len)),
                        'tags': list(chain.from_iterable(tag_split))})
# use pd.crosstab for result
res = pd.crosstab(df_full['customer'], df_full['tags'])
print(res)
tags chocolate filled glazed sprinkles
customer
A 3 1 0 2
B 2 2 0 0
C 1 0 2 1
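As an alternative sketch of the same two steps, newer pandas versions (0.25 and later) provide DataFrame.explode, which can replace the np.repeat and itertools.chain expansion. This is a swapped-in technique, not the method from the answer above. The frame df below is a hypothetical stand-in for the merged result, with the same rows as shown in the question.

```python
import pandas as pd

# Hypothetical merged frame with the same rows as the question's merge result
df = pd.DataFrame({
    'customer': ['A', 'C', 'A', 'A', 'B', 'B', 'C', 'C'],
    'tags': ['chocolate, sprinkles', 'chocolate, sprinkles', 'chocolate, sprinkles',
             'chocolate, filled', 'chocolate, filled', 'chocolate, filled',
             'glazed', 'glazed'],
})

# Step 1: split each tag string into a list, then give every tag its own row
df_full = (df.assign(tags=df['tags'].str.split(', '))
             .explode('tags')
             .reset_index(drop=True))

# Step 2: tabulate customer against tag
res = pd.crosstab(df_full['customer'], df_full['tags'])
print(res)
```

Resetting the index after explode avoids carrying duplicate row labels into pd.crosstab.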