[英]Pandas - How to separate and group by comma separated strings
For the following example dataframe, how would you: 对于以下示例数据框,您将如何:
searched_products
and bought_products
fields searched_products
和bought_products
字段 date
, page
and just products
date
, page
和products
分组 From this: 由此:
+------------+------+---------------------+-----------------+
| date | page | searched_products | bought_products |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc | apple, orange | orange |
+------------+------+---------------------+-----------------+
| 2019-01-01 | def | apple, pear, orange | orange, pear |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc | grapes, orange | apple, grapes |
+------------+------+---------------------+-----------------+
| 2019-01-02 | def | apple | apple, oranges |
+------------+------+---------------------+-----------------+
| 2019-01-02 | ghi | apple, grapes | orange |
+------------+------+---------------------+-----------------+
| 2019-01-02 | jkl | pear, apple | pear |
+------------+------+---------------------+-----------------+
| etc | etc | etc | etc |
+------------+------+---------------------+-----------------+
to this: 对此:
+------------+------+---------+----------+-----------+
| date | page | product | searches | purchases |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc | apple | 1 | 1 |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc | orange | 2 | 1 |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc | grapes | 1 | 1 |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def | apple | 1 | NaN |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def | pear | 1 | 1 |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def | orange | 1 | 1 |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def | apple | 1 | 1 |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def | orange | NaN | 1 |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi | apple | 1 | NaN |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi | grapes | 1 | NaN |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi | orange | NaN | 1 |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl | apple | 1 | NaN |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl | pear | 1 | 1 |
+------------+------+---------+----------+-----------+
| etc | etc | etc | etc | etc |
+------------+------+---------+----------+-----------+
Solution for pandas 0.25+ with DataFrame.explode
for repeat values by splitted values, then aggregate counts by GroupBy.size
and last concat
together: 带有
DataFrame.explode
大熊猫的解决方案,用于通过DataFrame.explode
值重复值,然后通过GroupBy.size
和最后一个concat
汇总计数:
s = (df.assign(searches=df['searched_products'].str.split(', '))
.explode('searches')
.groupby(['date','page','searches'])
.size()
.rename('searches'))
b = (df.assign(purchases=df['bought_products'].str.split(', '))
.explode('purchases')
.groupby(['date','page','purchases'])
.size()
.rename('purchases'))
df = pd.concat([s, b], axis=1).rename_axis(('date','page','product')).reset_index()
print (df)
date page product searches purchases
0 20190101 abc apple 1.0 1.0
1 20190101 abc grapes 1.0 1.0
2 20190101 abc orange 2.0 1.0
3 20190101 def apple 1.0 NaN
4 20190101 def ear 1.0 NaN
5 20190101 def orange 1.0 1.0
6 20190101 def pear NaN 1.0
7 20190102 def apple 1.0 1.0
8 20190102 def oranges NaN 1.0
9 20190102 ghi apple 1.0 NaN
10 20190102 ghi grapes 1.0 NaN
11 20190102 ghi orange NaN 1.0
12 20190102 jkl apple 1.0 NaN
13 20190102 jkl pear 1.0 1.0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.