熊猫-如何用逗号分隔的字符串分隔和分组

Question

For the following example dataframe, how would you: 对于以下示例数据框，您将如何：

separate out the searched_products and bought_products fields 分隔出searched_products和bought_products字段
group them by date , page and just products 按date ， page和products分组
with columns showing a count of each product 带有显示每种产品计数的列

From this: 由此：

+------------+------+---------------------+-----------------+
| date       | page | searched_products   | bought_products |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | apple, orange       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-01 | def  | apple, pear, orange | orange, pear    |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | grapes, orange      | apple, grapes   |
+------------+------+---------------------+-----------------+
| 2019-01-02 | def  | apple               | apple, oranges  |
+------------+------+---------------------+-----------------+
| 2019-01-02 | ghi  | apple, grapes       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-02 | jkl  | pear, apple         | pear            |
+------------+------+---------------------+-----------------+
| etc        | etc  | etc                 | etc             |
+------------+------+---------------------+-----------------+

to this: 对此：

+------------+------+---------+----------+-----------+
| date       | page | product | searches | purchases |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | orange  | 2        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | grapes  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | orange  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | grapes  | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| etc        | etc  | etc     | etc      | etc       |
+------------+------+---------+----------+-----------+

Answer 1

Solution for pandas 0.25+ with DataFrame.explode for repeat values by splitted values, then aggregate counts by GroupBy.size and last concat together: 带有DataFrame.explode大熊猫的解决方案，用于通过DataFrame.explode值重复值，然后通过GroupBy.size和最后一个concat汇总计数：

s = (df.assign(searches=df['searched_products'].str.split(', '))
      .explode('searches')
      .groupby(['date','page','searches'])
      .size()
      .rename('searches'))
b = (df.assign(purchases=df['bought_products'].str.split(', '))
       .explode('purchases')
       .groupby(['date','page','purchases'])
       .size()
       .rename('purchases'))

df = pd.concat([s, b], axis=1).rename_axis(('date','page','product')).reset_index()
print (df)
        date page  product  searches  purchases
0   20190101  abc    apple       1.0        1.0
1   20190101  abc   grapes       1.0        1.0
2   20190101  abc   orange       2.0        1.0
3   20190101  def    apple       1.0        NaN
4   20190101  def      ear       1.0        NaN
5   20190101  def   orange       1.0        1.0
6   20190101  def     pear       NaN        1.0
7   20190102  def    apple       1.0        1.0
8   20190102  def  oranges       NaN        1.0
9   20190102  ghi    apple       1.0        NaN
10  20190102  ghi   grapes       1.0        NaN
11  20190102  ghi   orange       NaN        1.0
12  20190102  jkl    apple       1.0        NaN
13  20190102  jkl     pear       1.0        1.0

熊猫-如何用逗号分隔的字符串分隔和分组

问题描述

1 个解决方案

解决方案1
3 已采纳 2019-09-16 12:49:13

熊猫-如何用逗号分隔的字符串分隔和分组

问题描述

1 个解决方案

解决方案1 3 已采纳 2019-09-16 12:49:13

解决方案1
3 已采纳 2019-09-16 12:49:13