Pandas - How to separate and group by comma separated strings

For the following example dataframe, how would you:

  1. separate out the searched_products and bought_products fields
  2. group them by date, page and product
  3. with columns showing a count of each product

From this:

+------------+------+---------------------+-----------------+
| date       | page | searched_products   | bought_products |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | apple, orange       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-01 | def  | apple, pear, orange | orange, pear    |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | grapes, orange      | apple, grapes   |
+------------+------+---------------------+-----------------+
| 2019-01-02 | def  | apple               | apple, oranges  |
+------------+------+---------------------+-----------------+
| 2019-01-02 | ghi  | apple, grapes       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-02 | jkl  | pear, apple         | pear            |
+------------+------+---------------------+-----------------+
| etc        | etc  | etc                 | etc             |
+------------+------+---------------------+-----------------+

To this:

+------------+------+---------+----------+-----------+
| date       | page | product | searches | purchases |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | orange  | 2        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | grapes  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | orange  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | grapes  | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| etc        | etc  | etc     | etc      | etc       |
+------------+------+---------+----------+-----------+

Solution for pandas 0.25+: use DataFrame.explode to repeat each row once per split value, aggregate the counts with GroupBy.size, and finally join the two results with concat:

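import pandas as pd

# Count searches per (date, page, product): split the string, then one row per product
s = (df.assign(searches=df['searched_products'].str.split(', '))
       .explode('searches')
       .groupby(['date', 'page', 'searches'])
       .size()
       .rename('searches'))
# Same steps for purchases
b = (df.assign(purchases=df['bought_products'].str.split(', '))
       .explode('purchases')
       .groupby(['date', 'page', 'purchases'])
       .size()
       .rename('purchases'))

# Align both counts on the shared index; a product missing on one side becomes NaN
df = pd.concat([s, b], axis=1).rename_axis(('date', 'page', 'product')).reset_index()
print(df)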
        date page  product  searches  purchases
0   20190101  abc    apple       1.0        1.0
1   20190101  abc   grapes       1.0        1.0
2   20190101  abc   orange       2.0        1.0
3   20190101  def    apple       1.0        NaN
4   20190101  def      ear       1.0        NaN
5   20190101  def   orange       1.0        1.0
6   20190101  def     pear       NaN        1.0
7   20190102  def    apple       1.0        1.0
8   20190102  def  oranges       NaN        1.0
9   20190102  ghi    apple       1.0        NaN
10  20190102  ghi   grapes       1.0        NaN
11  20190102  ghi   orange       NaN        1.0
12  20190102  jkl    apple       1.0        NaN
13  20190102  jkl     pear       1.0        1.0
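
The same approach can be sketched as a self-contained script. This is only an illustration under a few assumptions: the data is the six sample rows from the question, the helper name count_products is hypothetical, the split is on a bare comma followed by str.strip so that inconsistent spacing does not create separate products, and missing counts are filled with 0 rather than left as NaN.

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-02", "2019-01-02", "2019-01-02"],
    "page": ["abc", "def", "abc", "def", "ghi", "jkl"],
    "searched_products": ["apple, orange", "apple, pear, orange",
                          "grapes, orange", "apple",
                          "apple, grapes", "pear, apple"],
    "bought_products": ["orange", "orange, pear", "apple, grapes",
                        "apple, oranges", "orange", "pear"],
})


def count_products(frame, column, name):
    """Hypothetical helper: count each product in `column` per date and page."""
    return (frame.assign(product=frame[column].str.split(","))
                 .explode("product")                                   # one row per product
                 .assign(product=lambda d: d["product"].str.strip())  # drop stray spaces
                 .groupby(["date", "page", "product"])
                 .size()
                 .rename(name))


searches = count_products(df, "searched_products", "searches")
purchases = count_products(df, "bought_products", "purchases")

# Outer-align the two counts on (date, page, product); absent counts become 0.
result = (pd.concat([searches, purchases], axis=1)
            .fillna(0)
            .astype(int)
            .reset_index())
print(result)
```

Filling with 0 is a design choice, not part of the original answer. Drop the fillna and astype calls to keep NaN for products that were only searched or only bought.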
