
Pandas - How to separate and group by comma separated strings

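For the following example DataFrame, how would you:

  1. Separate out the searched_products and bought_products fields
  2. Group by date, page, and product
  3. Add columns showing the count for each product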

From this:

+------------+------+---------------------+-----------------+
| date       | page | searched_products   | bought_products |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | apple, orange       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-01 | def  | apple, pear, orange | orange, pear    |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | grapes, orange      | apple, grapes   |
+------------+------+---------------------+-----------------+
| 2019-01-02 | def  | apple               | apple, oranges  |
+------------+------+---------------------+-----------------+
| 2019-01-02 | ghi  | apple, grapes       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-02 | jkl  | pear, apple         | pear            |
+------------+------+---------------------+-----------------+
| etc        | etc  | etc                 | etc             |
+------------+------+---------------------+-----------------+

To this:

+------------+------+---------+----------+-----------+
| date       | page | product | searches | purchases |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | orange  | 2        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | grapes  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | orange  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | grapes  | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| etc        | etc  | etc     | etc      | etc       |
+------------+------+---------+----------+-----------+

A solution for pandas 0.25+ uses DataFrame.explode to repeat each row once per split value, then aggregates the counts with GroupBy.size, and finally joins the two results with concat:

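import pandas as pd

# Count searches: split each comma-separated string into a list, explode it
# into one row per product, then count rows per (date, page, product).
s = (df.assign(searches=df['searched_products'].str.split(', '))
       .explode('searches')
       .groupby(['date', 'page', 'searches'])
       .size()
       .rename('searches'))

# Count purchases the same way.
b = (df.assign(purchases=df['bought_products'].str.split(', '))
       .explode('purchases')
       .groupby(['date', 'page', 'purchases'])
       .size()
       .rename('purchases'))

# Align both counts on (date, page, product); combinations missing from one
# side become NaN.
df = pd.concat([s, b], axis=1).rename_axis(['date', 'page', 'product']).reset_index()
print(df)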
        date page  product  searches  purchases
0   20190101  abc    apple       1.0        1.0
1   20190101  abc   grapes       1.0        1.0
2   20190101  abc   orange       2.0        1.0
3   20190101  def    apple       1.0        NaN
4   20190101  def      ear       1.0        NaN
5   20190101  def   orange       1.0        1.0
6   20190101  def     pear       NaN        1.0
7   20190102  def    apple       1.0        1.0
8   20190102  def  oranges       NaN        1.0
9   20190102  ghi    apple       1.0        NaN
10  20190102  ghi   grapes       1.0        NaN
11  20190102  ghi   orange       NaN        1.0
12  20190102  jkl    apple       1.0        NaN
13  20190102  jkl     pear       1.0        1.0
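
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split / explode / size / concat idea applied to the rows from the question. The helper name `count_products` is hypothetical (it is not a pandas function), the dates are kept as plain strings, and splitting on a bare comma followed by a strip is my own assumption to tolerate inconsistent spacing; the answer above splits on `', '` directly.

```python
import pandas as pd

# Sample rows copied from the question's input table (the "etc" row is omitted).
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01",
             "2019-01-02", "2019-01-02", "2019-01-02"],
    "page": ["abc", "def", "abc", "def", "ghi", "jkl"],
    "searched_products": ["apple, orange", "apple, pear, orange",
                          "grapes, orange", "apple", "apple, grapes",
                          "pear, apple"],
    "bought_products": ["orange", "orange, pear", "apple, grapes",
                        "apple, oranges", "orange", "pear"],
})


def count_products(frame, source_col, count_name):
    """Hypothetical helper: count products per (date, page) for one
    comma-separated column and return the counts as a named Series."""
    exploded = (
        frame[["date", "page", source_col]]
        .assign(product=frame[source_col].str.split(","))  # string -> list
        .explode("product")                                # one row per item
        .reset_index(drop=True)
    )
    # Remove the whitespace left around each item by the bare-comma split.
    exploded["product"] = exploded["product"].str.strip()
    return (
        exploded.groupby(["date", "page", "product"])
        .size()
        .rename(count_name)
    )


searches = count_products(df, "searched_products", "searches")
purchases = count_products(df, "bought_products", "purchases")

# Outer-align the two count Series; products seen on only one side get NaN.
result = (
    pd.concat([searches, purchases], axis=1)
    .rename_axis(["date", "page", "product"])
    .sort_index()
    .reset_index()
)
print(result)
```

The explicit `sort_index()` is there because the row order coming out of `concat` is not something I would rely on across pandas versions. If zeros are preferred over NaN, calling `.fillna(0)` on the concatenated frame before `reset_index()` would be one option.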

