Numbering rows generated from comma-separated values in a Pandas DataFrame
I have a Pandas DataFrame as follows:
+----------+---------------+-----------+---------------+
| List No. | List Item No. | Item Name | Issues |
+----------+---------------+-----------+---------------+
| 1 | 1 | A | foo, bar, baz |
| 1 | 2 | B | foo, bar |
| 2 | 3A | A | bar, quz |
| 2 | 3B | C | baz, foo, quz |
+----------+---------------+-----------+---------------+
The above can be generated using the following code:
import pandas as pd

data = {'List No.': ['1', '1', '2', '2'],
        'List Item No.': ['1', '2', '3A', '3B'],
        'Item Name': ['A', 'B', 'A', 'C'],
        'Issues': ['foo, bar, baz', 'foo, bar', 'bar, quz', 'baz, foo, quz']}
df = pd.DataFrame(data)
I want to create rows based on the number of values present in Issues. For example, if a row has 3 comma-separated values, I want to create 3 rows, one for each value. The values can be flattened using [item for sublist in df.Issues.str.split(',').tolist() for item in sublist]. However, I also want to create an issue number for each value within its original row, which I am unable to do.
Expected Output
+----------+---------------+-----------+-----------+-------+
| List No. | List Item No. | Item Name | Issue No. | Issue |
+----------+---------------+-----------+-----------+-------+
| 1 | 1 | A | 1 | foo |
| 1 | 1 | A | 2 | bar |
| 1 | 1 | A | 3 | baz |
| 1 | 2 | B | 1 | foo |
| 1 | 2 | B | 2 | bar |
| 2 | 3A | A | 1 | bar |
| 2 | 3A | A | 2 | quz |
| 2 | 3B | C | 1 | baz |
| 2 | 3B | C | 2 | foo |
| 2 | 3B | C | 3 | quz |
+----------+---------------+-----------+-----------+-------+
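Use DataFrame.explode with GroupBy.cumcount:
# Split on ', ' (comma plus space) so the values carry no leading whitespace
df1 = df.assign(Issues=df['Issues'].str.split(', ')).explode('Issues')
# explode repeats the original index, so counting within each index label numbers the issues
df1['Issue No.'] = df1.groupby(level=0).cumcount().add(1)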
If the position of the column is important, use DataFrame.insert instead of the assignment above:
df1.insert(3, 'Issue No.', df1.groupby(level=0).cumcount().add(1))
print (df1)
List No. List Item No. Item Name Issue No. Issues
0 1 1 A 1 foo
0 1 1 A 2 bar
0 1 1 A 3 baz
1 1 2 B 1 foo
1 1 2 B 2 bar
2 2 3A A 1 bar
2 2 3A A 2 quz
3 2 3B C 1 baz
3 2 3B C 2 foo
3 2 3B C 3 quz
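For reference, the steps above can be put together into one self-contained sketch. It is only one possible way to reach the expected output from the question. The variable name out, the whitespace stripping after a plain ',' split, the rename of Issues to Issue and the final reset_index are assumptions added here to match the expected table; they are not part of the answer above.

```python
import pandas as pd

data = {'List No.': ['1', '1', '2', '2'],
        'List Item No.': ['1', '2', '3A', '3B'],
        'Item Name': ['A', 'B', 'A', 'C'],
        'Issues': ['foo, bar, baz', 'foo, bar', 'bar, quz', 'baz, foo, quz']}
df = pd.DataFrame(data)

# Split each cell into a list, give every list element its own row,
# then strip the space left behind by splitting on a bare comma.
out = (df.assign(Issues=df['Issues'].str.split(','))
         .explode('Issues')
         .assign(Issues=lambda d: d['Issues'].str.strip()))

# explode keeps the original row label on every generated row, so a
# cumulative count per index label gives 1, 2, 3, ... within each source row.
out.insert(3, 'Issue No.', out.groupby(level=0).cumcount().add(1))

# Assumed cosmetic steps: match the column name in the expected output
# and replace the repeated index labels with a fresh 0..n-1 index.
out = out.rename(columns={'Issues': 'Issue'}).reset_index(drop=True)

print(out)
```

Note that the numbering must be computed before reset_index, because groupby(level=0) relies on the repeated index labels that explode produces.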