Split cells in one column by comma into multiple rows in Pandas
For the input data below, I want to split the office_number column by comma into multiple rows:
import pandas as pd

df = pd.DataFrame({'id':['1010084420','1010084420','1010084420','1010084421','1010084421','1010084421','1010084425'],
                   'building_name': ['A', 'A', 'A', 'East Tower', 'East Tower', 'West Tower', 'T1'],
                   'floor': ['1', '1', '2', '10', '10', '11','11'],
                   'office_number':['101-105', '106', '201-203, 205, 208', '1001-1005', '1006, 1008, 1010', '1101-1103', '1101-1105'],
                   'company_name': ['Ariel Resources Ltd.', 'A.O. Tatneft', '', 'Agrium Inc.', 'Creo Products Inc.', 'Cott Corp.', 'Creo Products Inc.']})
This is my solution, adapted from here:
res = (df.set_index(['id', 'building_name', 'floor', 'company_name'])
         .stack()
         .str.split(',', expand=True)
         .stack()
         .unstack(-2)
         .reset_index(-1, drop=True)
         .reset_index())
result = res[['id', 'building_name', 'floor', 'office_number', 'company_name']]
print(result)
Output:
            id  building_name  floor  office_number          company_name
0   1010084420              A      1            106          A.O. Tatneft
1   1010084420              A      1        101-105  Ariel Resources Ltd.
2   1010084420              A      2        201-203
3   1010084420              A      2            205
4   1010084420              A      2            208
5   1010084421     East Tower     10      1001-1005           Agrium Inc.
6   1010084421     East Tower     10           1006    Creo Products Inc.
7   1010084421     East Tower     10           1008    Creo Products Inc.
8   1010084421     East Tower     10           1010    Creo Products Inc.
9   1010084421     West Tower     11      1101-1103            Cott Corp.
10  1010084425             T1     11      1101-1105    Creo Products Inc.
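One detail the printed output can hide: splitting on ',' alone keeps the space that followed each comma, so a piece such as ' 205' carries leading whitespace. The sketch below shows this on a hypothetical two-row Series (cut down from the data above, not part of the original post) and one way to strip it:

```python
import pandas as pd

# Hypothetical two-row sample, cut down from the question's data.
s = pd.Series(['201-203, 205, 208', '106'], name='office_number')

# Splitting on ',' alone keeps the space that followed each comma.
parts = s.str.split(',', expand=True)
print(repr(parts.loc[0, 1]))     # ' 205'

# Stripping every column removes the padding; missing cells stay missing.
stripped = parts.apply(lambda col: col.str.strip())
print(repr(stripped.loc[0, 1]))  # '205'
```

Calling .str.strip() on the office_number column of the final result should have the same effect.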
If you have any other solutions, please share them. Thanks.
Another solution is to extract the column with DataFrame.pop, split it, reshape the result into a Series with stack, and DataFrame.join it back to the original:
s = (df.pop('office_number')
       .str.split(',', expand=True)
       .stack()
       .reset_index(1, drop=True)
       .rename('office_number'))
res = df.join(s).reset_index(drop=True)
result = res[['id', 'building_name', 'floor', 'office_number', 'company_name']]
print(result)
            id  building_name  floor  office_number          company_name
0   1010084420              A      1        101-105  Ariel Resources Ltd.
1   1010084420              A      1            106          A.O. Tatneft
2   1010084420              A      2        201-203
3   1010084420              A      2            205
4   1010084420              A      2            208
5   1010084421     East Tower     10      1001-1005           Agrium Inc.
6   1010084421     East Tower     10           1006    Creo Products Inc.
7   1010084421     East Tower     10           1008    Creo Products Inc.
8   1010084421     East Tower     10           1010    Creo Products Inc.
9   1010084421     West Tower     11      1101-1103            Cott Corp.
10  1010084425             T1     11      1101-1105    Creo Products Inc.
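On pandas 0.25 or later, DataFrame.explode may offer a shorter route. This is a different technique from the pop/stack/join answer above, shown here as a minimal sketch on a reduced, hypothetical three-row version of the frame:

```python
import pandas as pd

# Reduced, hypothetical version of the question's frame (three rows only).
df = pd.DataFrame({'id': ['1010084420', '1010084420', '1010084421'],
                   'floor': ['1', '2', '10'],
                   'office_number': ['106', '201-203, 205, 208', '1006, 1008, 1010'],
                   'company_name': ['A.O. Tatneft', '', 'Creo Products Inc.']})

# Turn each cell into a list of pieces, then give every piece its own row.
result = (df.assign(office_number=df['office_number'].str.split(','))
            .explode('office_number')
            .reset_index(drop=True))

# Remove the space left behind after each comma.
result['office_number'] = result['office_number'].str.strip()
print(result)
```

Because explode keeps the column where it was, the original column order is preserved and no reordering step is needed.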