Pandas: Comma-Separated Excel Cells Not Converting to List
I've joined three Excel tab data sets to build my base DataFrame. For each row, I want to count the integer values in the comma-separated DuAlloc cell, divide Amount by that count, and then loop through the DuAlloc list and output one row per value, e.g.:
Base Data:

Description | DuAlloc | Amount
---|---|---
Blah | 1,2,3,4,5 | 1000
Yada | 30,15,3,4,5 | 200
Processed Data:

Description | DuAlloc | Amount
---|---|---
Blah | 1 | 200
Blah | 2 | 200
Blah | 3 | 200
Yada | 3 | 40
Blah | 4 | 200
Yada | 4 | 40
Blah | 5 | 200
Yada | 5 | 40
Yada | 15 | 40
Yada | 30 | 40
I've tried numerous ways to convert the cells to a list, such as `list()` and `tolist()`, but I either get the same number for every count, or, the closest I've come, `[len(str(c)) for c in df3['DUAlloc']]`, which counts all the characters, which isn't what I want.
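The character count happens because `str(c)` turns each cell into one string and `len` then measures that whole string, commas and multi-digit numbers included. A minimal sketch of the difference, using a hypothetical `du_alloc` Series holding the question's two values: splitting on `','` before taking the length counts the values instead.

```python
import pandas as pd

# Hypothetical column mirroring the question's DuAlloc values
du_alloc = pd.Series(["1,2,3,4,5", "30,15,3,4,5"], name="DuAlloc")

# len(str(c)) measures the whole string, so every comma and digit is counted
char_counts = [len(str(c)) for c in du_alloc]  # [9, 11]

# Splitting on the comma first gives one list element per allocation
value_counts = [len(str(c).split(",")) for c in du_alloc]  # [5, 5]

# Vectorised equivalent: split into lists, then take each list's length
vector_counts = du_alloc.str.split(",").str.len()

print(char_counts, value_counts, vector_counts.tolist())
```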
How would I go about achieving this, and is pandas the best route to take?
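For comparison with the pandas answers below, here is a minimal plain-Python sketch of the per-row loop described in the question. It assumes the joined base data are already available as `(Description, DuAlloc, Amount)` tuples; `base_rows` and `processed` are hypothetical names. The final sort by DuAlloc is only there because the processed table above appears to be ordered that way.

```python
# Hypothetical stand-in for the joined base data: (Description, DuAlloc, Amount)
base_rows = [
    ("Blah", "1,2,3,4,5", 1000),
    ("Yada", "30,15,3,4,5", 200),
]

processed = []
for description, du_alloc, amount in base_rows:
    # Split the comma-separated cell into integers, e.g. "1,2,3" -> [1, 2, 3]
    du_values = [int(v) for v in du_alloc.split(",")]
    # Share the amount equally across this row's allocations
    share = amount / len(du_values)
    for du in du_values:
        processed.append((description, du, share))

# Order by DuAlloc; Python's sort is stable, so ties keep their input order
processed.sort(key=lambda row: row[1])

for row in processed:
    print(row)
```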
Use `Series.str.split`, `df.explode`, `Groupby.transform` and `df.div`:
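```python
import pandas as pd

# df is the base DataFrame from the question.
# Split each DuAlloc cell into a list, then explode it into one row per value
out = df.assign(DuAlloc=df['DuAlloc'].str.split(',')).explode('DuAlloc')
# Divide each Amount by the number of rows in its Description group
out['Amount'] = out['Amount'].div(out.groupby('Description')['Amount'].transform('size'))
print(out)
```

```
  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0
```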
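One caveat with grouping on `Description`: if two base rows share a description (plausible after joining several Excel tabs), their exploded rows land in the same group and each amount is divided by the combined size. Because `explode` repeats the original index label for every value, grouping on the index with `groupby(level=0)` counts per source row instead. A sketch with hypothetical data containing a repeated "Blah":

```python
import pandas as pd

# Hypothetical base data in which two rows share the description "Blah"
df = pd.DataFrame({
    "Description": ["Blah", "Yada", "Blah"],
    "DuAlloc": ["1,2,3,4,5", "30,15,3,4,5", "7,8"],
    "Amount": [1000, 200, 50],
})

out = df.assign(DuAlloc=df["DuAlloc"].str.split(",")).explode("DuAlloc")

# groupby("Description") would put all 7 "Blah" values in one group (1000 / 7, 50 / 7).
# explode() repeats each source row's index label, so grouping on the index
# (level=0) counts the values belonging to each original row instead
sizes = out.groupby(level=0)["Amount"].transform("size")
out["Amount"] = out["Amount"].div(sizes)

print(out)
```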
Alternatively, you can use `.str.count` to count the commas in each `DuAlloc` cell; the number of values is the comma count plus one:
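```python
import pandas as pd

# df is the base DataFrame from the question.
# Each cell holds (number of commas + 1) values, so divide Amount by that first,
# then split the cell into a list and explode it into one row per value
out = (df.assign(Amount=df['Amount'].div(df['DuAlloc'].str.count(',').add(1)),
                 DuAlloc=df['DuAlloc'].str.split(','))
         .explode('DuAlloc'))
print(out)
```

```
  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0
```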
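Neither output quite matches the requested table: the exploded `DuAlloc` values are still strings, the index repeats, and the rows are not ordered by `DuAlloc`. A minimal sketch that adds those finishing steps to the `.str.count` approach. The `astype(str)` call is an assumption, guarding against single-value cells that `read_excel` may load as numbers rather than text.

```python
import pandas as pd

# The question's base data, typed in by hand for this sketch
df = pd.DataFrame({
    "Description": ["Blah", "Yada"],
    "DuAlloc": ["1,2,3,4,5", "30,15,3,4,5"],
    "Amount": [1000, 200],
})

# Assumption: cast to str in case a single-value cell was loaded as a number
du = df["DuAlloc"].astype(str)

out = (
    df.assign(
        # Number of allocations = number of commas + 1
        Amount=df["Amount"].div(du.str.count(",").add(1)),
        DuAlloc=du.str.split(","),
    )
    .explode("DuAlloc")
    # explode() leaves strings such as "30"; strip stray spaces, then convert to int
    .assign(DuAlloc=lambda d: d["DuAlloc"].str.strip().astype(int))
    # A stable sort keeps Blah before Yada when DuAlloc values tie
    .sort_values("DuAlloc", kind="stable")
    # Replace the repeated source-row labels with 0..n-1
    .reset_index(drop=True)
)

print(out)
```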
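Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.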