
Pandas: Comma Separated Excel Cells not Converting to List

I've joined three Excel tab data sets to build my base dataframe. For each row I want to count the integer values in the comma-separated DuAlloc cell, divide Amount by that count, and then loop through the DuAlloc list and emit one row per value, e.g.:

Base Data:

Description  DuAlloc      Amount
Blah         1,2,3,4,5      1000
Yada         30,15,3,4,5     200

Processed Data:

Description  DuAlloc  Amount
Blah               1     200
Blah               2     200
Blah               3     200
Yada               3      40
Blah               4     200
Yada               4      40
Blah               5     200
Yada               5      40
Yada              15      40
Yada              30      40

I've tried numerous ways to convert the cells to a list, including list() and tolist(), but I either get the same number for every count or, the closest I've come, [len(str(c)) for c in df3['DUAlloc']], which counts every character in the cell rather than the number of values.

How would I go about achieving this, and is Pandas the best route to take?

Use Series.str.split, df.explode, GroupBy.transform and df.div: split each cell into a list, explode the list into one row per value, then divide Amount by the number of rows each group produced.

In [501]: out = df.assign(DuAlloc=df['DuAlloc'].str.split(',')).explode('DuAlloc')

In [506]: out['Amount'] = out['Amount'].div(out.groupby('Description')['Amount'].transform('size'))

In [507]: out
Out[507]: 
  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0
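For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's Base Data table (the real data would come from the Excel sheets). One deliberate change, which is my assumption rather than part of the answer: it groups on the index instead of on Description, because explode repeats the original index label for every new row, so the count stays correct even if two source rows share the same Description.

```python
import pandas as pd

# Sample data mirroring the question's "Base Data" table.
df = pd.DataFrame({
    "Description": ["Blah", "Yada"],
    "DuAlloc": ["1,2,3,4,5", "30,15,3,4,5"],
    "Amount": [1000, 200],
})

# Split each cell into a list, then give every list element its own row.
out = df.assign(DuAlloc=df["DuAlloc"].str.split(",")).explode("DuAlloc")

# explode() keeps the source row's index label on each new row, so grouping
# on the index (level=0) counts how many pieces each source row produced.
pieces = out.groupby(level=0)["Amount"].transform("size")
out["Amount"] = out["Amount"].div(pieces)

print(out)
```

With unique Description values, as in the sample, this gives the same result as grouping by Description.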

You can use .str.count to count the commas in each DuAlloc cell; adding 1 gives the number of values, which is what Amount needs to be divided by.

out = (df.assign(Amount=df['Amount'].div(df['DuAlloc'].str.count(',').add(1)),
                 DuAlloc=df['DuAlloc'].str.split(','))
       .explode('DuAlloc'))
print(out)

  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0
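Both outputs above leave DuAlloc as strings and keep the source row order, whereas the question's Processed Data is sorted by DuAlloc as a number. The sketch below extends the comma-count approach to cover that. It also casts the column to str first, on the assumption that a cell holding a single number may be read from Excel as an integer, in which case the .str methods would return NaN for it. The "Solo" row is hypothetical and only there to exercise that case.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": ["Blah", "Yada", "Solo"],
    # The bare 7 is a hypothetical cell that arrived as an int, not a string.
    "DuAlloc": ["1,2,3,4,5", "30,15,3,4,5", 7],
    "Amount": [1000, 200, 50],
})

# Cast to str so that numeric cells can be split like the others.
alloc = df["DuAlloc"].astype(str)

out = (
    df.assign(
        # commas + 1 == number of values in the cell
        Amount=df["Amount"].div(alloc.str.count(",").add(1)),
        DuAlloc=alloc.str.split(","),
    )
    .explode("DuAlloc")
    .astype({"DuAlloc": int})  # "30" -> 30, so sorting is numeric
    .sort_values(["DuAlloc", "Description"])
    .reset_index(drop=True)
)

print(out)
```

Sorting on the integer column puts 15 and 30 after 5, matching the order in the question; a string sort would place "15" before "2".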
