简体   繁体   中英

Pandas: Comma Separated Excel Cells not Converting to List

I've joined 3 Excel tab data sets to give me my base dataframe, and then for each line I want to count the int values in the comma separated values in DUAlloc and divide Amount by the DUAlloc Count then loop through the DuAlloc list and assign individual lines eg

Base Data:

Description DuAlloc Amount
Blah 1,2,3,4,5 1000
Yada 30,15,3,4,5 200

Processed Data:

Description DuAlloc Amount
Blah 1 200
Blah 2 200
Blah 3 200
Yada 3 40
Blah 4 200
Yada 4 40
Blah 5 200
Yada 5 40
Yada 15 40
Yada 30 40

I've tried numerous ways to convert to a list: list(), tolist(), but either get the same number for all the counts, or the nearest I've come is [len(str(c)) for c in df3['DUAlloc']] which counts all the characters which I don't want.

How would I go about achieving this, and is Pandas the best route to take?

Use Series.str.split , df.explode , Groupby.transform and df.div :

In [501]: out = df.assign(DuAlloc=df['DuAlloc'].str.split(',')).explode('DuAlloc')

In [506]: out['Amount'] = out['Amount'].div(out.groupby('Description')['Amount'].transform('size'))

In [507]: out
Out[507]: 
  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0

You can use .str.count to count the number of , in columns.

out = (df.assign(Amount=df['Amount'].div(df['DuAlloc'].str.count(',').add(1)),
                 DuAlloc=df['DuAlloc'].str.split(','))
       .explode('DuAlloc'))
print(out)

  Description DuAlloc  Amount
0        Blah       1   200.0
0        Blah       2   200.0
0        Blah       3   200.0
0        Blah       4   200.0
0        Blah       5   200.0
1        Yada      30    40.0
1        Yada      15    40.0
1        Yada       3    40.0
1        Yada       4    40.0
1        Yada       5    40.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM