
How do I manipulate a pandas DataFrame with a column whose string values need splitting?

I have a pandas DataFrame similar to table A and I would like to get table B. What is the easiest way to do this using pandas?

Thanks

table A (ColofInt holds strings with a varying number of parts to parse out):

ColA ColB ColofInt             ColD 
A     B   StrA;StrB;StrC;       1
A     B   StrD;StrB;StrC;StrD;  3
A     B   StrC;StrB;            2
A     B   StrB;                 5

table B:

ColA ColB ColofInt1     ColofInt2 ColofInt3 ColofInt4  ColD
A     B   StrA            StrB      StrC                1
A     B   StrD            StrB      StrC    StrD        3
A     B   StrC            StrB                          2
A     B   StrB                                          5

Assuming a file 'tableA.csv' containing the following:

ColA,ColB,ColofInt,ColD
A,B,StrA;StrB;StrC;,1
A,B,StrD;StrB;StrC;StrD;,3
A,B,StrC;StrB;,2
A,B,StrB;,5

Then:

import pandas as pd
tableA = pd.read_csv('tableA.csv')

This generates a DataFrame with your new columns:

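# Split on ';' and drop the empty last piece left by the trailing ';'
data_aux = pd.DataFrame(list(tableA.ColofInt.str.split(';').apply(lambda x: x[:-1])))
# Rename columns 0, 1, 2, ... to ColofInt1, ColofInt2, ...
data_aux.columns = ['ColofInt' + str(e + 1) for e in data_aux.columns]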

Here's 'data_aux':

   ColofInt1    ColofInt2   ColofInt3   ColofInt4
0   StrA        StrB        StrC        None
1   StrD        StrB        StrC        StrD
2   StrC        StrB        None        None
3   StrB        None        None        None
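As a possible alternative to building the frame from a list of lists, pandas can expand the split pieces into columns directly. The sketch below is not part of the original answer; it assumes every value ends with a trailing ';' (as in the sample data) and uses an inline Series named col instead of reading 'tableA.csv'. Missing pieces may print as None or NaN depending on the pandas version.

```python
import pandas as pd

# Same sample values as the ColofInt column in table A (inlined here as an assumption).
col = pd.Series(
    ["StrA;StrB;StrC;", "StrD;StrB;StrC;StrD;", "StrC;StrB;", "StrB;"],
    name="ColofInt",
)

# Strip the trailing ';' first so no empty last piece is produced,
# then split into one column per piece.
parts = col.str.rstrip(";").str.split(";", expand=True)

# Rename columns 0, 1, 2, ... to ColofInt1, ColofInt2, ...
parts.columns = [f"ColofInt{i + 1}" for i in parts.columns]
print(parts)
```

Rows with fewer pieces are padded with missing values, so the number of columns follows the longest row.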

And this joins the dataframes, dropping the original column:

tableB = pd.concat([tableA,data_aux],axis=1).drop('ColofInt',axis=1)

Here's the resulting 'tableB':

   ColA ColB    ColD    ColofInt1   ColofInt2   ColofInt3   ColofInt4
0   A   B       1       StrA        StrB        StrC        None
1   A   B       3       StrD        StrB        StrC        StrD
2   A   B       2       StrC        StrB        None        None
3   A   B       5       StrB        None        None        None
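Note that in this result ColD comes before the new columns, while table B in the question keeps ColD last. If the column order matters, one way to handle it is to insert the new columns where the original column was. This is only a sketch: split_in_place is a hypothetical helper name, the CSV is inlined through io.StringIO so the example runs on its own, and it assumes the column name is unique in the frame.

```python
import io
import pandas as pd

# Inline copy of the sample data, standing in for 'tableA.csv'.
csv_text = """ColA,ColB,ColofInt,ColD
A,B,StrA;StrB;StrC;,1
A,B,StrD;StrB;StrC;StrD;,3
A,B,StrC;StrB;,2
A,B,StrB;,5
"""
tableA = pd.read_csv(io.StringIO(csv_text))


def split_in_place(df, column, sep=";"):
    """Hypothetical helper: replace `column` with numbered columns at the same position."""
    # Drop trailing separators, then expand the pieces into columns.
    parts = df[column].str.rstrip(sep).str.split(sep, expand=True)
    parts.columns = [f"{column}{i + 1}" for i in range(parts.shape[1])]
    # Position of the original column (an int when the name is unique).
    pos = df.columns.get_loc(column)
    left = df.iloc[:, :pos]
    right = df.iloc[:, pos + 1:]
    return pd.concat([left, parts, right], axis=1)


tableB = split_in_place(tableA, "ColofInt")
# Show missing pieces as empty strings, as in the requested table B.
print(tableB.fillna(""))
```

The fillna("") call only affects the printed view; tableB itself keeps the missing values.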
