
Split one pandas text column into multiple columns

For example, I have a pandas column that contains:

text
A1V2
B2C7Z1

I want to split it into 26 columns (A-Z), where each column holds the value that follows that letter in the text, or -1 if the letter is missing.

So the result would be:

text    A   B   C   D  ...   Z
A1V2    1  -1  -1  -1  ...  -1
B2C7Z1 -1   2   7  -1  ...   1

Is there a faster way than using df.apply()?

Followup: Thanks to Psidom for the brilliant answer. When I ran the method on 4 million rows, it took 1 hour. I hope there is another way to make it faster. str.extractall() seems to be the most time-consuming step.

Try str.extractall with the regex (?P<key>[A-Z])(?P<value>[0-9]+), which extracts the key ([A-Z]) and the value ([0-9]+) into separate columns; a long-to-wide transform should then get you there.

Here the regex (?P<key>[A-Z])(?P<value>[0-9]+) matches a letter followed by digits, and the two named capture groups (the ?P<name> syntax) become the columns key and value in the result.
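To make the intermediate result concrete, here is a small sketch on the two sample rows from the question. The names df and long_form are only illustrative; the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"text": ["A1V2", "B2C7Z1"]})

# extractall returns one row per match. The result has a two-level index:
# the original row label plus a level named "match" that counts the matches
# within each row. The named groups become the columns "key" and "value".
long_form = df["text"].str.extractall(r"(?P<key>[A-Z])(?P<value>[0-9]+)")
print(long_form)
```

For this input the result has five rows (two matches for the first string, three for the second), and the extracted values are still strings at this point, not numbers.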

And since extractall puts multiple matches into separate rows, you will need to transform it to wide format with unstack on the key column:

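# One row per match, with the named groups as columns "key" and "value"
(df.text.str.extractall(r"(?P<key>[A-Z])(?P<value>[0-9]+)")
 .reset_index('match', drop=True)   # drop the per-row match counter
 .set_index('key', append=True)     # index by (row, letter)
 .value.unstack('key')              # letters become columns
 .fillna(-1))                       # letters absent from a row get -1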

#key    A   B   C   V   Z
#  0    1  -1  -1   2  -1
#  1   -1   2   7  -1   1
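Note that the result above only has columns for letters that actually occur in the data. A possible way to get all 26 columns as integers is to reindex against the full alphabet, as sketched below. The same block also builds the table with the standard re module instead of extractall, since the followup names extractall as the slow step. Whether that is actually faster has not been measured here, so time both on your own data. Assumptions: each letter appears at most once per row (unstack raises on duplicates) and the text column has no missing values. The names letters, wide, pattern, records and alt are illustrative.

```python
import re
import string

import pandas as pd

df = pd.DataFrame({"text": ["A1V2", "B2C7Z1"]})
letters = list(string.ascii_uppercase)  # "A" .. "Z"

# Route 1: extractall as in the answer, then pad to all 26 letters.
long_form = df["text"].str.extractall(r"(?P<key>[A-Z])(?P<value>[0-9]+)")
wide = (
    long_form.reset_index("match", drop=True)
    .set_index("key", append=True)["value"]
    .astype(int)                                 # extracted values are strings
    .unstack("key")
    .reindex(index=df.index, columns=letters)    # add absent letters and rows without matches
    .fillna(-1)
    .astype(int)
)

# Route 2 (alternative, not from the answer): plain re, one dict per row.
pattern = re.compile(r"([A-Z])([0-9]+)")
records = [{k: int(v) for k, v in pattern.findall(s)} for s in df["text"]]
alt = (
    pd.DataFrame(records, index=df.index)
    .reindex(columns=letters)
    .fillna(-1)
    .astype(int)
)

# Attach the 26 columns to the original frame.
result = df.join(wide)
print(result[["text", "A", "B", "C", "V", "Z"]])
```

Both routes give the same 2 x 26 table for the sample data. If a letter can repeat within one string, the dict-based route silently keeps the last value, while the unstack route fails, so decide which behaviour you want first.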
