For example, I have one pandas column contain
text
A1V2
B2C7Z1
I want split it into 26(AZ) columns with alphabet followed value, if it is missing, then -1.
So, it can be
text A B C D ... Z
A1V2 1 -1 -1 -1 ... -1
B2C7Z1 -1 2 7 -1 ... 1
Is there any fast way rather than using df.apply()?
Followup: Thank Psidom for the brilliant answer. When I use the method run 4 millions rows, it took me 1 hour. I hope there's another way can make it faster. It seems str.extractall() is the most time-consuming one.
Try str.extractall
with regex (?P<key>[AZ])(?P<value>[0-9]+)
which extracts the key( [AZ] ) value( [0-9]+ ) into separate columns and a long to wide transform should get you there.
Here regex (?P<key>[AZ])(?P<value>[0-9]+)
matches letterDigits pattern and the two capture groups go into two separate columns in the result as columns key and value (with ?P<>
syntax);
And since extractall puts multiple matches into separate rows, you will need to transform it to wide format with unstack
on the key
column:
(df.text.str.extractall("(?P<key>[A-Z])(?P<value>[0-9]+)")
.reset_index('match', drop=True)
.set_index('key', append=True)
.value.unstack('key').fillna(-1))
#key A B C V Z
# 0 1 -1 -1 2 -1
# 1 -1 2 7 -1 1
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.