Split one pandas column of text into multiple columns

Question

For example, I have a pandas column containing:

text
A1V2
B2C7Z1

I want to split it into 26 columns (A-Z), each holding the number that follows that letter; if a letter is missing, the value should be -1.
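
To pin down the rule for a single string: each letter A-Z maps to the number immediately after it, and any letter that does not appear maps to -1. A minimal pure-Python sketch of that rule for one value (parse_letters is a hypothetical helper name, not part of pandas):

```python
import re
import string

def parse_letters(text):
    """Map each letter A-Z to the number that follows it in `text`, or -1 if absent."""
    # Start every letter at the "missing" value -1
    values = {letter: -1 for letter in string.ascii_uppercase}
    # Each match is a (letter, digits) tuple, e.g. ('B', '2')
    for letter, digits in re.findall(r"([A-Z])([0-9]+)", text):
        values[letter] = int(digits)
    return values

row = parse_letters("B2C7Z1")
print(row["A"], row["B"], row["C"], row["Z"])  # -1 2 7 1
```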

So the result would be:

text     A   B   C   D  ...   Z
A1V2     1  -1  -1  -1  ...  -1
B2C7Z1  -1   2   7  -1  ...   1

Is there a fast way to do this without using df.apply()?

Follow-up: Thanks to Psidom for the brilliant answer. When I ran the method on 4 million rows, it took 1 hour. I hope there is another way to make it faster. It seems str.extractall() is the most time-consuming step.
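
One alternative worth benchmarking for the follow-up, offered as an untested assumption rather than a measured result: parse each string with Python's re module in a list comprehension and build the A-Z frame in a single DataFrame constructor call, bypassing str.extractall entirely. PAIR, LETTERS and split_letters are hypothetical names; whether this beats extractall on 4 million rows depends on your data and pandas version, so time it on a sample first.

```python
import re
import string

import pandas as pd

# Hypothetical names: PAIR, LETTERS and split_letters are illustrative only
PAIR = re.compile(r"([A-Z])([0-9]+)")
LETTERS = list(string.ascii_uppercase)

def split_letters(texts: pd.Series) -> pd.DataFrame:
    """Return columns A..Z aligned to `texts` (all str); absent letters get -1."""
    # One dict per row, e.g. {'A': 1, 'V': 2}; if a letter repeats, the last value wins
    records = [{k: int(v) for k, v in PAIR.findall(t)} for t in texts]
    # Letters missing from a row's dict become NaN; columns follow the A..Z order
    out = pd.DataFrame(records, columns=LETTERS, index=texts.index)
    return out.fillna(-1).astype(int)

df = pd.DataFrame({"text": ["A1V2", "B2C7Z1"]})
result = df.join(split_letters(df["text"]))
print(result[["text", "A", "B", "C", "V", "Z"]])
```

Building plain dicts first keeps the per-row work in ordinary Python and leaves pandas a single construction step, instead of creating a long MultiIndex frame and pivoting it.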

Answer 1

Try str.extractall with the regex (?P<key>[A-Z])(?P<value>[0-9]+), which extracts the key ([A-Z]) and the value ([0-9]+) into separate columns; a long-to-wide transform should then get you there.

Here the regex (?P<key>[A-Z])(?P<value>[0-9]+) matches the letter-digits pattern, and the two capture groups become two separate columns in the result, named key and value (via the ?P<name> named-group syntax).
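
To see what the two named groups produce before any reshaping, here is a small sketch that runs the same pattern on the sample data and keeps the long format (the variable name matches is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"text": ["A1V2", "B2C7Z1"]})
# Each named group becomes a column: "key" holds the letter, "value" the digits (as strings)
matches = df.text.str.extractall("(?P<key>[A-Z])(?P<value>[0-9]+)")
# Index levels: the original row label, plus "match" numbering the hits within each row
print(matches)
```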

Since extractall puts multiple matches into separate rows, you need to transform the result to wide format with unstack on the key column:

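import pandas as pd

df = pd.DataFrame({'text': ['A1V2', 'B2C7Z1']})
# extractall yields one row per letter/number match; unstack pivots the letters into columns
(df.text.str.extractall("(?P<key>[A-Z])(?P<value>[0-9]+)")
 .reset_index('match', drop=True)
 .set_index('key', append=True)
 .value.unstack('key').fillna(-1))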

#key    A   B   C   V   Z
#  0    1  -1  -1   2  -1
#  1   -1   2   7  -1   1
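
The output above only contains columns for letters that actually occur (A, B, C, V, Z), and the extracted values are still strings. To get all 26 columns A-Z with integer values, as the question asks, one option is to reindex after unstacking. A sketch assuming the same df; the variable names wide and result are illustrative:

```python
import string

import pandas as pd

df = pd.DataFrame({"text": ["A1V2", "B2C7Z1"]})
wide = (df.text.str.extractall("(?P<key>[A-Z])(?P<value>[0-9]+)")
        .reset_index("match", drop=True)
        .set_index("key", append=True)
        .value.astype(int)  # digits were captured as strings
        .unstack("key"))
# unstack only creates columns for letters that occur (and drops rows with no match);
# reindex restores every row of df and all 26 letters, leaving NaN in the gaps
wide = wide.reindex(index=df.index, columns=list(string.ascii_uppercase))
result = df.join(wide.fillna(-1).astype(int))
print(result)
```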

Answer 1 (score 5, accepted, 2017-02-23 19:32:31)
