pandas: split a column on specific words and keep the delimiter
I want to split a column on some specific words and keep the delimiter at the same time. I tried splitting the column with str.split, but the result isn't what I want.
Example data (test.csv):
a
abc123and321abcor213cba
abc321or123cbaand321cba
My code:
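import pandas as pd
df = pd.read_csv('test.csv')
# Split once on "and", then split the remainder once on "or".
# n is passed by keyword because recent pandas versions no longer accept it positionally.
df[['b', 'c']] = df['a'].str.split("and", n=1, expand=True)
df[['c', 'd']] = df['c'].str.split("or", n=1, expand=True)
print(df)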
My result:
                         a               b       c       d
0  abc123and321abcor213cba          abc123  321abc  213cba
1  abc321or123cbaand321cba  abc321or123cba  321cba    None
Desired result:
                         a          b          c       d
0  abc123and321abcor213cba  abc123and   321abcor  213cba
1  abc321or123cbaand321cba   abc321or  123cbaand  321cba
How can I do this?
Borrowing from Tim's answer: use a lookbehind regex to split after "and" or "or", so that the separator string is not consumed by the split:
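import pandas as pd
d = {'a': ["abc123and321abcor213cba", "abc321or123cbaand321cba"]}
df = pd.DataFrame(data=d)
# Each lookbehind matches the empty position right after "and" / "or", so nothing is removed.
df[["b", "c", "d"]] = df['a'].str.split(r'(?<=and)|(?<=or)', expand=True)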
Output:
                         a          b          c       d
0  abc123and321abcor213cba  abc123and   321abcor  213cba
1  abc321or123cbaand321cba   abc321or  123cbaand  321cba
If you have issues with split, check that your pandas version is not too old.
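If str.split still misbehaves on your installation, one possible fallback is to do the split with Python's own re module and build the new columns from the result. This is only a sketch: it assumes Python 3.7 or newer (where re.split accepts zero-width matches) and that every row yields exactly three pieces. The names pattern and parts are illustrative, not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame({'a': ["abc123and321abcor213cba", "abc321or123cbaand321cba"]})

# Same lookbehind pattern as above, compiled once with the standard library.
pattern = re.compile(r'(?<=and)|(?<=or)')

# Split each string in plain Python; assumes exactly three pieces per row.
parts = pd.DataFrame([pattern.split(s) for s in df['a']], index=df.index)
parts.columns = ['b', 'c', 'd']

df = df.join(parts)
print(df)
```

Because the splitting happens in plain Python, this does not depend on how a given pandas version interprets the pattern passed to str.split.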
You could also use str.extractall and unstack. If you don't know the number of matches/columns in advance, I'd also recommend using join to add the columns.
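import pandas as pd
df = pd.DataFrame({'a': ["abc123and321abcor213cba", "abc321or123cbaand321cba"]})
# Each match is either text up to and including "and"/"or", or the remaining tail of the string.
print(df.join(df['a'].str.extractall(r'(.*?(?:and|or)|.+$)')[0].unstack('match')))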
Output:
                         a          0          1       2
0  abc123and321abcor213cba  abc123and   321abcor  213cba
1  abc321or123cbaand321cba   abc321or  123cbaand  321cba
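The extractall approach can be generalized to an arbitrary list of delimiter words. The following is a rough sketch, not part of the original answer: the words list and the part_ column prefix are hypothetical choices, and re.escape is used on the assumption that the words may contain regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame({'a': ["abc123and321abcor213cba", "abc321or123cbaand321cba"]})

# Hypothetical list of delimiter words; escape them before building the alternation.
words = ["and", "or"]
alternation = "|".join(re.escape(w) for w in words)
pattern = rf'(.*?(?:{alternation})|.+$)'

# One row per match, then pivot the 'match' index level into columns 0, 1, 2, ...
pieces = df['a'].str.extractall(pattern)[0].unstack('match')

# Give the numbered columns readable names before joining them back.
pieces = pieces.add_prefix('part_')

result = df.join(pieces)
print(result)
```

Rows with fewer matches than the longest row would get missing values in the trailing columns, which is why join is convenient when the number of pieces is not known in advance.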
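Try splitting on a lookbehind. Python's re module only accepts fixed-width lookbehinds, so (?<=and|or) is rejected; write one lookbehind per word, and leave out the split limit so that all three pieces are returned:
df[['b', 'c', 'd']] = df['a'].str.split(r'(?<=and)|(?<=or)', expand=True)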
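Python's re module only accepts fixed-width lookbehinds, so a single lookbehind whose alternatives differ in length, such as (?<=and|or), is rejected at compile time, while (?<=and)|(?<=or) is accepted. A minimal check using only the standard library (the variable names are illustrative, and zero-width splitting assumes Python 3.7 or newer):

```python
import re

text = "abc123and321abcor213cba"

# A single lookbehind whose alternatives differ in length ("and" = 3, "or" = 2).
try:
    re.compile(r'(?<=and|or)')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("rejected:", exc)

# One fixed-width lookbehind per word compiles and splits right after each word.
pieces = re.split(r'(?<=and)|(?<=or)', text)
print(pieces)
```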