pandas split column by some specific words and keep this delimiter

I want to split a column on some specific words and keep the delimiter at the same time. I tried to split the column with str.split, but the result isn't what I want.

Example data (test.csv):

a
abc123and321abcor213cba
abc321or123cbaand321cba

My code:

import pandas as pd
df = pd.read_csv('test.csv')
df[['b','c']] = df['a'].str.split("and", n=1, expand=True)
df[['c','d']] = df['c'].str.split("or", n=1, expand=True)
print(df)

My result:

                         a               b       c       d
0  abc123and321abcor213cba          abc123  321abc  213cba
1  abc321or123cbaand321cba  abc321or123cba  321cba    None

Desired result:

                         a          b          c       d
0  abc123and321abcor213cba  abc123and   321abcor  213cba
1  abc321or123cbaand321cba   abc321or  123cbaand  321cba

How can I do this?

Borrowing from Tim's answer, you can use a lookbehind regex to split after "and" or "or" without consuming the separating string in the split:

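import pandas as pd

d = {'a': ["abc123and321abcor213cba", "abc321or123cbaand321cba"]}
df = pd.DataFrame(data=d)

# Split at the position right after "and" or "or"; the lookbehind consumes nothing.
df[["b", "c", "d"]] = df['a'].str.split(r'(?<=and)|(?<=or)', expand=True)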

Output:

                         a          b          c       d
0  abc123and321abcor213cba  abc123and   321abcor  213cba
1  abc321or123cbaand321cba   abc321or  123cbaand  321cba
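
To see why the delimiter survives, here is a minimal sketch using plain re.split on one of the sample strings. It assumes Python 3.7 or later, where re.split can split on empty matches; the variable names are only illustrative. A lookbehind matches a position rather than characters, so nothing is removed from the string.

```python
import re

text = "abc123and321abcor213cba"

# Splitting on the literal words consumes them.
consumed = re.split(r"and|or", text)

# A lookbehind is zero-width: it matches the position right after
# "and" or "or", so the words stay attached to the left-hand piece.
kept = re.split(r"(?<=and)|(?<=or)", text)

print(consumed)  # ['abc123', '321abc', '213cba']
print(kept)      # ['abc123and', '321abcor', '213cba']
```

Note that this splits after every occurrence of "and" or "or", including ones embedded in longer words, so it only fits data where those letter sequences appear solely as delimiters.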

If you have issues with split, check that your pandas version isn't too old.

You could also use str.extractall and unstack. I'd also recommend using join to add the columns if you don't know the number of matches/columns in advance.

import pandas as pd

df = pd.DataFrame({'a': ["abc123and321abcor213cba", "abc321or123cbaand321cba"]})

# Capture each piece up to and including "and"/"or", or the remaining tail.
df.join(df['a'].str.extractall(r'(.*?(?:and|or)|.+$)')[0].unstack('match'))

Output:

                         a          0          1       2
0  abc123and321abcor213cba  abc123and   321abcor  213cba
1  abc321or123cbaand321cba   abc321or  123cbaand  321cba
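
As a rough sketch of the "unknown number of matches" case, the example below runs the same extractall pattern on rows with different numbers of delimiters. The extra rows and the part_ column prefix are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: rows contain different numbers of delimiters.
df = pd.DataFrame({"a": ["abc123and321abcor213cba", "abc321or123cba", "xyz"]})

# Each match is either "text up to and including and/or" or the trailing rest.
parts = df["a"].str.extractall(r"(.*?(?:and|or)|.+$)")[0].unstack("match")

# Give the pieces readable names (part_0, part_1, ...), then attach them.
parts = parts.add_prefix("part_")
result = df.join(parts)
print(result)
```

Rows with fewer pieces than the widest row get missing values in the trailing columns, which is why join is more convenient here than assigning to a fixed list of column names.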

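Try splitting on a lookbehind. Python's re module only accepts fixed-width lookbehinds, so "and" (3 characters) and "or" (2 characters) each need their own, (?<=and)|(?<=or):

df[['b', 'c', 'd']] = df['a'].str.split(r'(?<=and)|(?<=or)', expand=True)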
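
The following sketch, using only the standard re module, illustrates why the two words are usually written as separate lookbehinds in Python. The flag name variable_width_ok is just an illustrative helper, and the exact error message may differ between Python versions.

```python
import re

# One lookbehind with alternatives of different lengths ("and" is 3
# characters, "or" is 2) is rejected by Python's re module.
try:
    re.compile(r"(?<=and|or)")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("rejected:", exc)

# One fixed-width lookbehind per word compiles and splits as intended.
pattern = re.compile(r"(?<=and)|(?<=or)")
pieces = pattern.split("abc321or123cbaand321cba")
print(pieces)  # ['abc321or', '123cbaand', '321cba']
```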
