
How to split the data in a column based on multiple delimiters, into multiple columns, in pandas

I have a dataframe with only one column, named 'ALL_category' (created as 'ALL Categories' in the code below). Each row contains between 1 and 3 names, separated by the delimiters '|', '||' or '|||', which can appear at the beginning, in between, or at the end of the words in every row. I want to split the column into multiple columns so that the new columns contain the names. How can I do it?

Below is the code to generate the dataframe:

import pandas as pd

# One column; names are separated by runs of one to three '|' characters
x = {'ALL Categories': ['Rakesh||Ramesh|','||Rajesh|','HARPRIT|||','Tushar||manmit|']}
df = pd.DataFrame(x)

When I used the code below to modify the above dataframe, it didn't give me any result.

data = data.ALL_HOLDS.str.split(r'w', expand = True)

I believe you need Series.str.extractall if you want each word in a separate column:

df1 = df['ALL Categories'].str.extractall(r'(\w+)')[0].unstack()
print (df1)
match        0       1
0       Rakesh  Ramesh
1       Rajesh     NaN
2      HARPRIT     NaN
3       Tushar  manmit
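
To make the intermediate step visible, here is a minimal self-contained sketch of the same approach. The variable names `matches` and `wide` are only illustrative. It assumes the names consist solely of letters, digits, or underscores, because `\w+` would break a name containing a space or hyphen into several matches.

```python
import pandas as pd

df = pd.DataFrame({'ALL Categories': ['Rakesh||Ramesh|', '||Rajesh|',
                                      'HARPRIT|||', 'Tushar||manmit|']})

# extractall returns one row per regex match, indexed by
# (original row label, match number); column 0 is the capture group
matches = df['ALL Categories'].str.extractall(r'(\w+)')

# Pick the capture group, then pivot the match number into columns;
# rows with fewer matches are padded with NaN
wide = matches[0].unstack()

print(wide.shape)  # (4, 2)
```

The column count follows the row with the most matches, so with up to 3 names per row the result would have up to 3 columns.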

Or, slightly changing the code of @Chris A from the comments, use Series.str.strip to remove the leading and trailing | and Series.str.split to split on one or more |:

df1 = df['ALL Categories'].str.strip('|').str.split(r'\|+', expand=True)
print (df1)
         0       1
0   Rakesh  Ramesh
1   Rajesh    None
2  HARPRIT    None
3   Tushar  manmit
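
If the split columns should sit next to the original column, one possible follow-up is to name them and join them back. This is only a sketch: the `name_1`, `name_2` labels and the `parts` and `result` variables are hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'ALL Categories': ['Rakesh||Ramesh|', '||Rajesh|',
                                      'HARPRIT|||', 'Tushar||manmit|']})

# Remove leading/trailing '|' first, then split on runs of one or more '|'
parts = df['ALL Categories'].str.strip('|').str.split(r'\|+', expand=True)

# Hypothetical column labels: name_1, name_2, ...
parts.columns = [f'name_{i + 1}' for i in range(parts.shape[1])]

# Join on the shared row index so each row keeps its original string
result = df.join(parts)
print(result)
```

Stripping before splitting matters here: without it, a leading or trailing | would produce empty strings in the first or last column.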
