How to split the data in a column based on multiple delimiters, into multiple columns, in pandas
I have a dataframe with only one column, named 'ALL_category'. Each row contains between 1 and 3 names separated by the delimiters '|', '||' or '|||', which can appear at the beginning, in the middle, or at the end of a row. I want to split the column into multiple columns so that the new columns contain the names. How can I do it?
Below is the code to generate the dataframe:
import pandas as pd
x = {'ALL Categories': ['Rakesh||Ramesh|','||Rajesh|','HARPRIT|||','Tushar||manmit|']}
df = pd.DataFrame(x)
When I used the code below to modify the above dataframe, it didn't give me any result.
data = data.ALL_HOLDS.str.split(r'w', expand = True)
I believe you need Series.str.extractall if you want each word in a separate column:
df1 = df['ALL Categories'].str.extractall(r'(\w+)')[0].unstack()
print (df1)
match        0       1
0       Rakesh  Ramesh
1       Rajesh     NaN
2      HARPRIT     NaN
3       Tushar  manmit
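For context, the attempt in the question most likely leaves the strings untouched because the pattern r'w' is the literal letter "w" rather than the word-character class \w, and none of the sample values contain a "w". The sketch below contrasts the two patterns on the sample data; the variable names unsplit and names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ALL Categories': ['Rakesh||Ramesh|', '||Rajesh|',
                                      'HARPRIT|||', 'Tushar||manmit|']})

# r'w' is the literal letter "w". No sample value contains it, so every
# row stays in one piece and expand=True produces a single column.
unsplit = df['ALL Categories'].str.split(r'w', expand=True)
print(unsplit.shape)

# r'(\w+)' captures each run of word characters, which skips the pipes
# wherever they appear (start, middle or end of the string).
names = df['ALL Categories'].str.extractall(r'(\w+)')[0].unstack()
print(names.shape)
```

Note that \w also matches digits and underscores, so this approach assumes the names themselves contain no spaces or other punctuation.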
Or use slightly modified code from @Chris A's comment, with Series.str.strip and Series.str.split on one or more |:
df1 = df['ALL Categories'].str.strip('|').str.split(r'\|+', expand=True)
print (df1)
         0       1
0   Rakesh  Ramesh
1   Rajesh    None
2  HARPRIT    None
3   Tushar  manmit
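If the new columns should sit next to the original one under readable labels, something along these lines might work. This is a minimal sketch: the labels name_1, name_2 and the variables parts and result are hypothetical and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'ALL Categories': ['Rakesh||Ramesh|', '||Rajesh|',
                                      'HARPRIT|||', 'Tushar||manmit|']})

# Drop leading/trailing pipes, then split on runs of one or more pipes.
parts = df['ALL Categories'].str.strip('|').str.split(r'\|+', expand=True)

# Hypothetical labels: name_1, name_2, ... (one per split position).
parts.columns = [f'name_{i + 1}' for i in parts.columns]

# Attach the new columns to the original dataframe by index.
result = df.join(parts)
print(result)
```

Rows with fewer names than the widest row get missing values in the trailing columns, the same as in the output above.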