How to create a new row for each comma-separated value in a column in pandas
I have a dataframe like this:
text             category
sfsd sgvv        abc,xyz
zydf sefs sdfsd  yyy
dfsd dsrgd dggr  xyz
eter vxg wfe     abc
dfvf ertet       abc,xyz
I want an output like this:
text             category
sfsd sgvv        abc
sfsd sgvv        xyz
zydf sefs sdfsd  yyy
dfsd dsrgd dggr  xyz
eter vxg wfe     abc
dfvf ertet       abc
dfvf ertet       xyz
Basically, whenever the category column holds two or more comma-separated categories, I want a new row for each of them.
Use DataFrame.explode (pandas 0.25+) with Series.str.split:
df1 = (df.assign(category=df['category'].str.split(','))
         .explode('category')
         .reset_index(drop=True))
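For anyone who wants to run this end to end, here is a minimal sketch. It rebuilds the sample frame from the question by hand and wraps the same assign/explode chain in a small helper. The helper name explode_category and its column and sep parameters are my own choice for illustration, not part of pandas.

```python
import pandas as pd

# Sample data typed in from the question.
df = pd.DataFrame({
    "text": ["sfsd sgvv", "zydf sefs sdfsd", "dfsd dsrgd dggr",
             "eter vxg wfe", "dfvf ertet"],
    "category": ["abc,xyz", "yyy", "xyz", "abc", "abc,xyz"],
})


def explode_category(frame, column="category", sep=","):
    """Hypothetical helper: return a new frame with one row per separated value."""
    return (frame.assign(**{column: frame[column].str.split(sep)})  # strings -> lists
                 .explode(column)                                   # one row per list item
                 .reset_index(drop=True))                           # fresh 0..n-1 index


df1 = explode_category(df)
print(df1)
```

The original df is left untouched, because assign returns a new frame.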
For older pandas versions, first use DataFrame.set_index on the column(s) that do not contain the separator, then Series.str.split, and reshape with DataFrame.stack. Finally, call DataFrame.reset_index twice: first to remove the second level of the MultiIndex, and then to convert the index to a column:
df1 = (df.set_index('text')['category']
         .str.split(',', expand=True)
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='category'))
print(df1)
              text category
0        sfsd sgvv      abc
1        sfsd sgvv      xyz
2  zydf sefs sdfsd      yyy
3  dfsd dsrgd dggr      xyz
4     eter vxg wfe      abc
5       dfvf ertet      abc
6       dfvf ertet      xyz
Linking to this question, try the following code for your dataframe:
We can first split the column, expand it, stack it, and then join it back to the original df, like below:
df.drop('category', axis=1).join(
    df['category'].str.split(',', expand=True)
                  .stack()
                  .reset_index(level=1, drop=True)
                  .rename('category'))
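One thing the join version makes easy is keeping any other columns alongside text. Here is a small sketch of that, under some assumptions: the score column is made up for illustration and is not in the question, and the dropna() call is my own safeguard, since as far as I know newer pandas versions may keep the missing values that expand=True pads in when stacking.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["sfsd sgvv", "zydf sefs sdfsd", "dfvf ertet"],
    "score": [1, 2, 3],  # hypothetical extra column, not in the question
    "category": ["abc,xyz", "yyy", "abc,xyz"],
})

# Long Series of categories that still carries the original row index.
split = (df["category"].str.split(",", expand=True)
                       .stack()
                       .dropna()  # safeguard: discard padding for shorter rows
                       .reset_index(level=1, drop=True)
                       .rename("category"))

# Joining on the index repeats each original row once per category.
out = df.drop("category", axis=1).join(split).reset_index(drop=True)
print(out)
```

Without the final reset_index(drop=True), the result keeps the repeated original index labels, which can be handy if you need to trace rows back to the source frame.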
Try using set_index + stack + str.split + unstack + reset_index for much older versions:
print(df.set_index('text')
        .stack()
        .str.split(',', expand=True)  # split on ',' (the sample data has no space after the comma)
        .stack()
        .unstack(-2)
        .reset_index(-1, drop=True)
        .reset_index())
The code below will give the output you need, assuming df is the name of your dataset.
import pandas as pd

# Collect the output columns as plain lists, one entry per output row.
new_df_skel = {'text': [], 'category': []}
for index, item in df.iterrows():
    unref_cat = item['category']
    if "," in unref_cat:
        # Several categories: emit one row per category, repeating the text.
        for cat in unref_cat.split(','):
            new_df_skel['category'].append(cat)
            new_df_skel['text'].append(item['text'])
    else:
        # Single category: copy the row unchanged.
        new_df_skel['category'].append(unref_cat)
        new_df_skel['text'].append(item['text'])
new_dataset = pd.DataFrame(new_df_skel)
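The same row-by-row idea can also be written more compactly. This is only a sketch on a cut-down copy of the sample data. It relies on the fact that str.split(',') returns a one-element list when there is no comma, so single-category rows need no separate branch.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["sfsd sgvv", "zydf sefs sdfsd", "dfvf ertet"],
    "category": ["abc,xyz", "yyy", "abc,xyz"],
})

# Build one (text, category) pair per category value.
rows = [(text, cat)
        for text, cats in zip(df["text"], df["category"])
        for cat in cats.split(",")]

new_dataset = pd.DataFrame(rows, columns=["text", "category"])
print(new_dataset)
```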
Have a great day!