How to create a new row for each comma separated value in a column in pandas

I have a dataframe like this:

text                   category 
sfsd sgvv              abc,xyz
zydf sefs sdfsd        yyy
dfsd dsrgd dggr        xyz
eter vxg wfe           abc
dfvf ertet             abc,xyz

I want an output like this:

text                   category 
sfsd sgvv              abc
sfsd sgvv              xyz
zydf sefs sdfsd        yyy
dfsd dsrgd dggr        xyz
eter vxg wfe           abc
dfvf ertet             abc
dfvf ertet             xyz

Basically, whenever the category column holds two or more comma-separated categories, create a new row for each of them.

Use DataFrame.explode (pandas 0.25+) together with Series.str.split:

df1 = (df.assign(category = df['category'].str.split(','))
         .explode('category')
         .reset_index(drop=True))
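
To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and runs the same split and explode steps. The way df is constructed is an assumption, since the question only shows the data as text.

```python
import pandas as pd

# Sample data retyped from the question; how the real df is built is an assumption.
df = pd.DataFrame({
    'text': ['sfsd sgvv', 'zydf sefs sdfsd', 'dfsd dsrgd dggr',
             'eter vxg wfe', 'dfvf ertet'],
    'category': ['abc,xyz', 'yyy', 'xyz', 'abc', 'abc,xyz'],
})

# Turn each category string into a list, then emit one row per list element.
df1 = (df.assign(category=df['category'].str.split(','))
         .explode('category')
         .reset_index(drop=True))
print(df1)
```

Rows with a single category pass through unchanged, because splitting a string without a comma yields a one-element list.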

For older pandas versions, first call DataFrame.set_index on the column(s) that do not contain the separator, then apply Series.str.split and reshape with DataFrame.stack. Finally, call DataFrame.reset_index twice: the first call drops the second level of the MultiIndex, and the second converts the index back to a column:

df1 = (df.set_index('text')['category']
         .str.split(',', expand=True)
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='category'))
print (df1)
              text category
0        sfsd sgvv      abc
1        sfsd sgvv      xyz
2  zydf sefs sdfsd      yyy
3  dfsd dsrgd dggr      xyz
4     eter vxg wfe      abc
5       dfvf ertet      abc
6       dfvf ertet      xyz

Linking to this question, try the following code for your dataframe:

We can first split the column, expand it, stack it, and then join it back to the original df, like below:

df.drop('category', axis=1).join(
  df['category'].str.split(',', expand=True).stack().reset_index(level=1,drop=True).rename('category'))
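
The join lines up rows by index label, so the result repeats the original labels for rows that were split. The sketch below illustrates this on the sample data; the construction of df, the explicit dropna() call, and the final reset_index are assumptions added for illustration and are not part of the answer above.

```python
import pandas as pd

# Sample data retyped from the question; how the real df is built is an assumption.
df = pd.DataFrame({
    'text': ['sfsd sgvv', 'zydf sefs sdfsd', 'dfsd dsrgd dggr',
             'eter vxg wfe', 'dfvf ertet'],
    'category': ['abc,xyz', 'yyy', 'xyz', 'abc', 'abc,xyz'],
})

# One value per (original row label, piece number), then drop the piece number.
pieces = (df['category'].str.split(',', expand=True)
            .stack()
            .dropna()  # explicit, in case stack keeps the padding for short rows
            .reset_index(level=1, drop=True)
            .rename('category'))

# Join on the index: rows that had two categories now appear twice.
joined = df.drop('category', axis=1).join(pieces)
print(joined.index.tolist())

# Optional: replace the repeated labels with a fresh 0..n-1 index.
result = joined.reset_index(drop=True)
print(result)
```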

For much older versions, try set_index + stack + str.split + unstack + reset_index:

print(df.set_index('text')
      .stack()
      .str.split(',', expand=True)
      .stack()
      .unstack(-2)
      .reset_index(-1, drop=True)
      .reset_index())
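
Whichever variant is used, the separator passed to str.split has to match the data: the sample values are written as abc,xyz with no space after the comma. A small sketch of the difference, using a hypothetical Series s that mixes both spellings:

```python
import pandas as pd

# Hypothetical values: one without and one with a space after the comma.
s = pd.Series(['abc,xyz', 'abc, xyz', 'yyy'])

# ', ' only matches a comma followed by a space, so 'abc,xyz' stays whole.
loose = s.str.split(', ').tolist()
print(loose)

# ',' matches every comma; strip each piece to remove leftover spaces.
cleaned = s.str.split(',').apply(lambda parts: [p.strip() for p in parts]).tolist()
print(cleaned)
```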

The code below will give the output you need, assuming df is the name of your dataset.

import pandas as pd

# Collect the expanded rows column by column.
new_df_skel = {'text': [], 'category': []}

for index, item in df.iterrows():
    # split(',') returns a one-element list when there is no comma,
    # so rows with a single category need no special case.
    for cat in item['category'].split(','):
        new_df_skel['category'].append(cat)
        new_df_skel['text'].append(item['text'])

new_dataset = pd.DataFrame(new_df_skel)

Have a great day!
