
How to expand rows based on a particular row value in a dataframe?

I have a dataframe as follows:

Col1     Col2     Col3     Col4
AB       2i       2j|2k    2y
CD       3j       3k|3p|3e 3x

For the rows that contain pipe-separated values, I want to expand them as follows (this should be the final dataframe):

Col1     Col2     Col3     Col4
AB       2i       2j       2y
AB       2i       2k       2y
CD       3j       3k       3x
CD       3j       3p       3x
CD       3j       3e       3x

So the pipe-separated values have to be expanded into their own rows, with the other field values copied into each new row. How can I do that with a pandas dataframe?

For pandas 0.25.0+, use Series.str.split with DataFrame.assign to turn the column into a column of lists, then DataFrame.explode to give each list element its own row, and finally DataFrame.reset_index with drop=True to restore a default index:

df = df.assign(Col3 = df['Col3'].str.split('|')).explode('Col3').reset_index(drop=True)
print (df)
  Col1 Col2 Col3 Col4
0   AB   2i   2j   2y
1   AB   2i   2k   2y
2   CD   3j   3k   3x
3   CD   3j   3p   3x
4   CD   3j   3e   3x
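
To try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and wraps the same split/explode steps in a function. The helper name `expand_rows` and its `sep` parameter are made up for illustration and are not part of pandas; it assumes pandas 0.25.0+ and a single-character separator such as `|`, which is what the one-liner above uses.

```python
import pandas as pd


def expand_rows(df, column, sep="|"):
    """Hypothetical helper: return a new frame with one row per `sep`-separated value in `column`."""
    # Turn each string such as "2j|2k" into a list such as ["2j", "2k"].
    lists = df[column].str.split(sep)
    # Passing the column through a dict also works for names that are not
    # valid Python identifiers (for example, names containing a space).
    return (
        df.assign(**{column: lists})
        .explode(column)            # one row per list element, other columns repeated
        .reset_index(drop=True)     # replace the duplicated index with 0..n-1
    )


# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Col1": ["AB", "CD"],
        "Col2": ["2i", "3j"],
        "Col3": ["2j|2k", "3k|3p|3e"],
        "Col4": ["2y", "3x"],
    }
)

expanded = expand_rows(df, "Col3")
print(expanded)
```

The original `df` is left unchanged, because `assign` returns a new frame.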

EDIT: If the column name contains a space:

print (df)
  Col1 Col2    my col Col4
0   AB   2i     2j|2k   2y
1   CD   3j  3k|3p|3e   3x

df['my col'] = df['my col'].str.split('|')
df = df.explode('my col').reset_index(drop=True)
print (df)
  Col1 Col2 my col Col4
0   AB   2i     2j   2y
1   AB   2i     2k   2y
2   CD   3j     3k   3x
3   CD   3j     3p   3x
4   CD   3j     3e   3x

Solution for older versions:

c = df.columns
s = (df.pop('Col3')
       .str.split('|', expand=True)
       .stack()
       .reset_index(drop=True, level=1)
       .rename('Col3'))

df = df.join(s).reset_index(drop=True)[c]
print (df)
  Col1 Col2 Col3 Col4
0   AB   2i   2j   2y
1   AB   2i   2k   2y
2   CD   3j   3k   3x
3   CD   3j   3p   3x
4   CD   3j   3e   3x
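
The stack-based approach can also be run as a self-contained sketch on the question's sample data. Two details here are additions of this sketch, not part of the answer above: the work is done on a copy so the input frame is not modified by `pop`, and a defensive `dropna()` discards the empty slots that `expand=True` creates for rows with fewer values, in case `stack()` keeps them in the pandas version in use.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Col1": ["AB", "CD"],
        "Col2": ["2i", "3j"],
        "Col3": ["2j|2k", "3k|3p|3e"],
        "Col4": ["2y", "3x"],
    }
)

cols = list(df.columns)   # remember the original column order
work = df.copy()          # assumption of this sketch: keep the input frame intact

s = (
    work.pop("Col3")                     # remove the column and keep it as a Series
    .str.split("|", expand=True)         # one column per split piece, missing where absent
    .stack()                             # long format: (row label, piece number) -> value
    .dropna()                            # defensive: drop empty slots if stack() kept them
    .reset_index(level=1, drop=True)     # keep only the original row label
    .rename("Col3")
)

# Joining on the row label repeats the other columns once per value.
result = work.join(s).reset_index(drop=True)[cols]
print(result)
```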


