
Pandas copy value from one column to another based on the value of a third column that is not constant

I have a large dataset that is one huge table but should really be many tables. The headers for the subsets are buried in the rows. My goal is to pull those headers out into a new column so that I can filter by that column to get the data I want (one header at a time). I've created an empty HEADER column for this. There is always a run of 3 NaN values in the 'SCORE' column, and the 'NAME' value in the first row of that run is the 'HEADER' I want, so I'm thinking that relationship could be leveraged. The current Pandas data frame has this structure:

HEADER   NAME              SCORE
NaN      Header 1          NaN
NaN      Random Junk       NaN
NaN      Random Junk       NaN
NaN      Ed                98
NaN      Gary              78
NaN      Floyd             89
...      ...               ...
NaN      Header 2          NaN
NaN      Random Junk       NaN
NaN      Random Junk       NaN
NaN      Mary              96
NaN      Steve             78

and I want this:

HEADER        NAME              SCORE
Header 1      Header 1          NaN
Header 1      Random Junk       NaN
Header 1      Random Junk       NaN
Header 1      Ed                98
Header 1      Gary              78
Header 1      Floyd             89
...           ...               ...
Header 2      Header 2          NaN
Header 2      Random Junk       NaN
Header 2      Random Junk       NaN
Header 2      Mary              96
Header 2      Steve             78

so that I can then remove the NaN rows and get what I'm truly after, which is this:

HEADER        NAME              SCORE
Header 1      Ed                98
Header 1      Gary              78
Header 1      Floyd             89
...           ...               ...
Header 2      Mary              96
Header 2      Steve             78

After much searching, I haven't been able to figure out how to do this kind of conditional editing. I would appreciate any help you can give.

The header line occurs when SCORE has 3 nulls followed by 1 non-null in sequence, so:

  1. Check for this condition using shift, isna, and notna.
  2. mask the HEADER column with NAME where this condition is met.
  3. ffill (forward fill) the new HEADER.
  4. dropna based on SCORE.
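# True where SCORE is NaN on this row and the next two, then non-NaN on the row after
is_header = df.SCORE.isna() & df.SCORE.shift(-1).isna() & df.SCORE.shift(-2).isna() & df.SCORE.shift(-3).notna()
# Copy NAME into HEADER on those rows, then forward fill it down each subset
df.HEADER = df.HEADER.mask(is_header, df.NAME).ffill()
# Drop the header and junk rows, which have no SCORE
df = df.dropna(subset=['SCORE'])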

#       HEADER   NAME  SCORE
# 3   Header 1     Ed   98.0
# 4   Header 1   Gary   78.0
# 5   Header 1  Floyd   89.0
# 9   Header 2   Mary   96.0
# 10  Header 2  Steve   78.0
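
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt by hand from the rows shown in the question (the elided "..." rows are left out), and the HEADER column is created with object dtype as an assumption, so that strings can be written into it without a dtype change. The variable names `result` and `header_1` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring the question's layout (assumed, not the real data).
names = ["Header 1", "Random Junk", "Random Junk", "Ed", "Gary", "Floyd",
         "Header 2", "Random Junk", "Random Junk", "Mary", "Steve"]
scores = [np.nan, np.nan, np.nan, 98, 78, 89,
          np.nan, np.nan, np.nan, 96, 78]
df = pd.DataFrame({
    "HEADER": pd.Series([np.nan] * len(names), dtype="object"),  # empty column
    "NAME": names,
    "SCORE": scores,
})

# Step 1: a header row is NaN here and on the next two rows, then non-NaN.
score = df["SCORE"]
is_header = (
    score.isna()
    & score.shift(-1).isna()
    & score.shift(-2).isna()
    & score.shift(-3).notna()
)

# Steps 2 and 3: copy NAME into HEADER on header rows, then forward fill.
df["HEADER"] = df["HEADER"].mask(is_header, df["NAME"]).ffill()

# Step 4: drop the header and junk rows, which have no SCORE.
result = df.dropna(subset=["SCORE"])

# Filtering one header at a time, as the question asks for.
header_1 = result[result["HEADER"] == "Header 1"]
print(result)
print(header_1)
```

Note that the check relies on the pattern described in the question (exactly 3 NaN scores, then a real score). If a subset could contain a longer run of NaN rows, the condition would need adjusting.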

