Pandas: copy values from one column to another based on a third column with non-constant values

Question

I have a large dataset that is one huge table but should really be many tables. The headers for the subsets are buried in the rows. My goal is to pull those headers out into a new column so that I can filter by that column to get the data I want (one header at a time). I've created an empty HEADER column for this. In the 'SCORE' column there is always a run of 3 NaN values, and the 'NAME' value on the first row of that run is the 'HEADER' I want, so I'm thinking that relationship could be leveraged. The current Pandas data frame has this structure:

HEADER   NAME              SCORE
NaN      Header 1          NaN
NaN      Random Junk       NaN
NaN      Random Junk       NaN
NaN      Ed                98
NaN      Gary              78
NaN      Floyd             89
...      ...               ...
NaN      Header 2          NaN
NaN      Random Junk       NaN
NaN      Random Junk       NaN
NaN      Mary              96
NaN      Steve             78
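
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the elided "..." row is left out, so the frame has 11 rows, and the HEADER column is assumed to start as all-NaN with object dtype.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above (the "..." row is omitted).
names = ["Header 1", "Random Junk", "Random Junk", "Ed", "Gary", "Floyd",
         "Header 2", "Random Junk", "Random Junk", "Mary", "Steve"]
scores = [np.nan, np.nan, np.nan, 98, 78, 89,
          np.nan, np.nan, np.nan, 96, 78]

df = pd.DataFrame({
    # Assumption: the empty HEADER column is all-NaN, object dtype.
    "HEADER": pd.Series([np.nan] * len(names), dtype="object"),
    "NAME": names,
    "SCORE": scores,
})

# Rows where SCORE is missing: the header row plus the two junk rows per subset.
missing_score = df["SCORE"].isna()
print(df[missing_score])
```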

and I want this:

HEADER        NAME              SCORE
Header 1      Header 1          NaN
Header 1      Random Junk       NaN
Header 1      Random Junk       NaN
Header 1      Ed                98
Header 1      Gary              78
Header 1      Floyd             89
...           ...               ...
Header 2      Header 2          NaN
Header 2      Random Junk       NaN
Header 2      Random Junk       NaN
Header 2      Mary              96
Header 2      Steve             78

so I can then remove the NaN rows and get what I'm truly after, which is this:

HEADER        NAME              SCORE
Header 1      Ed                98
Header 1      Gary              78
Header 1      Floyd             89
...           ...               ...
Header 2      Mary              96
Header 2      Steve             78

After much searching, I'm not able to figure out how to do conditional editing like this. I would appreciate any help you can give.

Answer 1

A header row is the first of 3 consecutive nulls in SCORE that are followed by a non-null value, so:

Check for this condition using shift, isna, and notna.
mask the HEADER column with NAME where this condition is met.
ffill (forward fill) the new HEADER.
dropna based on SCORE.

is_header = df.SCORE.isna() & df.SCORE.shift(-1).isna() & df.SCORE.shift(-2).isna() & df.SCORE.shift(-3).notna()
df.HEADER = df.HEADER.mask(is_header, df.NAME).ffill()
df = df.dropna(subset=['SCORE'])

#       HEADER   NAME  SCORE
# 3   Header 1     Ed   98.0
# 4   Header 1   Gary   78.0
# 5   Header 1  Floyd   89.0
# 9   Header 2   Mary   96.0
# 10  Header 2  Steve   78.0
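
For completeness, here is a self-contained sketch of the same approach run against the sample rows from the question, followed by the "one header at a time" filtering the question is after. The sample construction and the names `result`, `tables`, and `header_1` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the elided "..." row is omitted).
names = ["Header 1", "Random Junk", "Random Junk", "Ed", "Gary", "Floyd",
         "Header 2", "Random Junk", "Random Junk", "Mary", "Steve"]
scores = [np.nan, np.nan, np.nan, 98, 78, 89,
          np.nan, np.nan, np.nan, 96, 78]
df = pd.DataFrame({
    "HEADER": pd.Series([np.nan] * len(names), dtype="object"),
    "NAME": names,
    "SCORE": scores,
})

score = df["SCORE"]
# True only on the first row of a run of 3 NaN scores followed by a real score.
is_header = (
    score.isna()
    & score.shift(-1).isna()
    & score.shift(-2).isna()
    & score.shift(-3).notna()
)

# Copy NAME into HEADER on header rows, then carry it down to the next header.
df["HEADER"] = df["HEADER"].mask(is_header, df["NAME"]).ffill()

# Drop the header and junk rows, which are the ones without a score.
result = df.dropna(subset=["SCORE"])

# Filter one header at a time, or split into one table per header.
header_1 = result[result["HEADER"] == "Header 1"]
tables = {header: group.reset_index(drop=True)
          for header, group in result.groupby("HEADER")}

print(header_1)
print(tables["Header 2"])
```

Note that the `shift(-3).notna()` check relies on exactly 3 NaN rows preceding each block of scores, as described in the question; a subset with a different number of junk rows would need a different condition.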

Answer 1
1 2021-12-04 04:49:41
