Pandas: copy a value from one column to another based on a value in a third column that is not constant
I have a large dataset that is one huge table that really should be many tables. The headers for the subsets are buried in the rows. My goal is to pull those headers out into a new column so that I can filter by that column to get the data I want (one header at a time). I've created an empty 'HEADER' column for this. There is always a run of 3 NaN values in the 'SCORE' column, and the first value in the 'NAME' column within that run is the 'HEADER' I want. So I'm thinking that relationship could be leveraged.
The current pandas DataFrame has this structure:
HEADER    NAME          SCORE
NaN       Header 1      NaN
NaN       Random Junk   NaN
NaN       Random Junk   NaN
NaN       Ed            98
NaN       Gary          78
NaN       Floyd         89
...       ...           ...
NaN       Header 2      NaN
NaN       Random Junk   NaN
NaN       Random Junk   NaN
NaN       Mary          96
NaN       Steve         78
and I want this:
HEADER    NAME          SCORE
Header 1  Header 1      NaN
Header 1  Random Junk   NaN
Header 1  Random Junk   NaN
Header 1  Ed            98
Header 1  Gary          78
Header 1  Floyd         89
...       ...           ...
Header 2  Header 2      NaN
Header 2  Random Junk   NaN
Header 2  Random Junk   NaN
Header 2  Mary          96
Header 2  Steve         78
so that I can then remove the NaN rows and get what I'm really after, which is this:
HEADER    NAME          SCORE
Header 1  Ed            98
Header 1  Gary          78
Header 1  Floyd         89
...       ...           ...
Header 2  Mary          96
Header 2  Steve         78
After much searching, I haven't been able to figure out how to do this kind of conditional editing. I'd appreciate any help you can give.
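To make the question reproducible, the sample input can be rebuilt as a DataFrame and the relationship described above (a run of NaN scores whose first NAME is the header) checked directly. This is a minimal sketch that assumes the `...` rows are simply omitted; `run_id` and `nan_runs` are names introduced here for illustration. It labels consecutive runs of NaN scores, then reports the first NAME and the length of each run.

```python
import numpy as np
import pandas as pd

# Sample input from the question; the "..." rows are omitted.
df = pd.DataFrame({
    "HEADER": np.nan,  # the empty column created to hold the headers
    "NAME": ["Header 1", "Random Junk", "Random Junk", "Ed", "Gary", "Floyd",
             "Header 2", "Random Junk", "Random Junk", "Mary", "Steve"],
    "SCORE": [np.nan, np.nan, np.nan, 98, 78, 89,
              np.nan, np.nan, np.nan, 96, 78],
})

is_nan = df["SCORE"].isna()
# Label consecutive runs: the label increases every time is_nan changes value.
run_id = is_nan.ne(is_nan.shift(fill_value=False)).cumsum().rename("run")

# Group only the NaN-score rows by their run label.
nan_runs = df[is_nan].groupby(run_id[is_nan])
first_names = nan_runs["NAME"].first()  # first NAME in each NaN run
run_lengths = nan_runs.size()           # number of rows in each NaN run

print(first_names.tolist())  # ['Header 1', 'Header 2']
print(run_lengths.tolist())  # [3, 3]
```

On the sample data every NaN run has length 3 and starts with the header name, which is exactly the pattern the question proposes to exploit.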
The header line occurs when SCORE has 3 nulls followed by 1 non-null, so:
1. Check for this condition with shift, isna, and notna.
2. mask the HEADER column with NAME wherever this condition is met.
3. ffill (forward fill) the new HEADER.
4. dropna based on SCORE.
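# Header row: first of 3 consecutive NaN scores, followed by a non-NaN score
is_header = df.SCORE.isna() & df.SCORE.shift(-1).isna() & df.SCORE.shift(-2).isna() & df.SCORE.shift(-3).notna()
# Copy NAME into HEADER on header rows, then forward-fill it down each block
df['HEADER'] = df.HEADER.mask(is_header, df.NAME).ffill()
# Drop the header and junk rows, which have no SCORE
df = df.dropna(subset=['SCORE'])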
# HEADER NAME SCORE
# 3 Header 1 Ed 98.0
# 4 Header 1 Gary 78.0
# 5 Header 1 Floyd 89.0
# 9 Header 2 Mary 96.0
# 10 Header 2 Steve 78.0
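Putting the steps together, here is a self-contained version that rebuilds the sample data and wraps the answer's three lines in a function. It is a sketch only: `fill_headers` is a hypothetical helper name, it assumes exactly three NaN-score rows open each block (as the question states), and it works on a copy instead of reassigning `df`.

```python
import numpy as np
import pandas as pd


def fill_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Copy each block's header NAME into HEADER, forward-fill it, keep only scored rows."""
    score = df["SCORE"]
    # Header row: first of three consecutive NaN scores that are followed by a real score.
    is_header = (
        score.isna()
        & score.shift(-1).isna()
        & score.shift(-2).isna()
        & score.shift(-3).notna()
    )
    out = df.copy()  # leave the caller's frame untouched
    # Where is_header is True take NAME, elsewhere keep HEADER (NaN); then forward-fill.
    out["HEADER"] = out["HEADER"].mask(is_header, out["NAME"]).ffill()
    # Header and junk rows have no SCORE, so dropping NaN scores removes them.
    return out.dropna(subset=["SCORE"])


# Sample input from the question (the "..." rows are omitted).
df = pd.DataFrame({
    "HEADER": np.nan,
    "NAME": ["Header 1", "Random Junk", "Random Junk", "Ed", "Gary", "Floyd",
             "Header 2", "Random Junk", "Random Junk", "Mary", "Steve"],
    "SCORE": [np.nan, np.nan, np.nan, 98, 78, 89,
              np.nan, np.nan, np.nan, 96, 78],
})

result = fill_headers(df)
print(result)

# Filter on the new column to get one header's data at a time.
header_1 = result[result["HEADER"] == "Header 1"]
print(header_1)
```

Because the mask looks exactly three rows ahead, a block with a different number of junk rows would be missed; that case would need a different test, for example flagging the first NaN score that follows a non-NaN score.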
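Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.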