Pandas: Replace value of column with previous row value if condition is met
I have a DataFrame imported from a txt file with the following structure:
   ID         Place  Name          Other
0  123456789  1100   NAME1         5468.85
1  NUMBER1    1100   DESCRIPTION1
2  STR1              DESCRIPTION2
3  NUMBER2           OTHER_STR
4  987654321  1100   NAME2         4566.65
1  NUMBER1    1100   DESCRIPTION1
2  STR1              DESCRIPTION2
I want to check something like the code below, but I understand that iterating through a df is bad practice, and I'm not an expert in Pandas:
for row in df:  # pseudocode: visit each row in order
    if row['Other'] == '' or row['Place'] == '':
        row['ID'] = previous_row['ID']  # assignment intended, not a comparison
The output should look like this:
   ID         Place  Name          Other
0  123456789  1100   NAME1         5468.85
1  123456789  1100   DESCRIPTION1
2  123456789         DESCRIPTION2
3  123456789         OTHER_STR
4  987654321  1100   NAME2         4566.65
1  987654321  1100   DESCRIPTION1
2  987654321         DESCRIPTION2
Note that any row can be a STR, an INT, or blank. The data set is a bit more than a million rows by 15 columns, so it needs to be fast.
I've tried what's suggested here, but it doesn't quite let me set a condition for when the value of a column should be updated.
Using pandas.Series.ffill:
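import numpy as np

# Flag rows where Place or Other is empty; these should inherit the ID from above
s = df["Place"].eq("") | df["Other"].eq("")
df.loc[s, "ID"] = np.nan  # pd.np is no longer available in recent pandas
df["ID"] = df["ID"].ffill()  # assign back rather than calling inplace on a column
print(df)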
Output:
   ID         Place  Name          Other
0  123456789  1100   NAME1         5468.85
1  123456789  1100   DESCRIPTION1
2  123456789         DESCRIPTION2
3  123456789         OTHER_STR
4  987654321  1100   NAME2         4566.65
1  987654321  1100   DESCRIPTION1
2  987654321         DESCRIPTION2
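For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data by hand, assuming that blanks arrive as empty strings and that every column is read as text; the default integer index and the variable name needs_fill are illustrative choices, not part of the original question. It uses Series.mask in place of the explicit NaN assignment, which should behave the same way here.

```python
import pandas as pd

# Sample data mirroring the question; blanks are assumed to be empty strings.
df = pd.DataFrame(
    {
        "ID": ["123456789", "NUMBER1", "STR1", "NUMBER2",
               "987654321", "NUMBER1", "STR1"],
        "Place": ["1100", "1100", "", "", "1100", "1100", ""],
        "Name": ["NAME1", "DESCRIPTION1", "DESCRIPTION2", "OTHER_STR",
                 "NAME2", "DESCRIPTION1", "DESCRIPTION2"],
        "Other": ["5468.85", "", "", "", "4566.65", "", ""],
    }
)

# True where the row should inherit the ID of the nearest unflagged row above.
needs_fill = df["Place"].eq("") | df["Other"].eq("")

# mask() blanks out the flagged IDs; ffill() then copies the last kept ID down.
df["ID"] = df["ID"].mask(needs_fill).ffill()

print(df)
```

Both steps are vectorized, so no Python-level loop over rows is involved. One caveat: if the very first row is flagged, there is nothing above it to copy, so its ID stays missing.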