Replace cell values for each row and each column in Pandas using for loop
I'm using Python with pandas, but can also import any other library. My dataset has missing values (NaN) in thousands of rows in each column.
Example:
**Name,Type,Region...**
Oranges,Fruit,Western Europe
NaN,NaN,NaN
NaN,NaN,NaN
Blueberry, berry,Eastern Europe
NaN,NaN,NaN
Raspberry, berry,Eastern Europe
NaN,NaN,NaN
NaN,NaN,NaN
We can assume that cells containing NaN can be rewritten to hold the same value as the previous cell, until a new non-NaN value is reached. Example:
**Name,Type,Region...**
Oranges,Fruit,Western Europe
Oranges,Fruit,Western Europe
Oranges,Fruit,Western Europe
Blueberry, berry,Eastern Europe
Blueberry, berry,Eastern Europe
Raspberry, berry,Eastern Europe
Raspberry, berry,Eastern Europe
Raspberry, berry,Eastern Europe
How can I iterate over each row and each column to rewrite the NaN values to match the first non-NaN value before them?
Rules: if cell = NaN and previous_cell = not NaN, replace the value with previous_cell; if cell = NaN and previous_cell = NaN, continue (this covers the edge case where the whole column is empty); if cell = NaN, continue.
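The rules above can be sketched as an explicit loop. This is only an illustration under assumptions: `forward_fill_loop` and the `sample` frame are hypothetical names, not part of the original question, and the loop remembers the last non-NaN value in each column rather than re-reading the previous cell, which gives the same result.

```python
import pandas as pd


def forward_fill_loop(df):
    """Hypothetical helper: copy the last non-NaN value down each column."""
    out = df.copy()
    for col_pos in range(out.shape[1]):
        previous = None  # last non-NaN value seen so far in this column
        for row in range(len(out)):
            cell = out.iat[row, col_pos]
            if pd.isna(cell):
                # Only fill when an earlier non-NaN value exists;
                # leading NaNs and fully empty columns are left alone.
                if previous is not None:
                    out.iat[row, col_pos] = previous
            else:
                previous = cell
    return out


# Small made-up frame: one normal column, one starting with a gap, one empty.
sample = pd.DataFrame(
    {
        "Name": ["Oranges", None, None, "Blueberry", None],
        "Late": [None, "x", None, None, "y"],
        "Empty": [None, None, None, None, None],
    }
)
result = forward_fill_loop(sample)
print(result)
```

A cell-by-cell loop like this is slow on large frames, so it is mainly useful for checking that the rules mean what is intended.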
I have a huge dataset, so this is not possible to do manually in the CSV file itself.
Nested query which does not work
You can use apply with ffill, which is available in pandas, on all columns:
# Forward-fill each column: every NaN takes the last non-NaN value above it.
df = df.apply(lambda col: col.ffill())
# Equivalent shorthand for the whole frame: df = df.ffill()
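For reference, here is a minimal self-contained sketch of forward filling on data shaped like the example in the question. The `Empty` column is an assumption added here to show that a column with no values at all stays NaN.

```python
import numpy as np
import pandas as pd

# Made-up sample mirroring the question's layout, plus an all-NaN column.
df = pd.DataFrame(
    {
        "Name": ["Oranges", np.nan, np.nan, "Blueberry", np.nan, "Raspberry", np.nan, np.nan],
        "Type": ["Fruit", np.nan, np.nan, "berry", np.nan, "berry", np.nan, np.nan],
        "Region": ["Western Europe", np.nan, np.nan, "Eastern Europe",
                   np.nan, "Eastern Europe", np.nan, np.nan],
        "Empty": [np.nan] * 8,
    }
)

# ffill works column by column and returns a new frame; df is not modified.
filled = df.ffill()
print(filled)
```

Because `ffill` is vectorized, it should scale to thousands of rows far better than a nested Python loop.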