Python/Pandas: if value is NaN or 0 then fill with the value from the next column within the same row

I have gone through several posts, but they either apply only to examples with a single column, or handle only NaN values or only 0 values, not both.

My df looks like this. I would like to fill in column 'Main' with the non-missing, non-zero string found in the four columns to its right.

Current df:

import pandas as pd

d = {'Main': ['','','',''], 'col2': ['Big','','',0], 'col3': [0,'Medium',0,''], 'col4': ['','','Small',''], 'col5':['',0,'','Vsmall']}
df = pd.DataFrame(data=d)

+------+------+--------+-------+--------+
| Main | Col2 | Col3   | Col4  | Col5   |
+------+------+--------+-------+--------+
|      | Big  | 0      | ...   |        |
+------+------+--------+-------+--------+
|      | ...  | Medium | ...   | 0      |
+------+------+--------+-------+--------+
|      |      | 0      | Small |        |
+------+------+--------+-------+--------+
|      | 0    | ...    | ...   | Vsmall |
+------+------+--------+-------+--------+

Desired output df:

+--------+------+--------+-------+--------+
| Main   | Col2 | Col3   | Col4  | Col5   |
+--------+------+--------+-------+--------+
| Big    | Big  | 0      | ...   |        |
+--------+------+--------+-------+--------+
| Medium | ...  | Medium | ...   | 0      |
+--------+------+--------+-------+--------+
| Small  |      | 0      | Small |        |
+--------+------+--------+-------+--------+
| Vsmall | 0    | ...    | ...   | Vsmall |
+--------+------+--------+-------+--------+

Thanks in advance!

The idea is to replace 0 and empty strings with missing values using DataFrame.mask, then back-fill the missing values along each row, and finally select the first column:

c = ['col2','col3','col4','col5']
df['Main'] = df[c].mask(df[c].isin(['0','',0])).bfill(axis=1).iloc[:, 0]
print (df)
     Main col2    col3   col4    col5
0     Big  Big       0
1  Medium       Medium              0
2   Small            0  Small
3  Vsmall    0                 Vsmall
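The question title mentions NaN, while the sample frame only contains empty strings and 0. The following is a minimal sketch, assuming real `np.nan` values may appear alongside those placeholders. The helper name `first_valid_in_row` and its `invalid` parameter are hypothetical and not part of pandas. It wraps the mask and back-fill steps described above.

```python
import numpy as np
import pandas as pd


def first_valid_in_row(frame, columns, invalid=("0", "", 0)):
    """Return, per row, the first value in `columns` that is neither
    missing nor listed in `invalid` (hypothetical helper, not a pandas API)."""
    candidates = frame[columns]
    # Turn placeholder values into NaN so they are treated like real gaps.
    cleaned = candidates.mask(candidates.isin(list(invalid)))
    # Back-fill along each row, then keep the leftmost column.
    return cleaned.bfill(axis=1).iloc[:, 0]


# Sample data modelled on the question, with some real NaN values mixed in.
df = pd.DataFrame({
    "Main": ["", "", "", ""],
    "col2": ["Big", np.nan, "", 0],
    "col3": [0, "Medium", np.nan, ""],
    "col4": ["", "", "Small", np.nan],
    "col5": ["", 0, "", "Vsmall"],
})

df["Main"] = first_valid_in_row(df, ["col2", "col3", "col4", "col5"])
print(df["Main"].tolist())
```

Existing NaN values need no special handling here, because `mask` only adds more missing values and `bfill` treats both kinds the same way.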

Alternatively, if you can list every string that may be extracted, keep only those values with DataFrame.where and replace everything else with missing values:

c = ['col2','col3','col4','col5']
df['Main'] = df[c].where(df[c].isin(['Big','Medium','Small','Vsmall'])).bfill(axis=1).iloc[:,0]
print (df)
     Main col2    col3   col4    col5
0     Big  Big       0
1  Medium       Medium              0
2   Small            0  Small
3  Vsmall    0                 Vsmall
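To illustrate where the whitelist variant differs from the blacklist one, here is a small sketch. It assumes the data may contain an unexpected placeholder, for which the hypothetical value `'n/a'` is added to the sample. With `where`, any value outside the allowed list is discarded, so the stray string never reaches 'Main'.

```python
import pandas as pd

# Sample data modelled on the question; 'n/a' is a made-up stray placeholder.
df = pd.DataFrame({
    "Main": ["", "", "", ""],
    "col2": ["Big", "", "n/a", 0],
    "col3": [0, "Medium", 0, ""],
    "col4": ["", "", "Small", ""],
    "col5": ["", 0, "", "Vsmall"],
})

cols = ["col2", "col3", "col4", "col5"]
allowed = ["Big", "Medium", "Small", "Vsmall"]

# Keep only whitelisted strings; everything else (0, '', 'n/a') becomes NaN.
kept = df[cols].where(df[cols].isin(allowed))

# Back-fill along each row and take the leftmost column.
df["Main"] = kept.bfill(axis=1).iloc[:, 0]
print(df["Main"].tolist())
```

A mask built from `['0', '', 0]` would let `'n/a'` through for the third row, so the whitelist is likely the safer choice when the set of valid labels is known in advance.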

Details:

print (df[c].mask(df[c].isin(['0','',0])))
#print (df[c].where(df[c].isin(['Big','Medium','Small','Vsmall'])))

  col2    col3   col4    col5
0  Big     NaN    NaN     NaN
1  NaN  Medium    NaN     NaN
2  NaN     NaN  Small     NaN
3  NaN     NaN    NaN  Vsmall

print (df[c].mask(df[c].isin(['0','',0])).bfill(axis=1))
     col2    col3    col4    col5
0     Big     NaN     NaN     NaN
1  Medium  Medium     NaN     NaN
2   Small   Small   Small     NaN
3  Vsmall  Vsmall  Vsmall  Vsmall

From the sample data you presented, I think what you are trying to achieve is decoding one-hot encoded data (one-hot encoding is a classic technique for converting categorical data to numerical data in machine learning).

Here is code to achieve the decoding:

import pandas as pd

d = {'Main': [0,0,0,0], 'col2': ['Big','','',0], 'col3': [0,'Medium',0,''], 'col4': ['','','Small',''], 'col5':['',0,'','Vsmall']}
df = pd.DataFrame(data=d)

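def reduce_function(row):
    # Return the first value in the row that is not missing, 0, or an empty string.
    for col in ['col2','col3','col4','col5']:
        if not pd.isnull(row[col]) and row[col] != 0 and row[col] != '':
            return row[col]
    # Implicitly returns None when no column holds a usable value.

# Apply the function row by row (axis=1) and store the result in 'Main'.
df['Main']=df.apply(reduce_function, axis=1)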

Note: prefer reductions (i.e. apply()) on DataFrames over explicitly iterating over rows.
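For readers unfamiliar with the one-hot remark, the following sketch shows what a strict one-hot encoding and its decoding might look like in pandas. It is an illustration under assumptions rather than a description of the asker's data. The `sizes` series is made up, and the question's frame stores the label itself in each column rather than a 0/1 indicator, so it is only loosely one-hot.

```python
import pandas as pd

# Made-up categorical column using the labels from the question.
sizes = pd.Series(["Big", "Medium", "Small", "Vsmall"])

# Encode: one indicator column per category, 1 where the row has that label.
encoded = pd.get_dummies(sizes, dtype=int)

# Decode: for each row, the column holding the maximum (the single 1) is the label.
decoded = encoded.idxmax(axis=1)

print(encoded)
print(decoded.tolist())
```

With true 0/1 indicators, `idxmax(axis=1)` is enough to decode. The row-wise function above is needed because the question's columns hold strings mixed with 0 and empty values.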


