Python/Pandas: if value is NaN or 0 then fill with the value from the next column within the same row
I have gone through several posts, but they either apply only to examples with one column, or handle only NaN or only 0 values, not both.
My df looks like this. I would like to fill in column 'Main' with the non-missing, non-zero string found in the four columns to its right.
current df =
import pandas as pd
d = {'Main': ['','','',''], 'col2': ['Big','','',0], 'col3': [0,'Medium',0,''], 'col4': ['','','Small',''], 'col5':['',0,'','Vsmall']}
df = pd.DataFrame(data=d)
+------+------+--------+-------+--------+
| Main | Col2 | Col3 | Col4 | Col5 |
+------+------+--------+-------+--------+
| | Big | 0 | ... | |
+------+------+--------+-------+--------+
| | ... | Medium | ... | 0 |
+------+------+--------+-------+--------+
| | | 0 | Small | |
+------+------+--------+-------+--------+
| | 0 | ... | ... | Vsmall |
+------+------+--------+-------+--------+
desired output df
+--------+------+--------+-------+--------+
| Main | Col2 | Col3 | Col4 | Col5 |
+--------+------+--------+-------+--------+
| Big | Big | 0 | ... | |
+--------+------+--------+-------+--------+
| Medium | ... | Medium | ... | 0 |
+--------+------+--------+-------+--------+
| Small | | 0 | Small | |
+--------+------+--------+-------+--------+
| Vsmall | 0 | ... | ... | Vsmall |
+--------+------+--------+-------+--------+
Thanks in advance!
The idea is to replace 0 and empty strings with missing values using DataFrame.mask, then back-fill the missing values along each row, and finally select the first column:
c = ['col2','col3','col4','col5']
df['Main'] = df[c].mask(df.isin(['0','',0])).bfill(axis=1).iloc[:, 0]
print (df)
     Main col2    col3   col4    col5
0     Big  Big       0
1  Medium       Medium              0
2   Small            0  Small
3  Vsmall    0                 Vsmall
Alternatively, if you can list every string that may need to be extracted, replace all other values with missing values using DataFrame.where:
c = ['col2','col3','col4','col5']
df['Main'] = df[c].where(df.isin(['Big','Medium','Small','Vsmall'])).bfill(axis=1).iloc[:,0]
print (df)
     Main col2    col3   col4    col5
0     Big  Big       0
1  Medium       Medium              0
2   Small            0  Small
3  Vsmall    0                 Vsmall
Details:
print (df[c].mask(df.isin(['0','',0])))
#print (df[c].where(df.isin(['Big','Medium','Small','Vsmall'])))
  col2    col3   col4    col5
0  Big     NaN    NaN     NaN
1  NaN  Medium    NaN     NaN
2  NaN     NaN  Small     NaN
3  NaN     NaN    NaN  Vsmall
print (df[c].mask(df.isin(['0','',0])).bfill(axis=1))
     col2    col3    col4    col5
0     Big     NaN     NaN     NaN
1  Medium  Medium     NaN     NaN
2   Small   Small   Small     NaN
3  Vsmall  Vsmall  Vsmall  Vsmall
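For anyone who wants to run both variants end to end, here is a minimal self-contained sketch using the sample data from the question. The names unwanted, wanted, main_mask and main_where are only illustrative, and the check at the end assumes the list of wanted strings covers every value that should be extracted.

```python
import pandas as pd

# Sample data copied from the question.
d = {'Main': ['', '', '', ''],
     'col2': ['Big', '', '', 0],
     'col3': [0, 'Medium', 0, ''],
     'col4': ['', '', 'Small', ''],
     'col5': ['', 0, '', 'Vsmall']}
df = pd.DataFrame(data=d)
c = ['col2', 'col3', 'col4', 'col5']

# Variant 1: turn the unwanted values (0, '0', '') into NaN, then back-fill
# along each row so the first usable value ends up in the leftmost column.
unwanted = df[c].isin([0, '0', ''])
main_mask = df[c].mask(unwanted).bfill(axis=1).iloc[:, 0]

# Variant 2: keep only values from a known list, then back-fill the same way.
wanted = df[c].isin(['Big', 'Medium', 'Small', 'Vsmall'])
main_where = df[c].where(wanted).bfill(axis=1).iloc[:, 0]

df['Main'] = main_mask
print(df['Main'].tolist())            # ['Big', 'Medium', 'Small', 'Vsmall']
print(main_mask.equals(main_where))   # both variants agree on this data
```

Building the boolean frame from df[c] rather than the whole df keeps the condition and the data on the same columns, so nothing depends on automatic alignment.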
From the sample data you presented, I think what you are trying to achieve is decoding one-hot encoded data (a classic technique for converting categorical data to numerical data in machine learning).
Here is code to do the decoding:
import pandas as pd
d = {'Main': [0,0,0,0], 'col2': ['Big','','',0], 'col3': [0,'Medium',0,''], 'col4': ['','','Small',''], 'col5':['',0,'','Vsmall']}
df = pd.DataFrame(data=d)

def reduce_function(row):
    # Return the first value in col2..col5 that is not NaN, 0 or ''.
    for col in ['col2','col3','col4','col5']:
        if not pd.isnull(row[col]) and row[col] != 0 and row[col] != '':
            return row[col]
    # Falls through and returns None when the row has no usable value.

df['Main'] = df.apply(reduce_function, axis=1)
Note: Prefer reductions such as apply() on DataFrames over explicitly iterating over rows.
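As a small variation on the row-wise approach, the sketch below passes an explicit fallback for rows where every candidate column is NaN, 0 or ''. The helper name first_valid, the COLS constant, the default parameter and the extra fifth row are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

# Assumed candidate columns, checked from left to right.
COLS = ['col2', 'col3', 'col4', 'col5']

def first_valid(row, cols=COLS, default=None):
    """Return the first value in `cols` that is not NaN, 0 or ''."""
    for col in cols:
        value = row[col]
        if pd.notna(value) and value != 0 and value != '':
            return value
    return default  # no usable value found in this row

# Question data plus one hypothetical row that has nothing to extract.
df = pd.DataFrame({
    'Main': ['', '', '', '', ''],
    'col2': ['Big', '', '', 0, ''],
    'col3': [0, 'Medium', 0, '', 0],
    'col4': ['', '', 'Small', '', ''],
    'col5': ['', 0, '', 'Vsmall', 0],
})

# Extra keyword arguments to DataFrame.apply are forwarded to the function.
df['Main'] = df.apply(first_valid, axis=1, default='')
print(df['Main'].tolist())
```

With default='' the last row keeps an empty string in 'Main' instead of None, which may be easier to handle downstream.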