Replace zero-valued columns in one data frame with mean values of the same-named column in another data frame
I have two data frames, df1 and df2, each with the same number of columns and the same column names, but with a different number of rows. Basically, many columns in df2 contain only 0 values.
What I would like to accomplish is that every all-zero column in df2 is replaced with the mean (average) value of the column with the same name in df1.
So, if df1 has a structure like:
Column1 Column2 ------ Column n
0.4 2.3 1.7
0.7 2.5 1.4
0.1 2.1 1.2
and df2 has a structure like:
Column1 Column2 ------ Column n
0 2.3 1.7
0 2.5 1.4
0 2.1 1.2
I would like to replace Column1 (and any other all-zero columns in df2) with the mean of the same column in df1. So, finally, df2 would look like:
Column1 Column2 ------ Column n
0.4 2.3 1.7
0.4 2.5 1.4
0.4 2.1 1.2
(All zero values in Column1 of df2 are replaced with the mean of Column1 in df1.)
I am fairly new to this and have checked other options such as fillna() and replace(), but I am unable to accomplish exactly what I want. Any help in this regard is highly appreciated.
Use DataFrame.mask with mean. Note that this replaces every zero in df2, not only the zeros in all-zero columns:
# Replace each 0 in df2 with the mean of the same-named column in df1
df = df2.mask(df2 == 0, df1.mean(), axis=1)
print(df)
Column1 Column2 Column n
0 0.4 2.3 1.7
1 0.4 2.5 1.4
2 0.4 2.1 1.2
A numpy alternative with numpy.where should be faster on large DataFrames. It matches columns by position rather than by name, so df1 and df2 must have their columns in the same order:
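import numpy as np
import pandas as pd
# Keep df2's own values where they are non-zero (df1 may have a different number of rows)
df = pd.DataFrame(np.where(df2 == 0, df1.mean(), df2),
                  index=df2.index,
                  columns=df2.columns)
print(df)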
Column1 Column2 Column n
0 0.4 2.3 1.7
1 0.4 2.5 1.4
2 0.4 2.1 1.2
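Both snippets above replace every zero cell in df2. If only columns that are entirely zero should be touched, as the question describes, one possible sketch is shown below. The sample values and the names all_zero, zero_cols and result are made up for illustration, and it assumes the column names of df2 also exist in df1:

```python
import pandas as pd

# Illustrative data only: df1 and df2 share column names but differ in row count.
df1 = pd.DataFrame({"Column1": [0.4, 0.7, 0.1],
                    "Column2": [2.3, 2.5, 2.1],
                    "Column n": [1.7, 1.4, 1.2]})
df2 = pd.DataFrame({"Column1": [0.0, 0.0, 0.0, 0.0],
                    "Column2": [2.3, 0.0, 2.1, 2.6],
                    "Column n": [1.7, 1.4, 1.2, 1.9]})

# Boolean Series indexed by column name: True where every value in df2 is 0.
all_zero = (df2 == 0).all()
zero_cols = all_zero[all_zero].index

# Work on a copy so df2 itself stays unchanged.
result = df2.copy()
for col in zero_cols:
    # Assigning a scalar fills the whole column with df1's mean for that column.
    result[col] = df1[col].mean()

print(result)
```

Here the single zero in Column2 is left as it is, because that column is not all zero; only Column1 is overwritten.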