
Replace zero-valued columns in one data frame with the mean of the same-named column in another data frame

I have two data frames, df1 and df2, with the same number of columns and the same column names, but a different number of rows. Many columns in df2 contain only 0 values.

I would like to replace every all-zero column in df2 with the mean (average) of the column with the same name in df1.

So, if df1 has a structure like:

Column1 Column2 ------    Column n
0.4      2.3               1.7
0.7      2.5               1.4
0.1      2.1               1.2

and df2 has a structure like:

Column1 Column2 ------    Column n
0      2.3                1.7
0      2.5               1.4
0      2.1               1.2

I would like to replace Column1 (and any other all-zero column in df2) with the mean of the same column in df1. So, finally, df2 would look like:

Column1 Column2 ------    Column n
0.4      2.3               1.7
0.4      2.5               1.4
0.4      2.1               1.2

(All zero values in Column1 of df2 are replaced with the mean of Column1 in df1.)

I am fairly new to this and have looked at other options such as fillna() and replace(), but have been unable to accomplish exactly what I want. Any help in this regard is highly appreciated.

Use DataFrame.mask with mean:

df = df2.mask(df2 == 0, df1.mean(), axis=1)
print (df)
   Column1  Column2  Column n
0      0.4      2.3       1.7
1      0.4      2.5       1.4
2      0.4      2.1       1.2
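
Note that the mask call above replaces every zero in df2, including isolated zeros in columns that are not entirely zero. If only the all-zero columns should be touched, as the question describes, one possible approach is sketched below. The sample values and the names all_zero, zero_cols, and result are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: same column names, different row counts.
df1 = pd.DataFrame({"Column1": [0.4, 0.7, 0.1, 0.4],
                    "Column2": [2.3, 2.5, 2.1, 2.3]})
df2 = pd.DataFrame({"Column1": [0.0, 0.0, 0.0],
                    "Column2": [0.0, 2.5, 2.1]})

# Boolean Series indexed by column name: True where every value in df2 is 0.
all_zero = (df2 == 0).all()
zero_cols = all_zero[all_zero].index

# Work on a copy so the original df2 is left untouched.
result = df2.copy()
# Fill each all-zero column with the scalar mean of the same column in df1.
for col in zero_cols:
    result[col] = df1[col].mean()

print(result)
```

Here Column1 is filled with the df1 mean, while the single zero in Column2 is kept because that column is not entirely zero.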

A NumPy alternative with numpy.where should be faster on large DataFrames:

# Unchanged values, index, and columns come from df2 (not df1), since the row counts differ
df = pd.DataFrame(np.where(df2 == 0, df1.mean(), df2),
                  index=df2.index,
                  columns=df2.columns)
print (df)
   Column1  Column2  Column n
0      0.4      2.3       1.7
1      0.4      2.5       1.4
2      0.4      2.1       1.2
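
Unlike DataFrame.mask, numpy.where matches the means to columns by position rather than by label, so the means need to be in the same column order as df2. A minimal self-contained sketch under that assumption follows. The sample values and the names means, values, and out are hypothetical, and df2 is deliberately given a different row count and column order than df1.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: df2 has fewer rows and a different column order.
df1 = pd.DataFrame({"Column1": [0.4, 0.7, 0.1, 0.4],
                    "Column2": [2.3, 2.5, 2.1, 2.3]})
df2 = pd.DataFrame({"Column2": [2.3, 0.0, 2.1],
                    "Column1": [0.0, 0.0, 0.0]})

# Reorder the df1 column means to match df2's column order, because
# NumPy broadcasting works by position and ignores labels.
means = df1.mean().reindex(df2.columns).to_numpy()  # shape: (n_columns,)
values = df2.to_numpy()                             # shape: (n_rows, n_columns)

# The 1-D means array broadcasts across every row of df2.
out = pd.DataFrame(np.where(values == 0, means, values),
                   index=df2.index,
                   columns=df2.columns)
print(out)
```

Like the mask version, this replaces every zero element, not only the columns that are entirely zero.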


