How to replace the NAs with the column mean and then multiply columns of different lengths in R?
My question might not be very clear, so here is an example.
My final goal is to compute:
final=(df1$a*df2$b)+(df1$a*df3$c*df4$d)+(df4$d*df5$e)
I have five data frames (one column each) with different lengths, as follows:
df1
a
1. 1
2. 2
3. 4
4. 2
df2
b
1. 2
2. 6
df3
c
1. 2
2. 4
3. 3
df4
d
1. 1
2. 2
3. 4
4. 3
df5
e
1. 4
2. 6
3. 2
So I want a final data frame that includes them all, as follows:
finaldf
a b c d e
1. 1 2 2 1 4
2. 2 6 4 2 6
3. 4 NA 3 4 2
4. 2 NA NA 3 NA
I want all the NAs in each column to be replaced with the mean of that column, so that all columns of finaldf have equal length:
finaldf
a b c d e
1. 1 2 2 1 4
2. 2 6 4 2 6
3. 4 4 3 4 2
4. 2 4 3 3 4
With that, I can compute final=(df1$a*df2$b)+(df1$a*df3$c*df4$d)+(df4$d*df5$e) as I need.
By far the easiest way is to use the qpcR, dplyr and tidyr packages.
library(dplyr)
library(qpcR)
library(tidyr)
df1 <- data.frame(a=c(1,2,4,2))
df2 <- data.frame(b=c(2,6))
df3 <- data.frame(c=c(2,4,3))
df4 <- data.frame(d=c(1,2,4,3))
df5 <- data.frame(e=c(4,6,2))
mydf <- qpcR:::cbind.na(df1, df2, df3, df4, df5) %>%
  tidyr::replace_na(., as.list(colMeans(., na.rm = TRUE)))
> mydf
a b c d e
1 1 2 2 1 4
2 2 6 4 2 6
3 4 4 3 4 2
4 2 4 3 3 4
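Once every column has the same length, the formula from the question can be evaluated row by row. The following is a minimal sketch that recreates the imputed data frame from the printed output above (so it runs without qpcR) and assumes the formula should be applied to the imputed columns rather than to the original df1 to df5:

```r
# Imputed data frame, copied from the printed mydf output above
mydf <- data.frame(
  a = c(1, 2, 4, 2),
  b = c(2, 6, 4, 4),
  c = c(2, 4, 3, 3),
  d = c(1, 2, 4, 3),
  e = c(4, 6, 2, 4)
)

# Element-wise arithmetic works because every column now has four rows
final <- with(mydf, (a * b) + (a * c * d) + (d * e))
final
```

For this example data, final should contain 8, 40, 72 and 38.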
Depending on your rgl settings, you might need to run the following at the top of your script to make the qpcR package load (see https://stackoverflow.com/a/66127391/2554330):
options(rgl.useNULL = TRUE)
library(rgl)
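If loading qpcR is inconvenient, the same padding and imputation can probably be done in base R alone. This is only a sketch of an alternative, not the approach used above; the names cols, padded, imputed and mydf_base are made up for illustration:

```r
df1 <- data.frame(a = c(1, 2, 4, 2))
df2 <- data.frame(b = c(2, 6))
df3 <- data.frame(c = c(2, 4, 3))
df4 <- data.frame(d = c(1, 2, 4, 3))
df5 <- data.frame(e = c(4, 6, 2))

# Collect the single columns into a named list of vectors
cols <- list(a = df1$a, b = df2$b, c = df3$c, d = df4$d, e = df5$e)

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(cols))
padded <- lapply(cols, function(x) {
  length(x) <- n
  x
})

# Replace each NA with the mean of the observed values in that column
imputed <- lapply(padded, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

mydf_base <- as.data.frame(imputed)
mydf_base
```

Assigning a larger value to length(x) extends the vector with NA, which is what makes the columns line up before they are bound together.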
With purrr and dplyr (plus tidyr for replace_na), we can first put all data frames in a list with mget(). Second, use set_names to replace the data frame names with their respective column names. As a third step, extract the single column of each data frame as a vector with pluck. Then add the NAs by making all vectors the same length. Finally, bind all vectors back into a data frame with as.data.frame, then use mutate with ~replace_na and the column mean.
library(purrr)
library(dplyr)
library(tidyr)
# Assumes df1 to df5 already exist in the workspace
mget(ls(pattern = 'df\\d')) %>%
  set_names(map_chr(., colnames)) %>%
  map(pluck, 1) %>%
  map(., `length<-`, max(lengths(.))) %>%
  as.data.frame %>%
  mutate(across(everything(), ~replace_na(.x, mean(.x, na.rm = TRUE))))