How to impute NAs with the column mean and then multiply columns of different lengths in R?

My question might not be very clear, so here is an example.

My final goal is to compute:

final=(df1$a*df2$b)+(df1$a*df3$c*df4$d)+(df4$d*df5$e)

I have five data frames (one column each) of different lengths, as follows:

df1

    a
1.  1
2.  2
3.  4
4.  2

df2

    b
1.  2
2.  6

df3

    c
1.  2
2.  4 
3.  3

df4

    d
1.  1
2.  2
3.  4
4.  3

df5

    e
1.  4
2.  6
3.  2

So I want a final data frame that includes them all, as follows:

finaldf

    a   b   c   d  e
1.  1   2   2   1  4
2.  2   6   4   2  6
3.  4   NA  3   4  2
4.  2   NA  NA  3  NA

I want all the NAs in each column to be replaced with the mean of that column, so that all columns of finaldf have the same length:

finaldf

    a   b   c   d   e
1.  1   2   2   1   4
2.  2   6   4   2   6
3.  4   4   3   4   2
4.  2   4   3   3   4

I can then compute final=(df1$a*df2$b)+(df1$a*df3$c*df4$d)+(df4$d*df5$e) as needed.

The easiest way by far is to use the qpcR, dplyr and tidyr packages.

library(dplyr)
library(qpcR)
library(tidyr)

df1 <- data.frame(a=c(1,2,4,2))
df2 <- data.frame(b=c(2,6))
df3 <- data.frame(c=c(2,4,3))
df4 <- data.frame(d=c(1,2,4,3))
df5 <- data.frame(e=c(4,6,2))

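# cbind.na() pads the shorter columns with NA; it is reached with `:::`,
# the operator for functions a package does not export.
# replace_na() then fills each column's NAs with that column's mean.
mydf <- qpcR:::cbind.na(df1, df2, df3, df4, df5) %>%
  tidyr::replace_na(., as.list(colMeans(., na.rm = TRUE)))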

> mydf
  a b c d e
1 1 2 2 1 4
2 2 6 4 2 6
3 4 4 3 4 2
4 2 4 3 3 4

Depending on your rgl settings, you might need to run the following at the top of your script to make the qpcR package load (see https://stackoverflow.com/a/66127391/2554330):

options(rgl.useNULL = TRUE)
library(rgl)
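
The answer stops at the imputed data frame, so the expression from the question still has to be evaluated. The following is a minimal sketch of that last step. To keep it runnable without qpcR, it assumes mydf holds exactly the values printed above and rebuilds it by hand.

```r
# Values copied from the printed `mydf` (NAs already replaced by column means).
mydf <- data.frame(
  a = c(1, 2, 4, 2),
  b = c(2, 6, 4, 4),
  c = c(2, 4, 3, 3),
  d = c(1, 2, 4, 3),
  e = c(4, 6, 2, 4)
)

# All columns now have the same length, so the arithmetic is element-wise.
# with() lets the expression refer to the columns by name.
final <- with(mydf, (a * b) + (a * c * d) + (d * e))
final
# [1]  8 40 72 38
```

Because every column has four rows after imputation, no recycling of shorter vectors takes place.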

With purrr, dplyr and tidyr, we can first put all data frames in a list with mget(). Second, use set_names to replace the data frame names with their respective column names. Third, extract the single column of each data frame as a vector with pluck. Then pad the shorter vectors with NAs by setting every vector to the same length. Finally, bind the vectors back into a data frame with as.data.frame, then use mutate with across and replace_na to fill each column's NAs with that column's mean.

library(dplyr)
library(purrr)
library(tidyr)  # provides replace_na()

mget(ls(pattern = 'df\\d')) %>%
  set_names(map_chr(., colnames)) %>%
  map(pluck, 1) %>%
  map(., `length<-`, max(lengths(.))) %>%
  as.data.frame() %>%
  mutate(across(everything(), ~replace_na(.x, mean(.x, na.rm = TRUE))))
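
If installing qpcR or the tidyverse is not an option, the same pad-then-impute idea can be sketched in base R. This is an alternative to the answers above, not taken from them, and pad_and_impute is a hypothetical helper name introduced only for illustration.

```r
df1 <- data.frame(a = c(1, 2, 4, 2))
df2 <- data.frame(b = c(2, 6))
df3 <- data.frame(c = c(2, 4, 3))
df4 <- data.frame(d = c(1, 2, 4, 3))
df5 <- data.frame(e = c(4, 6, 2))

# Hypothetical helper: pad a vector to length n with NA,
# then replace every NA with the mean of the non-missing values.
pad_and_impute <- function(x, n) {
  length(x) <- n                         # extending a vector pads it with NA
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# c() on data frames concatenates them as lists: a named list of columns.
cols <- c(df1, df2, df3, df4, df5)
n <- max(lengths(cols))

finaldf <- as.data.frame(lapply(cols, pad_and_impute, n = n))
finaldf
```

This should reproduce the finaldf table from the question, with 4 filling the padded rows of b and e, and 3 filling the padded row of c.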
