
Calculate mean based on first part of row.names() in R

I have a data frame that looks like this:

x <- structure(list(value1 = c(1, 2, 3, 4, 5), value2 = c(1, 2, 2,
2, 2), value3 = c(1, 1, 2, 3, 4)), class = "data.frame", row.names = c("apple1", 
"apple2", "orange1", "orange2", "plum"))
        value1 value2 value3
apple1       1      1      1
apple2       2      2      1
orange1      3      2      2
orange2      4      2      3
plum         5      2      4

Now I want to run the mean function on every column, grouped by the first part of the row names. For example, I want the mean of value1 for the apple group, regardless of the apple number. I figured out that something like this works:

y <- x[grep("apple", row.names(x)), ]
    mean(y$value1)
    mean(y$value2)
    mean(y$value3)
y <- x[grep("orange", row.names(x)), ]
    mean(y$value1)
    mean(y$value2)
    mean(y$value3)
y <- x[grep("plum", row.names(x)), ]
    mean(y$value1)
    mean(y$value2)
    mean(y$value3)

But for a bigger dataset this is going to take ages, so I was wondering whether there is a more efficient way to subset the data based on the first part of the row name and then calculate the mean.

Using tidyverse:

library(tidyverse)

df %>% 
  tibble::rownames_to_column("row") %>% 
  dplyr::mutate(row = str_remove(row, "\\d+")) %>% 
  dplyr::group_by(row) %>% 
  dplyr::summarize(across(where(is.numeric), mean), .groups = "drop")
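
The grouping key in the pipeline above comes from stripping the digits from the row names. Here is a minimal base R sketch of that step alone, so the groups can be checked before averaging. The `$` anchor and the name "v2apple10" are additions for illustration, not part of the original answer; the anchor assumes the number always sits at the end of the name.

```r
# Row names from the question's example data
row_names <- c("apple1", "apple2", "orange1", "orange2", "plum")

# Strip only a trailing run of digits to get the group prefix
prefix <- sub("\\d+$", "", row_names)
print(prefix)

# Count the rows that fall into each group
print(table(prefix))

# Hypothetical name with digits in the middle: the anchored pattern keeps
# the inner "2", while removing every digit run does not
anchored <- sub("\\d+$", "", "v2apple10")   # "v2apple"
all_runs <- gsub("\\d+", "", "v2apple10")   # "vapple"
```

For names like those in the question, the anchored and unanchored patterns give the same prefixes. They only differ when digits appear elsewhere in the name.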

In base R you could do:

df$row <- gsub("\\d+", "", rownames(df))
data.frame(do.call(cbind, lapply(df[,1:3], function(x) by(x, df$row, mean))))
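
As a possible alternative in base R, `aggregate()` can compute the per-group means in one call and keeps the group label as a regular column. This is only a sketch on the question's example data. The names `group` and `group_means` are chosen here for illustration, and the grouping vector is kept outside the data frame so that only numeric columns are averaged.

```r
# Example data from the question
df <- data.frame(
  value1 = c(1, 2, 3, 4, 5),
  value2 = c(1, 2, 2, 2, 2),
  value3 = c(1, 1, 2, 3, 4),
  row.names = c("apple1", "apple2", "orange1", "orange2", "plum")
)

# Grouping key: row name with trailing digits removed
group <- sub("\\d+$", "", rownames(df))

# Mean of every column within each group
group_means <- aggregate(df, by = list(row = group), FUN = mean)
print(group_means)
```

The result is an ordinary data frame with one row per prefix (apple, orange, plum) and the same three value columns.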

Output

  row    value1 value2 value3
* <chr>   <dbl>  <dbl>  <dbl>
1 apple     1.5    1.5    1  
2 orange    3.5    2      2.5
3 plum      5      2      4  

Data

df <- structure(list(value1 = 1:5, value2 = c(1, 2, 2, 2, 2), value3 = c(1, 
1, 2, 3, 4)), class = "data.frame", row.names = c("apple1", "apple2", 
"orange1", "orange2", "plum"))
