
How can I get the average for a column from multiple files in R?

I am very new to R, and this is probably not a difficult problem to solve, but I have been going around in circles and can't get what I need, so I would be very grateful for some advice. I've also never asked a question on one of these forums before, so I apologize if I am not following all of the normal posting conventions.

I have multiple output files from another program that I am trying to analyze using R. The number of output files is not known in advance. I read them into my R code and store them in the variable listFinal.data.

I am trying to loop through the output files, group by the different values in the column Entity.Type, and count the number of occurrences of each entity type. I then need the average number of occurrences for each entity type across all of the output files.

Here is a snippet of the column I need to work with in the output files:

ID  Entity.Type
1   Ground
2   Ground
3   Air
4   Air
5   Sea
6   Ground
7   Sea
8   Ground
9   Air
10  Ground

The result I am looking for from this single file would be:

Ground  Air  Sea
5       3    2

I can do this successfully for just one file, but when I run my code on multiple files, I get a result like the one above for each file, when what I really want is a single result like the one above that is the average across all files.

Here is the code that I am using:

library(dplyr)

listVeh.data <- vector("list", length(listFinal.data))  # one slot per output file
for (h in seq_along(listFinal.data)) {  # listFinal.data is all the output files from another program
  listVeh.data[[h]] <- listFinal.data[[h]] %>%
    filter(Entity.Type != "Lifeform") %>%  # remove people, just count vehicles
    group_by(Entity.Type) %>%
    summarize(n = n())
}

Here's a toy example in which the output data are stored in a list:

set.seed(4)
d1 <- data.frame(ID = 1:30,
                 Entity.Type = sample(c("Ground", "Air", "Sea"), 30, replace=TRUE))
d2 <- data.frame(ID = 1:30,
                 Entity.Type = sample(c("Ground", "Air", "Sea"), 30, replace=TRUE))

datlist <- list(d1, d2)
names(datlist) <- c("d1", "d2")

I prefer ldply over do.call(rbind, lapply(...)) because, for a named list, it adds the id of each data frame directly as a column.

output <- plyr::ldply(datlist, function(x) x %>% group_by(Entity.Type) %>% summarise(n=n()))

  .id Entity.Type  n
1  d1         Air  9
2  d1      Ground  9
3  d1         Sea 12
4  d2         Air 14
5  d2      Ground  9
6  d2         Sea  7
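
For comparison, the same stacked table can probably be built with dplyr alone, which may be convenient if plyr is not installed. This is only a sketch: the toy data mirrors the example above, and the names counts, output2 and the id column "file" are made up for illustration.

```r
library(dplyr)

# Toy stand-in for the list of per-file data frames (same shape as above)
set.seed(4)
types <- c("Ground", "Air", "Sea")
datlist <- list(
  d1 = data.frame(ID = 1:30, Entity.Type = sample(types, 30, replace = TRUE),
                  stringsAsFactors = FALSE),
  d2 = data.frame(ID = 1:30, Entity.Type = sample(types, 30, replace = TRUE),
                  stringsAsFactors = FALSE)
)

# Count the entity types within each file
counts <- lapply(datlist, function(x) count(x, Entity.Type))

# Stack the per-file counts; .id stores each list element's name in column "file"
output2 <- bind_rows(counts, .id = "file")
```

The result has the columns file, Entity.Type and n, so the averaging step works on it in the same way.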

Calculating the mean across the whole list is then straightforward.

output %>% group_by(Entity.Type) %>% summarise(mean(n))

# A tibble: 3 x 2
  Entity.Type `mean(n)`
  <chr>           <dbl>
1 Air              11.5
2 Ground            9  
3 Sea               9.5
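
One caveat: mean(n) averages only over the files in which an entity type actually occurs. If a type can be missing from some files and those files should count as zero, one option is to divide the total count by the number of files. The sketch below assumes that is the intended definition of "average"; the list files and the columns total and mean_n are hypothetical, and the Lifeform filter is taken from the question.

```r
library(dplyr)

# Hypothetical output files: "Sea" does not occur in f2
files <- list(
  f1 = data.frame(Entity.Type = c("Ground", "Ground", "Air", "Sea", "Lifeform"),
                  stringsAsFactors = FALSE),
  f2 = data.frame(Entity.Type = c("Ground", "Air", "Air", "Lifeform"),
                  stringsAsFactors = FALSE)
)

avg <- bind_rows(files, .id = "file") %>%
  filter(Entity.Type != "Lifeform") %>%      # remove people, just count vehicles
  count(Entity.Type, name = "total") %>%     # total occurrences over all files
  mutate(mean_n = total / length(files))     # files without the type count as 0

# Sea: 1 / 2 = 0.5, whereas averaging only the files that contain Sea gives 1
```

When every type appears in every file, both approaches give the same numbers.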
