How do I get the average of a column across multiple files in R?

Question

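I'm very new to R, and this is probably not a hard problem, but I've been going around in circles and can't get what I need, so I'd appreciate any advice. I've also never asked a question on one of these forums before, so I apologize if I'm not following all the usual posting conventions.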

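I have multiple output files from another program that I'm trying to analyze in R. The number of output files isn't known in advance. I read them into my R code and store them in the variable listFinal.data.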

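I'm trying to loop over the output files, group by the distinct values in the Entity.Type column, and count the occurrences of each entity type. I then need the average number of occurrences of each entity type across all the output files.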

Here is a snippet of the columns I need from an output file:

ID	Entity.Type
1	Ground
2	Ground
3	Air
4	Air
5	Sea
6	Ground
7	Sea
8	Ground
9	Air
10	Ground

The result I'm looking for from this single file would be:

Ground	Air	Sea
5	3	2

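I can do this successfully for a single file, but when I run my code on multiple files I get a result like the one above for each file, when what I really want is a single result like the one above holding the average across all files.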

Here is the code I'm using:

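library(dplyr)

# Pre-allocate the result list (assumed; its creation is not shown in the post)
listVeh.data <- vector("list", length(listFinal.data))
for (h in seq_along(listFinal.data)) {  # listFinal.data is all the output files from another program
  listVeh.data[[h]] <- listFinal.data[[h]] %>%
    filter(Entity.Type != "Lifeform") %>%  # remove people, just count vehicles
    group_by(Entity.Type) %>%
    summarize(n = n())
}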

Answer 1

Here is a toy example in which the output data have already been read into a list:

set.seed(4)
d1 <- data.frame(ID = 1:30,
                 Entity.Type = sample(c("Ground", "Air", "Sea"), 30, replace=TRUE))
d2 <- data.frame(ID = 1:30,
                 Entity.Type = sample(c("Ground", "Air", "Sea"), 30, replace=TRUE))

datlist <- list(d1, d2)
names(datlist) <- c("d1", "d2")
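If the data live in files rather than in memory, a named list like datlist could be built along these lines. This is only a sketch: the folder, the file names and the CSV format are assumptions made for illustration, since the question does not say how the output files are stored.

```r
# Hypothetical setup: write two small CSV files to a temporary folder
# so the example can run on its own.
out_dir <- file.path(tempdir(), "outputs_demo")
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(ID = 1:3, Entity.Type = c("Ground", "Air", "Sea")),
          file.path(out_dir, "run1.csv"), row.names = FALSE)
write.csv(data.frame(ID = 1:2, Entity.Type = c("Ground", "Ground")),
          file.path(out_dir, "run2.csv"), row.names = FALSE)

# Pick up however many files are present; the count need not be known in advance.
files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
file_list <- lapply(files, read.csv)

# Name each element after its file so the source can be tracked later.
names(file_list) <- tools::file_path_sans_ext(basename(files))
```

Naming the list elements matters for the next step, because the names become the id column.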

I prefer ldply over do.call(rbind, lapply(...)) because, for a named list, it adds an id column identifying each data set directly.

library(dplyr)  # provides %>%, group_by(), summarise() and n()
output <- plyr::ldply(datlist, function(x) x %>% group_by(Entity.Type) %>% summarise(n = n()))

  .id Entity.Type  n
1  d1         Air  9
2  d1      Ground  9
3  d1         Sea 12
4  d2         Air 14
5  d2      Ground  9
6  d2         Sea  7
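For comparison, the do.call(rbind, lapply(...)) route mentioned above might look roughly like this in base R. It is a sketch on a small made-up list (demo_list is hypothetical, not the datlist above), and it shows that the id column has to be added by hand.

```r
# Small made-up named list standing in for the real data
demo_list <- list(
  d1 = data.frame(ID = 1:4, Entity.Type = c("Ground", "Air", "Air", "Sea")),
  d2 = data.frame(ID = 1:3, Entity.Type = c("Ground", "Ground", "Sea"))
)

# Count entity types per file and tag each result with the list name
per_file <- lapply(names(demo_list), function(nm) {
  counts <- as.data.frame(table(demo_list[[nm]]$Entity.Type),
                          stringsAsFactors = FALSE)
  names(counts) <- c("Entity.Type", "n")
  cbind(.id = nm, counts)  # the id column is added manually here
})

# Stack the per-file counts into one data frame
stacked <- do.call(rbind, per_file)
```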

Computing the mean across the whole list is then straightforward.

output %>% group_by(Entity.Type) %>% summarise(mean(n))

# A tibble: 3 x 2
  Entity.Type `mean(n)`
  <chr>           <dbl>
1 Air              11.5
2 Ground            9  
3 Sea               9.5
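One caveat worth checking: if an entity type is missing from some files, mean(n) averages only over the files in which that type appears. If a missing type should count as zero, a base R sketch like the following may be closer to what is wanted. The data are made up for illustration, and fixing the factor levels up front is an assumption of this sketch rather than part of the approach above.

```r
# Small made-up named list; "Air" does not occur in d2
demo_list <- list(
  d1 = data.frame(ID = 1:4, Entity.Type = c("Ground", "Air", "Air", "Sea")),
  d2 = data.frame(ID = 1:3, Entity.Type = c("Ground", "Ground", "Sea"))
)

# All entity types seen in any file
types <- sort(unique(unlist(lapply(demo_list,
                                   function(x) as.character(x$Entity.Type)))))

# One column of counts per file; types absent from a file are counted as 0
count_mat <- sapply(demo_list,
                    function(x) table(factor(x$Entity.Type, levels = types)))

# Average count per entity type across all files
avg_counts <- rowMeans(count_mat)
```

Here avg_counts is a named vector with one value per entity type, which matches the single-row layout asked for in the question. With these made-up data, Air averages 1 across both files, whereas averaging only over the files that contain it would give 2.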

Solution 1
1 accepted 2022-07-01 04:52:20
