R: if statement inside a function (lapply)

Question

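I have a large list of data frames containing environmental variables from different localities. For each data frame in the list, I want to summarise the values per locality (that is, collapse repeated measurements from the same locality into one), using the name of the data frame as the condition for which variable to summarise. For example, for a data frame named "salinity", I only want to summarise salinity, not the other environmental variables. Note that the data frames contain data from different localities, so I cannot simply merge them into a single data frame.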

Let's do this with a dummy dataset:

#create list of dataframes
df1 = data.frame(locality = c(1, 2, 2, 5, 7, 7, 9),
                 Temp = c(14, 15, 16, 18, 20, 18, 21),
                 Sal = c(16, NA, NA, 12, NA, NA, 9))

df2 = data.frame(locality = c(1, 1, 3, 6, 8, 9, 9),
                 Temp = c(1, 2, 4, 5, 0, 2, -1),
                 Sal = c(18, NA, NA, NA, 36, NA, NA))

df3 = data.frame(locality = c(1, 3, 4, 4, 5, 5, 9),
                 Temp = c(14, NA, NA, NA, 17, 18, 21),
                 Sal = c(16, 8, 24, 23, 11, 12, 9))

df4 = data.frame(locality = c(1, 1, 1, 4, 7, 8, 10),
                 Temp = c(1, NA, NA, NA, NA, 0, 2),
                 Sal = c(18, 17, 13, 16, 20, 36, 30))

df_list = list(df1, df2, df3, df4)
names(df_list) = c("Summer_temperature", "Winter_temperature",
                   "Summer_salinity", "Winter_salinity")

Next, I summarised the environmental variables with lapply:

library(dplyr)  # provides %>%, group_by() and summarise()

#select only those dataframes in the list that have either 'salinity' or 'temperature' in the dataframe names
df_sal = df_list[grep("salinity", names(df_list))]  
df_temp = df_list[grep("temperature", names(df_list))]  

#use lapply to summarise salinity or temperature values in each dataframe
##salinity
df_sal2 = lapply(df_sal, function(x) {
      x %>%
        group_by(locality) %>% 
        summarise(Sal = mean(Sal, na.rm = TRUE)) 
    })
        
##temperature
df_temp2 = lapply(df_temp, function(x) {
      x %>%
        group_by(locality) %>% 
        summarise(Temp = mean(Temp, na.rm = TRUE)) 
    })

This code is repetitive, so I would like to shorten it by combining everything into one function. Here is what I tried:

df_env = lapply(df_list, function(x) {
  if (grepl("salinity", names(x)) == TRUE) {x %>% group_by(locality) %>% summarise(Sal = mean(Sal, na.rm = TRUE))}
  if (grepl("temperature", names(x)) == TRUE) {x %>% group_by(locality) %>% summarise(Temp = mean(Temp, na.rm = TRUE))}
  })

But I get the following output:

$Summer_temperature
NULL

$Winter_temperature
NULL

$Summer_salinity
NULL

$Winter_salinity
NULL

along with the following warning messages:

Warning messages:
1: In if (grepl("salinity", names(x)) == TRUE) { :
  the condition has length > 1 and only the first element will be used
2: In if (grepl("temperature", names(x)) == TRUE) { :
  the condition has length > 1 and only the first element will be used
3: In if (grepl("salinity", names(x)) == TRUE) { :
  the condition has length > 1 and only the first element will be used
4: In if (grepl("temperature", names(x)) == TRUE) { :
  the condition has length > 1 and only the first element will be used
5: In if (grepl("salinity", names(x)) == TRUE) { :
  the condition has length > 1 and only the first element will be used
6: In if (grepl("temperature", names(x)) == TRUE) { :
  the condition has length > 1 and only the first element will be used
7: In if (grepl("salinity", names(x)) == TRUE) { :
  the condition has length > 1 and only the first element will be used
8: In if (grepl("temperature", names(x)) == TRUE) { :
  the condition has length > 1 and only the first element will be used

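I read here that this warning can be resolved by using ifelse. However, the final dataset will have more than two environmental variables, so I would have to add more if statements, which is why I believe ifelse is not the solution here. Does anyone have an elegant solution to my problem? I am new to writing functions and using lapply, and would appreciate any help you can give me.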

Edit:

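I tried the else if option suggested in one of the answers, but it still returns NULL values. I also tried return() and assigning the output to x, but both have the same problem as the code below. Any ideas?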

#else if
df_env = lapply(df_list, function(x) {
  if (grepl("salinity", names(x)) == TRUE) {
    x %>% group_by(locality) %>% 
      summarise(Sal = mean(Sal, na.rm = TRUE))}
  else if (grepl("temperature", names(x)) == TRUE) {
    x %>% group_by(locality) %>% 
      summarise(Temp = mean(Temp, na.rm = TRUE))}
})
df_env

I think what is happening is that my if condition is not being passed on to the summarise function, so nothing gets summarised.

Answer 1

There are a few things going on here, including:

First, as akrun said, an if statement must have a condition of length 1. Yours does not.
```
 grepl("locality", names(df1)) # [1] TRUE FALSE FALSE
```
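The condition must be reduced so that it is always of length 1. Frankly, grepl is the wrong tool here, because technically a column named notlocality would also match, and then things would go wrong. I suggest you change it to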
```
"locality" %in% names(df1) # [1] TRUE
```
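
To make the difference concrete, here is a small sketch in base R. The vector `cols` is made up for illustration and is not from the question. It shows why grepl produces one value per column name and also matches substrings, while %in% gives a single exact-match answer. As far as I know, since R 4.2.0 a condition of length greater than 1 in if is an error rather than a warning, so getting this down to length 1 matters even more.

```r
# Hypothetical column names, including one that merely contains "locality"
cols <- c("notlocality", "Temp", "Sal")

# grepl() is vectorised: one result per element, and it matches substrings
substring_hits <- grepl("locality", cols)
print(substring_hits)   # TRUE FALSE FALSE

# %in% compares whole strings and returns a single value here
exact_hit <- "locality" %in% cols
print(exact_hit)        # FALSE

# If a pattern match is really needed, anchor it and collapse with any()
anchored_hit <- any(grepl("^locality$", cols))
print(anchored_hit)     # FALSE
```
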
Second, you need to return something. Always. You went from if ...; if ...; to if ... else if ..., which is a good start, but if none of the conditions is met, the function still returns nothing useful (it yields NULL). I suggest one of the following: either add a final } else x, or reassign, as in if (..) { x <- x %>% ...; } else if (..) { x <- x %>% ... ; }, and then end the anonymous function with just x (which returns it).
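
A minimal base R sketch of that point, using made-up helper names (`pick_column`, `pick_column_safe`) that are not part of the question: a function whose last expression is an if without a matching else yields NULL when the condition is FALSE. This is also why two separate if statements in a row do not work: the function returns the value of the last one only.

```r
# Hypothetical helper: no else branch, so a non-match yields NULL
pick_column <- function(nm) {
  if (grepl("salinity", nm)) "Sal"
}

print(pick_column("Summer_salinity"))              # "Sal"
print(is.null(pick_column("Summer_temperature")))  # TRUE

# Reassign inside the branches and end with the variable,
# so every path through the function returns something
pick_column_safe <- function(nm) {
  out <- NA_character_
  if (grepl("salinity", nm)) {
    out <- "Sal"
  } else if (grepl("temperature", nm)) {
    out <- "Temp"
  }
  out
}

print(pick_column_safe("Summer_temperature"))      # "Temp"
print(pick_column_safe("Winter_depth"))            # NA
```
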

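However, I think the ultimate problem is that the "temperature" or "salinity" you are looking for is in the names of the list, not in the frames themselves. For example, your call to names(x) returns c("locality", "Temp", "Sal"), which are the column names of the frame x itself.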
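
You can check this yourself with a one-element list (a cut-down stand-in for df_list, made up for this sketch). lapply hands the function only the element, never its name, so the name has to be supplied separately, for example by iterating over names(lst) or by passing it as a second argument.

```r
# Cut-down stand-in for df_list with a single named element
lst <- list(Summer_salinity = data.frame(locality = c(1, 1), Sal = c(16, 18)))

# Inside lapply(), x is the data frame only, so names(x) gives column names
seen_inside <- lapply(lst, function(x) names(x))
print(seen_inside$Summer_salinity)   # "locality" "Sal"

# Iterating over the list names exposes the name you actually want to test
name_matches <- vapply(names(lst), function(nm) grepl("salinity", nm), logical(1))
print(name_matches)
```
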

I think this is what you want?

Map(function(x, nm) {
  if (grepl("salinity", nm)) {
    x %>%
      group_by(locality) %>%
      summarize(Sal = mean(Sal, na.rm = TRUE))
  } else if (grepl("temperature", nm)) {
    x %>%
      group_by(locality) %>%
      summarize(Temp = mean(Temp, na.rm = TRUE))
  } else x
}, df_list, names(df_list))
# $Summer_temperature
# # A tibble: 5 x 2
#   locality  Temp
#      <dbl> <dbl>
# 1        1  14  
# 2        2  15.5
# 3        5  18  
# 4        7  19  
# 5        9  21  
# $Winter_temperature
# # A tibble: 5 x 2
#   locality  Temp
#      <dbl> <dbl>
# 1        1   1.5
# 2        3   4  
# 3        6   5  
# 4        8   0  
# 5        9   0.5
# $Summer_salinity
# # A tibble: 5 x 2
#   locality   Sal
#      <dbl> <dbl>
# 1        1  16  
# 2        3   8  
# 3        4  23.5
# 4        5  11.5
# 5        9   9  
# $Winter_salinity
# # A tibble: 5 x 2
#   locality   Sal
#      <dbl> <dbl>
# 1        1    16
# 2        4    16
# 3        7    20
# 4        8    36
# 5       10    30
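
Since the question mentions that the real data will have more than two environmental variables, one way to avoid a growing if ... else if chain is a lookup from name fragment to column. The following is only a sketch under assumptions: `var_lookup` and `summarise_by_name` are names I made up, the two small frames stand in for df_list, and it relies on across() and all_of() from dplyr 1.0.0 or later.

```r
library(dplyr)

# Two small frames standing in for the question's df_list
df_list <- list(
  Summer_temperature = data.frame(locality = c(1, 2, 2),
                                  Temp = c(14, 15, 16),
                                  Sal = c(16, NA, NA)),
  Winter_salinity = data.frame(locality = c(1, 1, 4),
                               Temp = c(1, NA, NA),
                               Sal = c(18, 17, 16))
)

# Hypothetical lookup: fragment of the list name -> column to summarise
var_lookup <- c(temperature = "Temp", salinity = "Sal")

summarise_by_name <- function(x, nm) {
  # Which lookup keys occur in the list name? fixed = TRUE means literal text
  hits <- vapply(names(var_lookup), grepl, logical(1), x = nm, fixed = TRUE)
  if (sum(hits) != 1) return(x)   # no match or ambiguous: return x unchanged
  col <- unname(var_lookup[hits])
  x %>%
    group_by(locality) %>%
    summarise(across(all_of(col), ~ mean(.x, na.rm = TRUE)))
}

df_env <- Map(summarise_by_name, df_list, names(df_list))
print(df_env)
```

Adding a new variable then only means adding one entry to `var_lookup`; the function body stays the same.
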


Solution 1 (accepted, 2022-05-24 20:06:12)
