如何将函数应用于数据框，以便在数据框上创建新列？

Question

First, apologies for some awful, illogical, clunky code coming up. 首先，对一些糟糕的，不合逻辑的，笨拙的代码表示歉意。 I have MINIMAL experience with for loops and functions. 我对循环和函数有最少的经验。

In essence, I want to apply a function to a dataframe. 本质上，我想将函数应用于数据框。 This function provides a value [i] conditional on the values in two of the columns in the dataframe. 此函数提供一个值[i]，该值[i]取决于数据帧中两列中的值。 I then want this value to be populated in a new column, and to align with the row containing the values that generated it. 然后，我希望将此值填充到新列中，并与包含生成它的值的行对齐。

This is using some already generated model values to create predicted abundance of an animal species. 这使用一些已经生成的模型值来创建动物物种的预测丰度。

I have created a fairly awful function, aligning with the known values of the generated model. 我创建了一个相当糟糕的函数，使其与生成的模型的已知值保持一致。

Here is an example of the data: 这是数据示例：

structure(list(X = 2:6, x = c(23.69772329, 23.33799932, 24.50995071, 
22.37691419, 31.29742091), y = c(-18.75309389, -18.28537894, 
-19.39926585, -19.23678464, -5.251863724), EVAP_Value = c(502L, 
541L, 750L, 476L, 571L), HFI_Value = c(1, 1, 3.059409052, 2.250018061, 
7), TERMAC_Value = c(605L, 605L, 118L, 605L, 236L), TERMAC_ShortName = 
structure(c(4L, 
4L, 1L, 4L, 2L), .Label = c("DAWS2", "EASM", "Marsh", "PV"), class = 
"factor"), 
GLOBCOV_Value = c(30L, 30L, 30L, 140L, 130L), Glob_ShortName = 
structure(c(5L, 
5L, 5L, 1L, 4L), .Label = c("Grass", "OpBdFrst", "OpNdFrst", 
"Shrub", "VegCrop"), class = "factor"), Unknown_Value = c(527L, 
546L, 488L, 430L, 1020L), Location = structure(c(1L, 1L, 
1L, 1L, 2L), .Label = c("BWA", "TZA"), class = "factor"), 
NDVI_mean = c(0.26736562, 0.28850313, 0.328852412, 0.271927773, 
0.364711006), Random_Category = structure(c(2L, 2L, 2L, 2L, 
1L), .Label = c("Random_Maasai", "Random_Southern"), class = "factor"), 
num = c(1L, 1L, 1L, 1L, 1L), ID = structure(c(1L, 1L, 1L, 
1L, 1L), .Label = "Random", class = "factor")), row.names = 2:6, class = 
"data.frame")

For reference, it looks like this: 供参考，它看起来像这样：

X        x          y EVAP_Value HFI_Value TERMAC_Value
1 1 37.97434  -8.833364       1390  6.000000          601
2 2 23.69772 -18.753094        502  1.000000          605
3 3 23.33800 -18.285379        541  1.000000          605
4 4 24.50995 -19.399266        750  3.059409          118
5 5 22.37691 -19.236785        476  2.250018          605
6 6 31.29742  -5.251864        571  7.000000          236
        TERMAC_ShortName GLOBCOV_Value Glob_ShortName Unknown_Value
1             <NA>            90       OpNdFrst          1038
2               PV            30        VegCrop           527
3               PV            30        VegCrop           546
4            DAWS2            30        VegCrop           488
5               PV           140          Grass           430
6             EASM           130          Shrub          1020
  Location NDVI_mean Random_Category num     ID
1      TZA 0.5356669   Random_Maasai   1 Random
2      BWA 0.2673656 Random_Southern   1 Random
3      BWA 0.2885031 Random_Southern   1 Random
4      BWA 0.3288524 Random_Southern   1 Random
5      BWA 0.2719278 Random_Southern   1 Random
6      TZA 0.3647110   Random_Maasai   1 Random

The two columns of interest are the TERMAC_ShortName column and the Glob_ShortName column. 感兴趣的两列是TERMAC_ShortName列和Glob_ShortName列。 My efforts so far are: 到目前为止，我的努力是：

 predict.bayes.animal <- function(data){
         if (data$TERMAC_ShortName[i] == "PV") {
           bayes_value[i] <- i - 0.772
  }
         if (data$TERMAC_ShortName[i] == "DAWS2") {
            bayes_value[i] <- i - 1.24
  }
         if (data$TERMAC_ShortName[i] == "EASM") {
            bayes_value[i] <- i - 0.362
  }
         if (data$Glob_ShortName[i] == "VegCrop") {
            bayes_value[i] <- i - 0.3497
 }
         if (data$Glob_ShortName[i] == "Grass") {
            bayes_value[i] <- i - 0.5978
  }
         if (data$Glob_ShortName[i] == "Shrub") {
            bayes_value[i] <- i - 0.2285
  }
         if (data$TERMAC_ShortName[i] == "PV" | data$Glob_ShortName[i] == 
         "VegCrop") {
            bayes_value[i] <- i - 0.56
  }
         if (data$TERMAC_ShortName[i] == "DAWS2" | data$Glob_ShortName[i] == 
         "VegCrop") 
 {
            bayes_value[i] <- i + 0.43
  }
         if (data$TERMAC_ShortName[i] == "PV" | data$Glob_ShortName[i] == 
         "Grass") {
            bayes_value[i] <- i - 0.49
  }
         if (data$TERMAC_ShortName[i] == "EASM" | data$Glob_ShortName[i] == 
         "Shrub") {
            bayes_value[i] <- i - 0.045
  }
   bayes_value
  }

   data["bayes_value"] <- NA
   for (i in 1:nrow(data)) { 
      n <- predict.bayes.animal(data)
      data$bayes_value[i] <- n
  }

Expected result is: 预期结果是：

X        x          y EVAP_Value HFI_Value TERMAC_Value
1 1 23.69772 -18.753094        502  1.000000          605
2 2 23.33800 -18.285379        541  1.000000          605
3 3 24.50995 -19.399266        750  3.059409          118
4 4 22.37691 -19.236785        476  2.250018          605
5 5 31.29742  -5.251864        571  7.000000          236
        TERMAC_ShortName GLOBCOV_Value Glob_ShortName Unknown_Value
1               PV            30        VegCrop           527
2               PV            30        VegCrop           546
3            DAWS2            30        VegCrop           488
4               PV           140          Grass           430
5             EASM           130          Shrub          1020
  Location NDVI_mean Random_Category num     ID   bayes_value
1      BWA 0.2673656 Random_Southern   1 Random       -1.68
2      BWA 0.2885031 Random_Southern   1 Random       -1.68
3      BWA 0.3288524 Random_Southern   1 Random       -1.20
4      BWA 0.2719278 Random_Southern   1 Random       -1.86
5      TZA 0.3647110   Random_Maasai   1 Random       -0.64

The actual result so far is "Error in predict.bayes.animal(data) : object 'bayes_value' not found" 到目前为止，实际结果是“ predict.bayes.animal（data）中的错误：找不到对象'bayes_value'”

Thank you in advance for any assistance. 预先感谢您的协助。

Answer 1

As discussed in the comments, there is a bit of confusion about exactly what you are trying to do, but would using dplyr 's mutate (to add new column) and case_when (instead of multiple if statements) possibly simplify things? 正如评论中所讨论的，对于您要执行的操作确实有些困惑，但是会使用dplyr的mutate （添加新列）和case_when （而不是多个if语句）来简化事情吗？ Eg: 例如：

library(dplyr)
data %>% mutate(bayes_value = 
                  case_when(TERMAC_ShortName == "PV" ~ -0.772,
                            data$TERMAC_ShortName == "DAWS2"~-1.24,
                            <OTHER CASES HERE>))

REVISED: 修订：

  data %>% mutate(bayes_value = 
                      case_when(TERMAC_ShortName == "PV" ~ -0.772,
                                TERMAC_ShortName == "DAWS2"~-1.24,
                                <OTHER TERMAC_ShortName CASES HERE>
                                T~0)+
                      case_when(Glob_ShortName == "Grass"~-0.5978,
                                <OTHER Glob CASES HERE>
                                T~0)+
                      case_when(TERMAC_ShortName == "PV" | Glob_ShortName== "VegCrop"~-0.56,
                                <OTHER Combined CASES HERE>
                                T~0))

如何将函数应用于数据框，以便在数据框上创建新列？

问题描述

1 个解决方案

解决方案1
1 已采纳 2019-01-25 00:50:21

如何将函数应用于数据框，以便在数据框上创建新列？

问题描述

1 个解决方案

解决方案1 1 已采纳 2019-01-25 00:50:21

解决方案1
1 已采纳 2019-01-25 00:50:21