
How do I apply my function to a dataframe such that it creates a new column on the dataframe?

First, apologies for some awful, illogical, clunky code coming up. I have minimal experience with for loops and functions.

In essence, I want to apply a function to a dataframe. This function returns a value [i] that depends on the values in two of the dataframe's columns. I then want this value to be written to a new column, aligned with the row containing the values that generated it.

This uses some already generated model values to create the predicted abundance of an animal species.

I have created a fairly awful function, aligning with the known values of the generated model.

Here is an example of the data:

structure(list(X = 2:6, x = c(23.69772329, 23.33799932, 24.50995071, 
22.37691419, 31.29742091), y = c(-18.75309389, -18.28537894, 
-19.39926585, -19.23678464, -5.251863724), EVAP_Value = c(502L, 
541L, 750L, 476L, 571L), HFI_Value = c(1, 1, 3.059409052, 2.250018061, 
7), TERMAC_Value = c(605L, 605L, 118L, 605L, 236L), TERMAC_ShortName = 
structure(c(4L, 
4L, 1L, 4L, 2L), .Label = c("DAWS2", "EASM", "Marsh", "PV"), class = 
"factor"), 
GLOBCOV_Value = c(30L, 30L, 30L, 140L, 130L), Glob_ShortName = 
structure(c(5L, 
5L, 5L, 1L, 4L), .Label = c("Grass", "OpBdFrst", "OpNdFrst", 
"Shrub", "VegCrop"), class = "factor"), Unknown_Value = c(527L, 
546L, 488L, 430L, 1020L), Location = structure(c(1L, 1L, 
1L, 1L, 2L), .Label = c("BWA", "TZA"), class = "factor"), 
NDVI_mean = c(0.26736562, 0.28850313, 0.328852412, 0.271927773, 
0.364711006), Random_Category = structure(c(2L, 2L, 2L, 2L, 
1L), .Label = c("Random_Maasai", "Random_Southern"), class = "factor"), 
num = c(1L, 1L, 1L, 1L, 1L), ID = structure(c(1L, 1L, 1L, 
1L, 1L), .Label = "Random", class = "factor")), row.names = 2:6, class = 
"data.frame")

For reference, it looks like this:

X        x          y EVAP_Value HFI_Value TERMAC_Value
1 1 37.97434  -8.833364       1390  6.000000          601
2 2 23.69772 -18.753094        502  1.000000          605
3 3 23.33800 -18.285379        541  1.000000          605
4 4 24.50995 -19.399266        750  3.059409          118
5 5 22.37691 -19.236785        476  2.250018          605
6 6 31.29742  -5.251864        571  7.000000          236
        TERMAC_ShortName GLOBCOV_Value Glob_ShortName Unknown_Value
1             <NA>            90       OpNdFrst          1038
2               PV            30        VegCrop           527
3               PV            30        VegCrop           546
4            DAWS2            30        VegCrop           488
5               PV           140          Grass           430
6             EASM           130          Shrub          1020
  Location NDVI_mean Random_Category num     ID
1      TZA 0.5356669   Random_Maasai   1 Random
2      BWA 0.2673656 Random_Southern   1 Random
3      BWA 0.2885031 Random_Southern   1 Random
4      BWA 0.3288524 Random_Southern   1 Random
5      BWA 0.2719278 Random_Southern   1 Random
6      TZA 0.3647110   Random_Maasai   1 Random

The two columns of interest are the TERMAC_ShortName column and the Glob_ShortName column. My efforts so far are:

predict.bayes.animal <- function(data) {
  if (data$TERMAC_ShortName[i] == "PV") {
    bayes_value[i] <- i - 0.772
  }
  if (data$TERMAC_ShortName[i] == "DAWS2") {
    bayes_value[i] <- i - 1.24
  }
  if (data$TERMAC_ShortName[i] == "EASM") {
    bayes_value[i] <- i - 0.362
  }
  if (data$Glob_ShortName[i] == "VegCrop") {
    bayes_value[i] <- i - 0.3497
  }
  if (data$Glob_ShortName[i] == "Grass") {
    bayes_value[i] <- i - 0.5978
  }
  if (data$Glob_ShortName[i] == "Shrub") {
    bayes_value[i] <- i - 0.2285
  }
  if (data$TERMAC_ShortName[i] == "PV" | data$Glob_ShortName[i] == "VegCrop") {
    bayes_value[i] <- i - 0.56
  }
  if (data$TERMAC_ShortName[i] == "DAWS2" | data$Glob_ShortName[i] == "VegCrop") {
    bayes_value[i] <- i + 0.43
  }
  if (data$TERMAC_ShortName[i] == "PV" | data$Glob_ShortName[i] == "Grass") {
    bayes_value[i] <- i - 0.49
  }
  if (data$TERMAC_ShortName[i] == "EASM" | data$Glob_ShortName[i] == "Shrub") {
    bayes_value[i] <- i - 0.045
  }
  bayes_value
}

data["bayes_value"] <- NA
for (i in 1:nrow(data)) {
  n <- predict.bayes.animal(data)
  data$bayes_value[i] <- n
}

Expected result is:

X        x          y EVAP_Value HFI_Value TERMAC_Value
1 1 23.69772 -18.753094        502  1.000000          605
2 2 23.33800 -18.285379        541  1.000000          605
3 3 24.50995 -19.399266        750  3.059409          118
4 4 22.37691 -19.236785        476  2.250018          605
5 5 31.29742  -5.251864        571  7.000000          236
        TERMAC_ShortName GLOBCOV_Value Glob_ShortName Unknown_Value
1               PV            30        VegCrop           527
2               PV            30        VegCrop           546
3            DAWS2            30        VegCrop           488
4               PV           140          Grass           430
5             EASM           130          Shrub          1020
  Location NDVI_mean Random_Category num     ID   bayes_value
1      BWA 0.2673656 Random_Southern   1 Random       -1.68
2      BWA 0.2885031 Random_Southern   1 Random       -1.68
3      BWA 0.3288524 Random_Southern   1 Random       -1.20
4      BWA 0.2719278 Random_Southern   1 Random       -1.86
5      TZA 0.3647110   Random_Maasai   1 Random       -0.64

The actual result so far is "Error in predict.bayes.animal(data) : object 'bayes_value' not found".

Thank you in advance for any assistance.

As discussed in the comments, there is some confusion about exactly what you are trying to do, but would using dplyr's mutate (to add a new column) and case_when (instead of multiple if statements) simplify things? E.g.:

library(dplyr)
data %>% mutate(bayes_value = 
                  case_when(TERMAC_ShortName == "PV" ~ -0.772,
                            TERMAC_ShortName == "DAWS2" ~ -1.24,
                            <OTHER CASES HERE>))
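
To make the pattern concrete, here is a minimal sketch on a made-up three-row data frame. The `toy` and `toy_out` objects are assumptions for illustration only, not the asker's data. case_when checks its conditions in order and uses the first one that is TRUE; a row that matches no condition gets NA unless a catch-all `TRUE ~ ...` line is supplied.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the pattern
toy <- data.frame(
  TERMAC_ShortName = c("PV", "DAWS2", "Marsh"),
  stringsAsFactors = FALSE
)

toy_out <- toy %>%
  mutate(bayes_value = case_when(
    TERMAC_ShortName == "PV"    ~ -0.772,
    TERMAC_ShortName == "DAWS2" ~ -1.24,
    TRUE                        ~ 0      # fallback for any other value
  ))

print(toy_out)
```

Note that mutate returns a new data frame, so the result has to be assigned (here to `toy_out`) for the new column to be kept.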

REVISED:

  data %>% mutate(bayes_value =
                      case_when(TERMAC_ShortName == "PV" ~ -0.772,
                                TERMAC_ShortName == "DAWS2" ~ -1.24,
                                <OTHER TERMAC_ShortName CASES HERE>,
                                TRUE ~ 0) +
                      case_when(Glob_ShortName == "Grass" ~ -0.5978,
                                <OTHER Glob CASES HERE>,
                                TRUE ~ 0) +
                      case_when(TERMAC_ShortName == "PV" | Glob_ShortName == "VegCrop" ~ -0.56,
                                <OTHER Combined CASES HERE>,
                                TRUE ~ 0))
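
For completeness, here is one way the placeholders might be filled in, using only the coefficients listed in the question's function. This is a sketch under two assumptions: the three groups of effects are meant to be added together, and the combined conditions are meant as "both at once" (`&`) even though the question writes them with `|`. The data frame below is rebuilt by hand from the two relevant columns of the example data, and the name `result` is just illustrative.

```r
library(dplyr)

# Only the two columns the calculation needs, copied from the
# question's example data (rows 2 to 6)
data <- data.frame(
  TERMAC_ShortName = c("PV", "PV", "DAWS2", "PV", "EASM"),
  Glob_ShortName   = c("VegCrop", "VegCrop", "VegCrop", "Grass", "Shrub"),
  stringsAsFactors = FALSE
)

result <- data %>%
  mutate(
    bayes_value =
      # effect of TERMAC_ShortName on its own
      case_when(
        TERMAC_ShortName == "PV"    ~ -0.772,
        TERMAC_ShortName == "DAWS2" ~ -1.24,
        TERMAC_ShortName == "EASM"  ~ -0.362,
        TRUE                        ~ 0
      ) +
      # effect of Glob_ShortName on its own
      case_when(
        Glob_ShortName == "VegCrop" ~ -0.3497,
        Glob_ShortName == "Grass"   ~ -0.5978,
        Glob_ShortName == "Shrub"   ~ -0.2285,
        TRUE                        ~ 0
      ) +
      # combined effect; '&' is an assumption (the question uses '|')
      case_when(
        TERMAC_ShortName == "PV"    & Glob_ShortName == "VegCrop" ~ -0.56,
        TERMAC_ShortName == "DAWS2" & Glob_ShortName == "VegCrop" ~  0.43,
        TERMAC_ShortName == "PV"    & Glob_ShortName == "Grass"   ~ -0.49,
        TERMAC_ShortName == "EASM"  & Glob_ShortName == "Shrub"   ~ -0.045,
        TRUE                                                      ~ 0
      )
  )

print(result)
```

Under this reading the five rows come out at roughly -1.68, -1.68, -1.16, -1.86 and -0.64. That agrees with the expected output in the question except for the third row (listed there as -1.20), so the `&` interpretation should be checked against the model before relying on it.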
