How do I apply my function to a dataframe so that it creates a new column in the dataframe?
First, apologies for the awful, illogical, clunky code coming up. I have MINIMAL experience with for loops and functions.
In essence, I want to apply a function to a dataframe. This function returns a value [i] that depends on the values in two of the dataframe's columns. I then want this value to be stored in a new column, aligned with the row containing the values that generated it.
This uses some already-generated model values to predict the abundance of an animal species. I have created a fairly awful function based on the known values of the generated model.
Here is an example of the data:
data <- structure(list(X = 2:6,
  x = c(23.69772329, 23.33799932, 24.50995071, 22.37691419, 31.29742091),
  y = c(-18.75309389, -18.28537894, -19.39926585, -19.23678464, -5.251863724),
  EVAP_Value = c(502L, 541L, 750L, 476L, 571L),
  HFI_Value = c(1, 1, 3.059409052, 2.250018061, 7),
  TERMAC_Value = c(605L, 605L, 118L, 605L, 236L),
  TERMAC_ShortName = structure(c(4L, 4L, 1L, 4L, 2L),
    .Label = c("DAWS2", "EASM", "Marsh", "PV"), class = "factor"),
  GLOBCOV_Value = c(30L, 30L, 30L, 140L, 130L),
  Glob_ShortName = structure(c(5L, 5L, 5L, 1L, 4L),
    .Label = c("Grass", "OpBdFrst", "OpNdFrst", "Shrub", "VegCrop"), class = "factor"),
  Unknown_Value = c(527L, 546L, 488L, 430L, 1020L),
  Location = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("BWA", "TZA"), class = "factor"),
  NDVI_mean = c(0.26736562, 0.28850313, 0.328852412, 0.271927773, 0.364711006),
  Random_Category = structure(c(2L, 2L, 2L, 2L, 1L),
    .Label = c("Random_Maasai", "Random_Southern"), class = "factor"),
  num = c(1L, 1L, 1L, 1L, 1L),
  ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Random", class = "factor")),
  row.names = 2:6, class = "data.frame")
For reference, it looks like this:
  X        x          y EVAP_Value HFI_Value TERMAC_Value
1 1 37.97434  -8.833364       1390  6.000000          601
2 2 23.69772 -18.753094        502  1.000000          605
3 3 23.33800 -18.285379        541  1.000000          605
4 4 24.50995 -19.399266        750  3.059409          118
5 5 22.37691 -19.236785        476  2.250018          605
6 6 31.29742  -5.251864        571  7.000000          236
  TERMAC_ShortName GLOBCOV_Value Glob_ShortName Unknown_Value
1             <NA>            90       OpNdFrst          1038
2               PV            30        VegCrop           527
3               PV            30        VegCrop           546
4            DAWS2            30        VegCrop           488
5               PV           140          Grass           430
6             EASM           130          Shrub          1020
  Location NDVI_mean Random_Category num     ID
1      TZA 0.5356669   Random_Maasai   1 Random
2      BWA 0.2673656 Random_Southern   1 Random
3      BWA 0.2885031 Random_Southern   1 Random
4      BWA 0.3288524 Random_Southern   1 Random
5      BWA 0.2719278 Random_Southern   1 Random
6      TZA 0.3647110   Random_Maasai   1 Random
The two columns of interest are TERMAC_ShortName and Glob_ShortName. My efforts so far are:
predict.bayes.animal <- function(data){
  if (data$TERMAC_ShortName[i] == "PV") {
    bayes_value[i] <- i - 0.772
  }
  if (data$TERMAC_ShortName[i] == "DAWS2") {
    bayes_value[i] <- i - 1.24
  }
  if (data$TERMAC_ShortName[i] == "EASM") {
    bayes_value[i] <- i - 0.362
  }
  if (data$Glob_ShortName[i] == "VegCrop") {
    bayes_value[i] <- i - 0.3497
  }
  if (data$Glob_ShortName[i] == "Grass") {
    bayes_value[i] <- i - 0.5978
  }
  if (data$Glob_ShortName[i] == "Shrub") {
    bayes_value[i] <- i - 0.2285
  }
  if (data$TERMAC_ShortName[i] == "PV" | data$Glob_ShortName[i] == "VegCrop") {
    bayes_value[i] <- i - 0.56
  }
  if (data$TERMAC_ShortName[i] == "DAWS2" | data$Glob_ShortName[i] == "VegCrop") {
    bayes_value[i] <- i + 0.43
  }
  if (data$TERMAC_ShortName[i] == "PV" | data$Glob_ShortName[i] == "Grass") {
    bayes_value[i] <- i - 0.49
  }
  if (data$TERMAC_ShortName[i] == "EASM" | data$Glob_ShortName[i] == "Shrub") {
    bayes_value[i] <- i - 0.045
  }
  bayes_value
}
data["bayes_value"] <- NA
for (i in 1:nrow(data)) {
  n <- predict.bayes.animal(data)
  data$bayes_value[i] <- n
}
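For comparison, here is a minimal base-R sketch of how the loop-based approach could work. It assumes the predicted value is the sum of a TERMAC term, a Glob term and a combined term, and that each combined term applies only when both conditions hold (& rather than the | used above). The helper name predict_bayes_row is hypothetical. It takes one row's two values as arguments, builds its result in a local variable that exists from the start, and returns it, so nothing depends on a global i.

```r
# A cut-down copy of the two relevant columns from the question's data.
data <- data.frame(
  TERMAC_ShortName = c("PV", "PV", "DAWS2", "PV", "EASM"),
  Glob_ShortName   = c("VegCrop", "VegCrop", "VegCrop", "Grass", "Shrub")
)

# Hypothetical helper: predicted value for ONE row. Coefficients are copied from
# the question; reading the combined terms as "both conditions hold" is an assumption.
predict_bayes_row <- function(termac, glob) {
  value <- 0  # created up front, so every "value <- value + ..." line works
  # Missing values are skipped instead of making if() fail.
  if (!is.na(termac)) {
    if (termac == "PV")    value <- value - 0.772
    if (termac == "DAWS2") value <- value - 1.24
    if (termac == "EASM")  value <- value - 0.362
  }
  if (!is.na(glob)) {
    if (glob == "VegCrop") value <- value - 0.3497
    if (glob == "Grass")   value <- value - 0.5978
    if (glob == "Shrub")   value <- value - 0.2285
  }
  if (!is.na(termac) && !is.na(glob)) {
    if (termac == "PV"    && glob == "VegCrop") value <- value - 0.56
    if (termac == "DAWS2" && glob == "VegCrop") value <- value + 0.43
    if (termac == "PV"    && glob == "Grass")   value <- value - 0.49
    if (termac == "EASM"  && glob == "Shrub")   value <- value - 0.045
  }
  value
}

# The loop passes row i's values into the function and stores the result in row i.
data$bayes_value <- NA_real_
for (i in seq_len(nrow(data))) {
  data$bayes_value[i] <- predict_bayes_row(as.character(data$TERMAC_ShortName[i]),
                                           as.character(data$Glob_ShortName[i]))
}
print(data)
```

Under this reading, the PV/VegCrop, PV/Grass and EASM/Shrub rows come out at -1.6817, -1.8598 and -0.6355, in line with the -1.68, -1.86 and -0.64 in the expected result, while the DAWS2/VegCrop row comes out at -1.1597 rather than -1.20, so the intended terms for that combination may differ.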
The expected result is:
  X        x          y EVAP_Value HFI_Value TERMAC_Value
1 1 23.69772 -18.753094        502  1.000000          605
2 2 23.33800 -18.285379        541  1.000000          605
3 3 24.50995 -19.399266        750  3.059409          118
4 4 22.37691 -19.236785        476  2.250018          605
5 5 31.29742  -5.251864        571  7.000000          236
  TERMAC_ShortName GLOBCOV_Value Glob_ShortName Unknown_Value
1               PV            30        VegCrop           527
2               PV            30        VegCrop           546
3            DAWS2            30        VegCrop           488
4               PV           140          Grass           430
5             EASM           130          Shrub          1020
  Location NDVI_mean Random_Category num     ID bayes_value
1      BWA 0.2673656 Random_Southern   1 Random       -1.68
2      BWA 0.2885031 Random_Southern   1 Random       -1.68
3      BWA 0.3288524 Random_Southern   1 Random       -1.20
4      BWA 0.2719278 Random_Southern   1 Random       -1.86
5      TZA 0.3647110   Random_Maasai   1 Random       -0.64
So far, the actual result is the error "Error in predict.bayes.animal(data) : object 'bayes_value' not found".
Thank you in advance for any assistance.
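The error comes from bayes_value never being created inside the function: bayes_value[i] <- ... modifies an existing object, and the final bayes_value line returns one, but data["bayes_value"] <- NA creates a column of data, not a variable called bayes_value. (The function also reads i from the loop outside it rather than receiving it as an argument.) A minimal illustration, with hypothetical function and variable names:

```r
# Hypothetical function reproducing the problem: the indexed assignment tries to
# modify an object called never_created, which does not exist anywhere yet.
broken <- function() {
  never_created[1] <- 5
  never_created
}
msg <- tryCatch(broken(), error = function(e) conditionMessage(e))
print(msg)  # an "object ... not found" message, like the one in the question

# Creating the object before assigning into it avoids the error.
fixed <- function() {
  result <- numeric(0)  # empty numeric vector that exists from the start
  result[1] <- 5
  result
}
print(fixed())
```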
As discussed in the comments, there is some confusion about exactly what you are trying to do, but would using dplyr's mutate (to add the new column) and case_when (instead of multiple if statements) simplify things? E.g.:
library(dplyr)
data %>% mutate(bayes_value =
  case_when(TERMAC_ShortName == "PV"    ~ -0.772,
            TERMAC_ShortName == "DAWS2" ~ -1.24
            # <OTHER CASES HERE>
  ))
REVISED:
data <- data %>% mutate(bayes_value =
  case_when(TERMAC_ShortName == "PV"    ~ -0.772,
            TERMAC_ShortName == "DAWS2" ~ -1.24,
            # <OTHER TERMAC_ShortName CASES HERE>
            TRUE ~ 0) +
  case_when(Glob_ShortName == "Grass" ~ -0.5978,
            # <OTHER Glob CASES HERE>
            TRUE ~ 0) +
  # combined cases: both conditions must hold, so & rather than |
  case_when(TERMAC_ShortName == "PV" & Glob_ShortName == "VegCrop" ~ -0.56,
            # <OTHER Combined CASES HERE>
            TRUE ~ 0))
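The same additive idea can also be written in base R without dplyr by swapping case_when for named lookup vectors (a different technique from the answer's). This is a sketch: the vector and function names are hypothetical, the coefficients are copied from the question, a level with no coefficient contributes 0 (like the final catch-all case in each case_when above), and each combined term is assumed to apply only when both conditions hold.

```r
# Hypothetical lookup tables: one coefficient per level, copied from the question.
termac_effect <- c(PV = -0.772, DAWS2 = -1.24, EASM = -0.362)
glob_effect   <- c(VegCrop = -0.3497, Grass = -0.5978, Shrub = -0.2285)
# Combined terms keyed as "TERMAC:Glob"; assumed to apply only when both match.
combo_effect  <- c("PV:VegCrop" = -0.56, "DAWS2:VegCrop" = 0.43,
                   "PV:Grass" = -0.49, "EASM:Shrub" = -0.045)

# Look up one coefficient per key; keys with no entry (or NA keys) contribute 0.
lookup <- function(effects, keys) {
  out <- unname(effects[keys])
  out[is.na(out)] <- 0
  out
}

# Adds the new column in one vectorised step, with no loop or row index.
add_bayes_value <- function(data) {
  termac <- as.character(data$TERMAC_ShortName)
  glob   <- as.character(data$Glob_ShortName)
  data$bayes_value <- lookup(termac_effect, termac) +
    lookup(glob_effect, glob) +
    lookup(combo_effect, paste(termac, glob, sep = ":"))
  data
}

# Usage on a cut-down copy of the question's two relevant columns.
data <- data.frame(
  TERMAC_ShortName = c("PV", "PV", "DAWS2", "PV", "EASM"),
  Glob_ShortName   = c("VegCrop", "VegCrop", "VegCrop", "Grass", "Shrub")
)
data <- add_bayes_value(data)
print(data)
```

Because each lookup works on whole columns at once, add_bayes_value(data) returns a copy of data with the new bayes_value column already aligned row by row; as.character() lets it accept the factor columns in the question's data as well as plain character columns.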