A new dataset column based on specific characters from other columns (in R)
In my dataset I want to create a new column that is conditional on characters from two other columns. If longDesciptions.desc.en_US contains the word Plage AND externalCode starts with the number 1, then add the value A in the new column.
If longDesciptions.desc.en_US does not contain the word Plage AND externalCode starts with the number 1, then add the value B in the new column.
Otherwise, leave it empty or NA.
df <- structure(list(X.OPERATOR. = c(" Clear and Delete", NA, NA, NA,
NA, "<p>Je voornaamste taken:</p>"), externalCode = c("Job Profile.GUID",
"1008141", "1008168", "1008170", "1008170", NA), longDesciptions.sectionId = c("sectionId",
"199624017", "200226564", "200226592", "200226594", NA), longDesciptions.sectionType = c("sectionType",
"LONGDESCRIPTION", "LONGDESCRIPTION", "LONGDESCRIPTION", "LONGDESCRIPTION",
NA), longDesciptions.desc.en_US = c("US English", "Class: 06, Plage: C, Function code:",
"Class: 03", "Class: 03", "<p>Als Legal Counsel maak je deel uit van het departement Secretariaat-Generaal. Je ondersteunt zowel de secretaris-generaal en de directie alsook de verschillende entiteiten van Elia groep, zowel op nationaal als internationaal niveau.</p>",
NA), longDesciptions.desc.defaultValue = c("Default Value", "Class: 06, Plage: C, Function code:",
"Class: 03", "Class: 03", NA, NA), longDesciptions.desc.en_GB = c("English (United Kingdom)",
"Class: 06, Plage: C, Function code:", "Class: 03", "Class: 03",
NA, NA), longDesciptions.desc.de_DE = c("German (Germany)", NA,
NA, NA, NA, NA), longDesciptions.desc.fr_FR = c("French (France)",
"Classe: 06, Plage: C, Code de la fonction:", "Classe: 03", "Classe: 03",
NA, NA), longDesciptions.desc.nl_NL = c("Dutch (Netherlands)",
"Klasse: 06, Plage: C, Functiecode:", "Klasse: 03", "Klasse: 03",
NA, NA), longDesciptions.status = c("status(Valid Values : A/I A for Active I for Inactive )",
"A", "A", "A", NA, NA), longDesciptions.externalCode = c("externalCode",
"1035137", "1035330", "1035330", NA, NA), longDesciptions.subModule = c("subModule",
NA, NA, NA, NA, NA), NA. = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..1 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), NA..2 = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), NA..3 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
NA..4 = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), NA..5 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..6 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..7 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..8 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..9 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..10 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..11 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..12 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..13 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..14 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..15 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..16 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..17 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..18 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..19 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..20 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..21 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..22 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..23 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..24 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..25 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..26 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..27 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..28 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..29 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..30 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..31 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), NA..32 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NA..33 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), NA..34 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
)), class = "data.frame", row.names = c(NA, -6L))
I have tried this code but it does not work:
df2[,49] <- NA #
names(df2)[49] <- "JobDescrip"
for (i in 1 : nrow(df2)) {
if (df2$externalCode[i] == '^1' && df2$longDesciptions.sectionId[i]==
'^P') {
df2[i,49] <- "A"
}
if (df2$externalCode[i] == '^1') {
df2[i,49] <- "B"
}
else {
df2[i,49] <- ""
}
}
Error in if (df2$externalCode[i] == "^1" && df2$longDesciptions.sectionId[i] == :
missing value where TRUE/FALSE needed
I know that this type of question has been asked many times, but I could not find a solution that works for my data. Any help would be appreciated!
Here is a tidyverse approach you could consider. I would think about vectorized approaches instead of a loop.
In this case, you can use mutate from dplyr to add your new column, and case_when instead of multiple if statements to add the logic. If the first condition is false, then the second condition is tested, and so on.
If you use grepl you can check whether the string contains "Plage" (you can consider alternatives for other regex patterns). substr can be used to look at specific characters in a string.
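As a quick check of how these two helpers behave on their own, here is a minimal sketch on made-up character vectors (descs and codes are illustrative names, not columns from the question). It also shows what happens with missing values: grepl returns FALSE for NA input, while the substr comparison yields NA.

```r
# Illustrative vectors (hypothetical, not taken from the question's data)
descs <- c("Class: 06, Plage: C", "Class: 03", NA)
codes <- c("1008141", "2008168", NA)

# grepl() tests whether the pattern occurs anywhere in each string;
# fixed = TRUE treats "Plage" as a literal string rather than a regex
has_plage <- grepl("Plage", descs, fixed = TRUE)

# substr() extracts the characters from position 1 to position 1,
# i.e. the first character, which is then compared with "1"
starts_with_1 <- substr(codes, 1, 1) == "1"

print(has_plage)      # TRUE FALSE FALSE
print(starts_with_1)  # TRUE FALSE NA
```

Inside case_when, a condition that evaluates to NA is treated as not matching, so rows with a missing externalCode fall through to the final TRUE branch.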
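library(dplyr)

df %>%
  mutate(job_descrip = case_when(
    # "Plage" present and code starts with 1
    grepl("Plage", longDesciptions.desc.en_US) & substr(externalCode, 1, 1) == "1" ~ "A",
    # code starts with 1 (reached only when "Plage" is absent)
    substr(externalCode, 1, 1) == "1" ~ "B",
    # everything else
    TRUE ~ NA_character_
  ))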
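For comparison, the same logic can be sketched in base R without dplyr. This is an alternative sketch rather than part of the approach above; it uses a small made-up data frame (toy, a hypothetical name) in place of the full df. It also sidesteps two problems in the original loop: == compares against the literal string '^1' rather than matching a regular expression, and comparing NA with == yields NA, which if cannot evaluate (hence "missing value where TRUE/FALSE needed").

```r
# Small stand-in for the question's data (hypothetical values)
toy <- data.frame(
  externalCode = c("Job Profile.GUID", "1008141", "1008168", NA),
  longDesciptions.desc.en_US = c("US English", "Class: 06, Plage: C", "Class: 03", NA),
  stringsAsFactors = FALSE
)

# startsWith() returns NA for NA input; %in% TRUE turns that NA into FALSE
code_is_1 <- startsWith(toy$externalCode, "1") %in% TRUE

# grepl() already returns FALSE for NA input
has_plage <- grepl("Plage", toy$longDesciptions.desc.en_US, fixed = TRUE)

# Nested ifelse(): "A" if both conditions hold, "B" if only the code matches, NA otherwise
toy$job_descrip <- ifelse(code_is_1 & has_plage, "A",
                          ifelse(code_is_1, "B", NA_character_))

print(toy$job_descrip)  # NA "A" "B" NA
```

Both conditions are computed once as whole vectors, so no explicit loop or row index is needed.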