I have a dataframe in R, called "data" like this
c1 c2 c3
A1000 "x" 100
A1200 "x" 200
A3000 "y" 150
A2000 "x" 250
A3200 "t" 100
A1000 "e" 250
A1200 "w" 300
I need to create another column, lets say "c4", whith the category name based on following criteria:
Code Name
-----------------------------
A10 "Activity 1"
A12 "Activity 2"
A20 "Activity 3"
other code "Other activity"
Where "code" corresponds to the first 3 characters of column c1 in my data. I have the following code in R:
cat_x <- function(data_x){
if(substr(data_x, star=1, stop=3) == "A10"){
return("Activity 1")
} else if(substr(data_x, star=1, stop=3) == "A12") {
return("Activity 2")
} else if(substr(data_x, star=1, stop=3) == "A20") {
return("Activity 3")
} else {
return("Other activity")
}
}
data["c4"] <- cat_x(data$c1)
However I get the following error: "the condition has length > 1 and only the first element will be used"
Please help me to solve this, using my function "cat_x".
Thanks in advance
Use sapply
:
df$c4 <- sapply(df$c1, cat_x)
Your code is not vectorized, so it's not coded to deal with an entire vector at once. Instead it deals with one element at a time, which is how sapply
will use it.
You could also use the library dplyr
and case_when
to code this like:
library(dplyr)
df %>%
mutate(c4 = case_when(
startsWith(c1, "A10") ~ "Activity 1",
startsWith(c1, "A12") ~ "Activity 2",
startsWith(c1, "A20") ~ "Activity 3",
T ~ "Other Activity"))
Output
c1 c2 c3 c4
1 A1000 x 100 Activity 1
2 A1200 x 200 Activity 2
3 A3000 y 150 Other Activity
4 A2000 x 250 Activity 3
5 A3200 t 100 Other Activity
6 A1000 e 250 Activity 1
7 A1200 w 300 Activity 2
There are definitely better solutions out there but this one is the closest to your own. You first have to create an empty vector of type character
with the same length as the number of rows in your data frame named c4
. Then you iterate over the first column whose first three characters you would like to extract and then fill your c4
in every iteration with the right match.
cat_x <- function(data_x){
c4 <- vector("character", length = nrow(data_x))
for(i in 1:nrow(data_x)) {
if(substr(data_x[i, 1], star = 1, stop = 3) == "A10"){
c4[[i]] <- "Activity 1"
} else if(substr(data_x[i, 1], star = 1, stop = 3) == "A12") {
c4[[i]] <- "Activity 2"
} else if(substr(data_x[i, 1], star = 1, stop = 3) == "A20") {
c4[[i]] <- "Activity 3"
} else {
c4[[i]] <- "Other activity"
}
}
cbind(data_x, c4)
}
cat_x(df)
c1 c2 c3 c4
1 A1000 x 100 Activity 1
2 A1200 x 200 Activity 2
3 A3000 y 150 Other activity
4 A2000 x 250 Activity 3
5 A3200 t 100 Other activity
6 A1000 e 250 Activity 1
7 A1200 w 300 Activity 2
Data
df <- read.table(header = TRUE, text = "
c1 c2 c3
A1000 x 100
A1200 x 200
A3000 y 150
A2000 x 250
A3200 t 100
A1000 e 250
A1200 w 300")
This is a standard merge operation. First make your codes into a data frame and use dput
to make them easily available:
data <- structure(list(c1 = c("A1000", "A1200", "A3000", "A2000", "A3200",
"A1000", "A1200"), c2 = c("x", "x", "y", "x", "t", "e", "w"),
c3 = c(100L, 200L, 150L, 250L, 100L, 250L, 300L)), class = "data.frame",
row.names = c(NA, -7L))
codes <- structure(list(Code = c("A10", "A12", "A20"), Name = c("Activity 1",
"Activity 2", "Activity 3")), class = "data.frame", row.names = c(NA, -3L))
Now create a column in data
that matches the code and merge:
data$Code <- substr(data$c1, 1, 3)
data.mrg <- merge(data, codes, all=TRUE)
# Code c1 c2 c3 Name
# 1 A10 A1000 x 100 Activity 1
# 2 A10 A1000 e 250 Activity 1
# 3 A12 A1200 x 200 Activity 2
# 4 A12 A1200 w 300 Activity 2
# 5 A20 A2000 x 250 Activity 3
# 6 A30 A3000 y 150 <NA>
# 7 A32 A3200 t 100 <NA>
If you want to remove the Code
column and rename Name
to c4
:
data.mrg <- data.mrg[, -1] # Optional to get rid of first column
colnames(data.mrg)[4] <- "c4". # Optional to change column name
data.mrg
# c1 c2 c3 c4
# 1 A1000 x 100 Activity 1
# 2 A1000 e 250 Activity 1
# 3 A1200 x 200 Activity 2
# 4 A1200 w 300 Activity 2
# 5 A2000 x 250 Activity 3
# 6 A3000 y 150 <NA>
# 7 A3200 t 100 <NA>
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.