Add a new calculated column to an R data frame based on another column
I have a data frame in R called "data" that looks like this:
c1 c2 c3
A1000 "x" 100
A1200 "x" 200
A3000 "y" 150
A2000 "x" 250
A3200 "t" 100
A1000 "e" 250
A1200 "w" 300
I need to create another column, let's say "c4", with the category name based on the following criteria:
Code Name
-----------------------------
A10 "Activity 1"
A12 "Activity 2"
A20 "Activity 3"
other code "Other activity"
Here "code" corresponds to the first 3 characters of column c1 in my data. I have the following code in R:
cat_x <- function(data_x){
  if(substr(data_x, start=1, stop=3) == "A10"){
    return("Activity 1")
  } else if(substr(data_x, start=1, stop=3) == "A12") {
    return("Activity 2")
  } else if(substr(data_x, start=1, stop=3) == "A20") {
    return("Activity 3")
  } else {
    return("Other activity")
  }
}
data["c4"] <- cat_x(data$c1)
However, I get the following error: "the condition has length > 1 and only the first element will be used"
Please help me solve this using my function "cat_x".
Thanks in advance.
Use sapply:
df$c4 <- sapply(df$c1, cat_x)
Your code is not vectorized, so it is not written to handle an entire vector at once. Instead, it handles one element at a time, which is exactly how sapply will call it.
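To illustrate the point about vectorization: the comparison inside if returns one logical value per row, while if expects a single value. The following is a minimal sketch of a vectorized alternative, assuming the same four categories as in the question. The function name cat_x_vec is made up for this illustration and is not part of the original answer.

```r
# Sample values taken from column c1 in the question
c1 <- c("A1000", "A1200", "A3000", "A2000", "A3200", "A1000", "A1200")

# The comparison is evaluated for every element, so it yields a logical
# vector of length 7, which `if` cannot use as a single condition
substr(c1, start = 1, stop = 3) == "A10"

# Hypothetical vectorized variant of cat_x: ifelse() works element-wise
cat_x_vec <- function(data_x) {
  code <- substr(data_x, start = 1, stop = 3)
  ifelse(code == "A10", "Activity 1",
  ifelse(code == "A12", "Activity 2",
  ifelse(code == "A20", "Activity 3", "Other activity")))
}

c4 <- cat_x_vec(c1)
```

With a function written this way, a call such as data$c4 <- cat_x_vec(data$c1) should work without sapply.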
You could also use case_when from the dplyr package to write this as:
library(dplyr)
df %>%
  mutate(c4 = case_when(
    startsWith(c1, "A10") ~ "Activity 1",
    startsWith(c1, "A12") ~ "Activity 2",
    startsWith(c1, "A20") ~ "Activity 3",
    TRUE ~ "Other Activity"))  # fallback for every other code
Output
c1 c2 c3 c4
1 A1000 x 100 Activity 1
2 A1200 x 200 Activity 2
3 A3000 y 150 Other Activity
4 A2000 x 250 Activity 3
5 A3200 t 100 Other Activity
6 A1000 e 250 Activity 1
7 A1200 w 300 Activity 2
There are definitely better solutions out there, but this one is the closest to your own. First, create an empty vector of type character, named c4, with the same length as the number of rows in your data frame. Then iterate over the first column, extract the first three characters of each value, and fill c4 with the matching name on every iteration.
cat_x <- function(data_x){
  # Pre-allocate the result: one label per row
  c4 <- vector("character", length = nrow(data_x))
  for(i in seq_len(nrow(data_x))) {
    # The first three characters of column 1 identify the category
    code <- substr(data_x[i, 1], start = 1, stop = 3)
    if(code == "A10"){
      c4[[i]] <- "Activity 1"
    } else if(code == "A12") {
      c4[[i]] <- "Activity 2"
    } else if(code == "A20") {
      c4[[i]] <- "Activity 3"
    } else {
      c4[[i]] <- "Other activity"
    }
  }
  cbind(data_x, c4)
}
cat_x(df)
c1 c2 c3 c4
1 A1000 x 100 Activity 1
2 A1200 x 200 Activity 2
3 A3000 y 150 Other activity
4 A2000 x 250 Activity 3
5 A3200 t 100 Other activity
6 A1000 e 250 Activity 1
7 A1200 w 300 Activity 2
Data
df <- read.table(header = TRUE, text = "
c1 c2 c3
A1000 x 100
A1200 x 200
A3000 y 150
A2000 x 250
A3200 t 100
A1000 e 250
A1200 w 300")
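As one example of the "better solutions" this answer alludes to, the loop could be replaced by a named lookup vector. This is only a sketch under the assumption that the three codes from the question are the only ones with their own label; the name lookup is hypothetical.

```r
# Same sample data as above
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
c1 c2 c3
A1000 x 100
A1200 x 200
A3000 y 150
A2000 x 250
A3200 t 100
A1000 e 250
A1200 w 300")

# Hypothetical lookup table: names are the codes, values are the labels
lookup <- c(A10 = "Activity 1", A12 = "Activity 2", A20 = "Activity 3")

# Index the lookup by the first three characters of c1;
# codes that are not in the lookup come back as NA
c4 <- unname(lookup[substr(df$c1, 1, 3)])

# Replace the NAs from unmatched codes with the fallback label
c4[is.na(c4)] <- "Other activity"

df$c4 <- c4
```

Adding a new category then only requires one more entry in lookup rather than another else if branch.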
This is a standard merge operation. First put your codes into a data frame; the dput output below makes both data frames easy to reproduce:
data <- structure(list(c1 = c("A1000", "A1200", "A3000", "A2000", "A3200",
"A1000", "A1200"), c2 = c("x", "x", "y", "x", "t", "e", "w"),
c3 = c(100L, 200L, 150L, 250L, 100L, 250L, 300L)), class = "data.frame",
row.names = c(NA, -7L))
codes <- structure(list(Code = c("A10", "A12", "A20"), Name = c("Activity 1",
"Activity 2", "Activity 3")), class = "data.frame", row.names = c(NA, -3L))
Now create a column in data that holds the code, then merge:
data$Code <- substr(data$c1, 1, 3)
data.mrg <- merge(data, codes, all=TRUE)
# Code c1 c2 c3 Name
# 1 A10 A1000 x 100 Activity 1
# 2 A10 A1000 e 250 Activity 1
# 3 A12 A1200 x 200 Activity 2
# 4 A12 A1200 w 300 Activity 2
# 5 A20 A2000 x 250 Activity 3
# 6 A30 A3000 y 150 <NA>
# 7 A32 A3200 t 100 <NA>
If you want to remove the Code column and rename Name to c4:
data.mrg <- data.mrg[, -1]       # Optional: drop the first column (Code)
colnames(data.mrg)[4] <- "c4"    # Optional: rename Name to c4
data.mrg
# c1 c2 c3 c4
# 1 A1000 x 100 Activity 1
# 2 A1000 e 250 Activity 1
# 3 A1200 x 200 Activity 2
# 4 A1200 w 300 Activity 2
# 5 A2000 x 250 Activity 3
# 6 A3000 y 150 <NA>
# 7 A3200 t 100 <NA>
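The merged result above leaves NA for codes that are not in the codes table, whereas the question asks for "Other activity" in those rows. A minimal sketch of one way to fill them in, rebuilding the same two data frames with data.frame() so that the block runs on its own, could look like this:

```r
data <- data.frame(
  c1 = c("A1000", "A1200", "A3000", "A2000", "A3200", "A1000", "A1200"),
  c2 = c("x", "x", "y", "x", "t", "e", "w"),
  c3 = c(100L, 200L, 150L, 250L, 100L, 250L, 300L),
  stringsAsFactors = FALSE
)
codes <- data.frame(
  Code = c("A10", "A12", "A20"),
  Name = c("Activity 1", "Activity 2", "Activity 3"),
  stringsAsFactors = FALSE
)

data$Code <- substr(data$c1, 1, 3)

# all.x = TRUE keeps every row of data, even those without a matching code
data.mrg <- merge(data, codes, by = "Code", all.x = TRUE)

# Codes with no match come back as NA; label them as the fallback category
data.mrg$Name[is.na(data.mrg$Name)] <- "Other activity"
```

Note that merge sorts the result by the key column by default, so the rows no longer appear in their original order, as the output above already shows.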