Add a new calculated column to an R data frame based on another column

Question

I have a data frame in R called "data", like this:

c1      c2  c3
A1000   "x" 100 
A1200   "x" 200 
A3000   "y" 150 
A2000   "x" 250 
A3200   "t" 100 
A1000   "e" 250 
A1200   "w" 300

I need to create another column, let's say "c4", with the category name based on the following criteria:

Code        Name
-----------------------------
A10         "Activity 1"
A12         "Activity 2"
A20         "Activity 3"
other code  "Other activity"

Here "code" corresponds to the first 3 characters of column c1 in my data. I have the following code in R:

cat_x <- function(data_x){
  if(substr(data_x, start=1, stop=3) == "A10"){
    return("Activity 1")
  } else if(substr(data_x, start=1, stop=3) == "A12") {
    return("Activity 2")
  } else if(substr(data_x, start=1, stop=3) == "A20") {
    return("Activity 3")
  } else {
    return("Other activity")
  }
  
}

data["c4"] <- cat_x(data$c1)

However, I get the following error: "the condition has length > 1 and only the first element will be used"

Please help me solve this using my function "cat_x".

Thanks in advance

Answer 1

Use sapply:

df$c4 <- sapply(df$c1, cat_x)

Your code is not vectorized: if() tests a single condition, so the function cannot handle an entire vector at once. It can only handle one element at a time, which is exactly how sapply calls it.
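
For comparison, here is a minimal sketch of what a vectorized version of the function could look like, using nested ifelse() calls in place of if/else. The name cat_x_vec is hypothetical and not part of the original question; the sample data is copied from the question.

```r
# Vectorized sketch: ifelse() evaluates its test for every element of the vector,
# whereas if() expects a single TRUE/FALSE value.
cat_x_vec <- function(x) {
  code <- substr(x, start = 1, stop = 3)  # first three characters of each value
  ifelse(code == "A10", "Activity 1",
  ifelse(code == "A12", "Activity 2",
  ifelse(code == "A20", "Activity 3", "Other activity")))
}

# Sample data from the question
data <- data.frame(
  c1 = c("A1000", "A1200", "A3000", "A2000", "A3200", "A1000", "A1200"),
  c2 = c("x", "x", "y", "x", "t", "e", "w"),
  c3 = c(100, 200, 150, 250, 100, 250, 300),
  stringsAsFactors = FALSE
)

# The whole column is passed in one call; no sapply() is needed
data$c4 <- cat_x_vec(data$c1)
data
```

With this form the original call pattern, data["c4"] <- cat_x_vec(data$c1), should work as well, because the function returns one value per input element.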

You could also use the dplyr library and case_when to write it like this:

library(dplyr)

df %>% 
  mutate(c4 = case_when( 
    startsWith(c1, "A10") ~ "Activity 1",
    startsWith(c1, "A12") ~ "Activity 2",
    startsWith(c1, "A20") ~ "Activity 3",
    TRUE ~ "Other Activity"))

Output

     c1 c2  c3             c4
1 A1000  x 100     Activity 1
2 A1200  x 200     Activity 2
3 A3000  y 150 Other Activity
4 A2000  x 250     Activity 3
5 A3200  t 100 Other Activity
6 A1000  e 250     Activity 1
7 A1200  w 300     Activity 2

Answer 2

There are definitely better solutions out there, but this one is the closest to your own. You first create an empty character vector named c4 with the same length as the number of rows in your data frame. Then you iterate over the rows, extract the first three characters of the first column, and fill c4 with the matching name in every iteration.

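cat_x <- function(data_x){
  # Preallocate the result: one string per row of the data frame
  c4 <- vector("character", length = nrow(data_x))
  
  # seq_len() also behaves correctly when the data frame has zero rows
  for(i in seq_len(nrow(data_x))) {
    if(substr(data_x[i, 1], start = 1, stop = 3) == "A10"){
      c4[[i]] <- "Activity 1"
    } else if(substr(data_x[i, 1], start = 1, stop = 3) == "A12") {
      c4[[i]] <- "Activity 2"
    } else if(substr(data_x[i, 1], start = 1, stop = 3) == "A20") {
      c4[[i]] <- "Activity 3"
    } else {
      c4[[i]] <- "Other activity"
    }
  }
  # Return the original data frame with c4 appended as a new column
  cbind(data_x, c4)
}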

cat_x(df)

     c1 c2  c3             c4
1 A1000  x 100     Activity 1
2 A1200  x 200     Activity 2
3 A3000  y 150 Other activity
4 A2000  x 250     Activity 3
5 A3200  t 100 Other activity
6 A1000  e 250     Activity 1
7 A1200  w 300     Activity 2

Data

df <- read.table(header = TRUE, text = "
                 c1      c2  c3
A1000   x 100 
A1200   x 200 
A3000   y 150 
A2000   x 250 
A3200   t 100 
A1000   e 250 
A1200   w 300")
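
As an alternative to the explicit for loop in this answer, the one-element-at-a-time function from the question can be kept as it is and applied with vapply(), which handles the preallocation and also checks that every call returns a single string. This is only a sketch; the short c1 vector is a subset of the sample data.

```r
# Scalar function in the spirit of the question: it handles one value at a time
cat_x <- function(data_x) {
  code <- substr(data_x, start = 1, stop = 3)
  if (code == "A10") {
    "Activity 1"
  } else if (code == "A12") {
    "Activity 2"
  } else if (code == "A20") {
    "Activity 3"
  } else {
    "Other activity"
  }
}

c1 <- c("A1000", "A1200", "A3000", "A2000")

# vapply() calls cat_x once per element; FUN.VALUE declares that each result
# must be a character vector of length 1, otherwise an error is raised
c4 <- vapply(c1, cat_x, FUN.VALUE = character(1), USE.NAMES = FALSE)
c4
```

USE.NAMES = FALSE keeps the result unnamed; by default both sapply() and vapply() would use the input strings as element names.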

Answer 3

This is a standard merge operation. First put your codes into a data frame; dput output is used here so the data is easy to reproduce:

data <- structure(list(c1 = c("A1000", "A1200", "A3000", "A2000", "A3200", 
    "A1000", "A1200"), c2 = c("x", "x", "y", "x", "t", "e", "w"), 
    c3 = c(100L, 200L, 150L, 250L, 100L, 250L, 300L)), class = "data.frame",
    row.names = c(NA, -7L))

codes <- structure(list(Code = c("A10", "A12", "A20"), Name = c("Activity 1", 
    "Activity 2", "Activity 3")), class = "data.frame", row.names = c(NA, -3L))

Now create a column in data that matches the code and merge:

data$Code <- substr(data$c1, 1, 3)
data.mrg <- merge(data, codes, all=TRUE)
#   Code    c1 c2  c3       Name
# 1  A10 A1000  x 100 Activity 1
# 2  A10 A1000  e 250 Activity 1
# 3  A12 A1200  x 200 Activity 2
# 4  A12 A1200  w 300 Activity 2
# 5  A20 A2000  x 250 Activity 3
# 6  A30 A3000  y 150       <NA>
# 7  A32 A3200  t 100       <NA>

If you want to remove the Code column and rename Name to c4:

data.mrg <- data.mrg[, -1]      # Optional to get rid of first column
colnames(data.mrg)[4] <- "c4"   # Optional to change column name
data.mrg
#      c1 c2  c3         c4
# 1 A1000  x 100 Activity 1
# 2 A1000  e 250 Activity 1
# 3 A1200  x 200 Activity 2
# 4 A1200  w 300 Activity 2
# 5 A2000  x 250 Activity 3
# 6 A3000  y 150       <NA>
# 7 A3200  t 100       <NA>
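
Note that the merged result is sorted by Code and has NA where the question asks for "Other activity". If the original row order matters, one possible variation is to look the codes up with match() and fill the gaps afterwards. This is a sketch using the same data and codes tables, not part of the answer above:

```r
data <- data.frame(
  c1 = c("A1000", "A1200", "A3000", "A2000", "A3200", "A1000", "A1200"),
  c2 = c("x", "x", "y", "x", "t", "e", "w"),
  c3 = c(100L, 200L, 150L, 250L, 100L, 250L, 300L),
  stringsAsFactors = FALSE
)

codes <- data.frame(
  Code = c("A10", "A12", "A20"),
  Name = c("Activity 1", "Activity 2", "Activity 3"),
  stringsAsFactors = FALSE
)

# match() returns, for each row of data, the position of its code in codes$Code,
# or NA when the code is not in the lookup table
idx <- match(substr(data$c1, 1, 3), codes$Code)

# Indexing with NA yields NA, so unmatched rows are filled in a second step
data$c4 <- codes$Name[idx]
data$c4[is.na(data$c4)] <- "Other activity"
data
```

Because no merge is involved, the rows stay in their original order and no helper Code column is added to data.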

Answer 1
3, accepted, 2021-04-28 16:25:50

Answer 2
1, 2021-04-28 16:30:09

Answer 3
1, 2021-04-28 16:47:19
