Add new calculated column in an R dataframe based on another column

I have a dataframe in R called "data", like this:

c1      c2  c3
A1000   "x" 100 
A1200   "x" 200 
A3000   "y" 150 
A2000   "x" 250 
A3200   "t" 100 
A1000   "e" 250 
A1200   "w" 300

I need to create another column, let's say "c4", with the category name based on the following criteria:

Code        Name
-----------------------------
A10         "Activity 1"
A12         "Activity 2"
A20         "Activity 3"
other code  "Other activity"

Where "code" corresponds to the first 3 characters of column c1 in my data. I have the following code in R:

cat_x <- function(data_x){
  if(substr(data_x, start=1, stop=3) == "A10"){
    return("Activity 1")
  } else if(substr(data_x, start=1, stop=3) == "A12") {
    return("Activity 2")
  } else if(substr(data_x, start=1, stop=3) == "A20") {
    return("Activity 3")
  } else {
    return("Other activity")
  }
}

data["c4"] <- cat_x(data$c1)

However, I get the following error: "the condition has length > 1 and only the first element will be used"

Please help me solve this using my function "cat_x".

Thanks in advance.

Use sapply:

df$c4 <- sapply(df$c1, cat_x)

Your code is not vectorized, so it cannot handle an entire vector at once: if() expects a single TRUE or FALSE, but comparing a whole column produces one value per row. The function works on one element at a time, which is exactly how sapply calls it.
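As a minimal sketch of the same idea, vapply can stand in for sapply when the result type should be checked. The sample vector c1 below is copied from the question; cat_x is restated with the comparison hoisted into a local variable, which is an editorial simplification and not the asker's exact code.

```r
# Scalar helper in the spirit of the question: handles exactly one code.
cat_x <- function(data_x) {
  code <- substr(data_x, start = 1, stop = 3)
  if (code == "A10") {
    "Activity 1"
  } else if (code == "A12") {
    "Activity 2"
  } else if (code == "A20") {
    "Activity 3"
  } else {
    "Other activity"
  }
}

# Sample values taken from column c1 of the question.
c1 <- c("A1000", "A1200", "A3000", "A2000")

# vapply() calls cat_x once per element and checks that every result is a
# single character string. USE.NAMES = FALSE drops the names that would
# otherwise be attached because the input is a character vector.
c4 <- vapply(c1, cat_x, FUN.VALUE = character(1), USE.NAMES = FALSE)
print(c4)
```

On a data frame this would be used as df$c4 <- vapply(df$c1, cat_x, character(1), USE.NAMES = FALSE), assuming c1 is stored as character.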


You could also use the dplyr library and case_when to code this like:

library(dplyr)

df %>% 
  mutate(c4 = case_when( 
    startsWith(c1, "A10") ~ "Activity 1",
    startsWith(c1, "A12") ~ "Activity 2",
    startsWith(c1, "A20") ~ "Activity 3",
    TRUE ~ "Other Activity"))

Output

     c1 c2  c3             c4
1 A1000  x 100     Activity 1
2 A1200  x 200     Activity 2
3 A3000  y 150 Other Activity
4 A2000  x 250     Activity 3
5 A3200  t 100 Other Activity
6 A1000  e 250     Activity 1
7 A1200  w 300     Activity 2
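If adding a package is not an option, the same mapping can be sketched in base R with a named lookup vector. The names cat_x_vec and lookup are hypothetical and do not come from the answers; the fallback label follows the question's spelling, "Other activity".

```r
# Vectorised variant of cat_x (hypothetical name): maps a whole character
# vector of codes to activity names in one call, with no loop and no if().
cat_x_vec <- function(x) {
  # Names are the three-character prefixes, values are the categories.
  lookup <- c(A10 = "Activity 1", A12 = "Activity 2", A20 = "Activity 3")
  prefix <- substr(x, 1, 3)
  # Indexing by name returns NA for prefixes that are not in the lookup.
  out <- unname(lookup[prefix])
  out[is.na(out)] <- "Other activity"
  out
}

# Sample data copied from the question.
df <- data.frame(
  c1 = c("A1000", "A1200", "A3000", "A2000", "A3200", "A1000", "A1200"),
  c3 = c(100, 200, 150, 250, 100, 250, 300),
  stringsAsFactors = FALSE
)
df$c4 <- cat_x_vec(df$c1)
print(df)
```

Because the function accepts the whole column, it can be assigned directly and keeps the original row order.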

There are definitely better solutions out there, but this one is the closest to your own. First create an empty character vector, c4, with one element per row of your data frame. Then iterate over the rows, extract the first three characters of the first column, and fill the corresponding element of c4 with the matching name.

cat_x <- function(data_x){
  # One result slot per row of the input data frame
  c4 <- vector("character", length = nrow(data_x))
  
  # seq_len() is safe even when the data frame has zero rows
  for(i in seq_len(nrow(data_x))) {
    if(substr(data_x[i, 1], start = 1, stop = 3) == "A10"){
      c4[[i]] <- "Activity 1"
    } else if(substr(data_x[i, 1], start = 1, stop = 3) == "A12") {
      c4[[i]] <- "Activity 2"
    } else if(substr(data_x[i, 1], start = 1, stop = 3) == "A20") {
      c4[[i]] <- "Activity 3"
    } else {
      c4[[i]] <- "Other activity"
    }
  }
  # Return the original columns with c4 appended
  cbind(data_x, c4)
}

cat_x(df)

     c1 c2  c3             c4
1 A1000  x 100     Activity 1
2 A1200  x 200     Activity 2
3 A3000  y 150 Other activity
4 A2000  x 250     Activity 3
5 A3200  t 100 Other activity
6 A1000  e 250     Activity 1
7 A1200  w 300     Activity 2

Data

df <- read.table(header = TRUE, text = "
                 c1      c2  c3
A1000   x 100 
A1200   x 200 
A3000   y 150 
A2000   x 250 
A3200   t 100 
A1000   e 250 
A1200   w 300")

This is a standard merge operation. First make your codes into a data frame and use dput to make them easily available:

data <- structure(list(c1 = c("A1000", "A1200", "A3000", "A2000", "A3200", 
    "A1000", "A1200"), c2 = c("x", "x", "y", "x", "t", "e", "w"), 
    c3 = c(100L, 200L, 150L, 250L, 100L, 250L, 300L)), class = "data.frame",
    row.names = c(NA, -7L))

codes <- structure(list(Code = c("A10", "A12", "A20"), Name = c("Activity 1", 
    "Activity 2", "Activity 3")), class = "data.frame", row.names = c(NA, -3L))

Now create a column in data that matches the code and merge:

data$Code <- substr(data$c1, 1, 3)
data.mrg <- merge(data, codes, all=TRUE)
#   Code    c1 c2  c3       Name
# 1  A10 A1000  x 100 Activity 1
# 2  A10 A1000  e 250 Activity 1
# 3  A12 A1200  x 200 Activity 2
# 4  A12 A1200  w 300 Activity 2
# 5  A20 A2000  x 250 Activity 3
# 6  A30 A3000  y 150       <NA>
# 7  A32 A3200  t 100       <NA>

If you want to remove the Code column and rename Name to c4:

data.mrg <- data.mrg[, -1]      # Optional to get rid of first column
colnames(data.mrg)[4] <- "c4"   # Optional to change column name
data.mrg
#      c1 c2  c3         c4
# 1 A1000  x 100 Activity 1
# 2 A1000  e 250 Activity 1
# 3 A1200  x 200 Activity 2
# 4 A1200  w 300 Activity 2
# 5 A2000  x 250 Activity 3
# 6 A3000  y 150       <NA>
# 7 A3200  t 100       <NA>
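The merged result leaves unmatched codes as NA and is sorted by Code instead of keeping the original row order. A minimal sketch of one way to get "Other activity" for those rows while preserving the order is to use match() against the same codes table. This is an alternative to merge, not part of the answer above, and the variable name idx is illustrative.

```r
# Same data and code table as above, written out with data.frame().
data <- data.frame(
  c1 = c("A1000", "A1200", "A3000", "A2000", "A3200", "A1000", "A1200"),
  c2 = c("x", "x", "y", "x", "t", "e", "w"),
  c3 = c(100L, 200L, 150L, 250L, 100L, 250L, 300L),
  stringsAsFactors = FALSE
)
codes <- data.frame(
  Code = c("A10", "A12", "A20"),
  Name = c("Activity 1", "Activity 2", "Activity 3"),
  stringsAsFactors = FALSE
)

# match() gives, for each row of data, the row of codes with the same
# three-character prefix, or NA when the prefix is not listed.
idx <- match(substr(data$c1, 1, 3), codes$Code)
data$c4 <- codes$Name[idx]

# Rows without a listed code fall back to the default category.
data$c4[is.na(data$c4)] <- "Other activity"
print(data)
```

The rows stay in their original order because nothing is sorted or joined; only a new column is filled in.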


