Create a new data frame column based on the values of another column

Question

Let's say I have the following data frame.

dat <- data.frame(city=c("Chelsea","Brent","Bremen","Olathe","Lenexa","Shawnee"), 
        tag=c(rep("AlabamaCity",3), rep("KansasCity",3)))

I want to add a third column, tag2, giving the region of the state named in the tag column. So the first three cities will end up as 'South' and the last three as 'Midwest'. The data will look like this:

     city         tag    tag2
1 Chelsea AlabamaCity   South
2   Brent AlabamaCity   South
3  Bremen AlabamaCity   South
4  Olathe  KansasCity Midwest
5  Lenexa  KansasCity Midwest
6 Shawnee  KansasCity Midwest

I tried the following function, but it doesn't create the new column. Can anyone tell me what's wrong?

fixit <- function(dat) {
     for (i in 1:nrow(dat)) {
          Words = strsplit(as.character(dat[i, 'tag']), " ")[[1]]
          if(any(Words == 'Alabama')) {
                dat[i, 'tag2'] <- "South"
          }
          if(any(Words == 'Kansas')) {
                dat[i, 'tag2'] <- "Midwest"
          }
     }
     return(dat)
}

Thanks for the help.

Answer 1

It isn't working because your strsplit() call to create Words is wrong. (You do know how to debug R functions, don't you?)

debug: Words = strsplit(as.character(dat[i, "tag"]), " ")[[1]]
Browse[2]> 
debug: if (any(Words == "Alabama")) {
    dat[i, "Tag2"] <- "South"
}
Browse[2]> Words
[1] "AlabamaCity"

At this point, Words is certainly not equal to "Alabama" or "Kansas" and never will be, so the if() clauses never get executed. R is returning dat; it is your function that is not altering dat.
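To make the failure concrete, here is a minimal sketch. It first shows that splitting on a space leaves "AlabamaCity" in one piece, then gives a variant of the original function that tests for a substring with grepl() instead. The name fixit2 and the up-front creation of tag2 are assumptions made for illustration, not part of the original code.

```r
# Sample data from the question
dat <- data.frame(city = c("Chelsea", "Brent", "Bremen", "Olathe", "Lenexa", "Shawnee"),
                  tag = c(rep("AlabamaCity", 3), rep("KansasCity", 3)))

# The tag contains no space, so strsplit() returns it unchanged
words <- strsplit("AlabamaCity", " ")[[1]]
split_matches <- any(words == "Alabama")   # FALSE: the whole string is compared

# Hypothetical variant of fixit(): look for a substring rather than an exact word
fixit2 <- function(dat) {
    dat$tag2 <- NA_character_              # create the column explicitly
    for (i in seq_len(nrow(dat))) {
        tag_i <- as.character(dat[i, "tag"])
        if (grepl("Alabama", tag_i, fixed = TRUE)) {
            dat[i, "tag2"] <- "South"
        }
        if (grepl("Kansas", tag_i, fixed = TRUE)) {
            dat[i, "tag2"] <- "Midwest"
        }
    }
    dat
}

fixed <- fixit2(dat)
```

This keeps the row-by-row structure of the question's function so the change is easy to see; it is not the most idiomatic way to do this in R.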

This will do it for you, and is a bit more generic. First create a data frame holding the matched words with the regions:

region <- data.frame(tag = c("Alabama","Kansas"), tag2 = c("South","Midwest"),
                     stringsAsFactors = FALSE)

Then loop over the rows of this data frame, matching the "tag"s and inserting the appropriate "tag2"s:

for(i in seq_len(nrow(region))) {
    want <- grepl(region[i, "tag"], dat[, "tag"])
    dat[want, "tag2"] <- region[i, "tag2"]
}

Which will result in this:

> dat
     city         tag    tag2
1 Chelsea AlabamaCity   South
2   Brent AlabamaCity   South
3  Bremen AlabamaCity   South
4  Olathe  KansasCity Midwest
5  Lenexa  KansasCity Midwest
6 Shawnee  KansasCity Midwest

How does this work? The key bit is grepl(). If we do this for just one match, "Alabama", grepl() is used like this:

grepl("Alabama", dat[, "tag"])

and returns a logical vector indicating which of the "tag" elements matched the string "Alabama":

> grepl("Alabama", dat[, "tag"])
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
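The remaining step is to use that logical vector as a row index. The sketch below repeats one pass of the loop by hand, assuming the same sample data as the question; the variable after_first is introduced here only to inspect the intermediate state.

```r
# Sample data from the question
dat <- data.frame(city = c("Chelsea", "Brent", "Bremen", "Olathe", "Lenexa", "Shawnee"),
                  tag = c(rep("AlabamaCity", 3), rep("KansasCity", 3)))

# TRUE for rows whose tag contains "Alabama"
want <- grepl("Alabama", dat[, "tag"])

# Assign to the TRUE rows only; tag2 does not exist yet, so it is created
# here and the rows that were not selected are filled with NA
dat[want, "tag2"] <- "South"
after_first <- dat$tag2

# A second pass fills in the Kansas rows, just as the loop does
want <- grepl("Kansas", dat[, "tag"])
dat[want, "tag2"] <- "Midwest"
```

Each pass touches only the rows where want is TRUE, which is why the loop over region can build up tag2 one region at a time.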

Solution 1
3 · Accepted · 2011-07-06 16:27:01
