Create a new data frame column based on the values of another column
Let's say I have the following data frame.
dat <- data.frame(city=c("Chelsea","Brent","Bremen","Olathe","Lenexa","Shawnee"),
                  tag=c(rep("AlabamaCity",3), rep("KansasCity",3)))
I want to include a third column, tag2, which will be the region of the state named in the tag column. So the first three cities will end up as 'South' and the last three will be 'Midwest'. The data will look like this:
     city         tag    tag2
1 Chelsea AlabamaCity   South
2   Brent AlabamaCity   South
3  Bremen AlabamaCity   South
4  Olathe  KansasCity Midwest
5  Lenexa  KansasCity Midwest
6 Shawnee  KansasCity Midwest
I tried the following function, but it doesn't create a new column. Can anyone tell me what's wrong?
fixit <- function(dat) {
  for (i in 1:nrow(dat)) {
    Words = strsplit(as.character(dat[i, 'tag']), " ")[[1]]
    if(any(Words == 'Alabama')) {
      dat[i, 'tag2'] <- "South"
    }
    if(any(Words == 'Kansas')) {
      dat[i, 'tag2'] <- "Midwest"
    }
  }
  return(dat)
}
Thanks for the help.
It isn't working because your strsplit() call to create Words is wrong. (You do know how to debug R functions, don't you?)
debug: Words = strsplit(as.character(dat[i, "tag"]), " ")[[1]]
Browse[2]>
debug: if (any(Words == "Alabama")) {
dat[i, "Tag2"] <- "South"
}
Browse[2]> Words
[1] "AlabamaCity"
At this point, Words is certainly not equal to "Alabama" or "Kansas" and never will be, so the if() clauses never get executed. R is returning dat; it is your function that is not altering dat.
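To see the same failure outside the debugger, here is a minimal sketch using the literal value "AlabamaCity" from the data. The variable name words is only for illustration.

```r
# The tag contains no spaces, so splitting on " " gives back one unsplit element.
words <- strsplit("AlabamaCity", " ")[[1]]
words

# Exact comparison against the whole string: never TRUE for "Alabama".
any(words == "Alabama")

# Pattern matching looks for "Alabama" anywhere inside the string instead.
grepl("Alabama", words)
```

The exact comparison fails while the pattern match succeeds, which is why the approach below relies on grepl().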
This will do it for you, and is a bit more generic. First create a data frame that pairs each word to match with its region:
region <- data.frame(tag = c("Alabama","Kansas"), tag2 = c("South","Midwest"),
                     stringsAsFactors = FALSE)
Then loop over the rows of this data frame, matching the "tag"s and inserting the appropriate "tag2"s:
for(i in seq_len(nrow(region))) {
  want <- grepl(region[i, "tag"], dat[, "tag"])
  dat[want, "tag2"] <- region[i, "tag2"]
}
Which will result in this:
> dat
     city         tag    tag2
1 Chelsea AlabamaCity   South
2   Brent AlabamaCity   South
3  Bremen AlabamaCity   South
4  Olathe  KansasCity Midwest
5  Lenexa  KansasCity Midwest
6 Shawnee  KansasCity Midwest
How does this work? The key bit is grepl(). If we do this for just one match, "Alabama", grepl() is used like this:
grepl("Alabama", dat[, "tag"])
and returns a logical vector indicating which of the "tag" elements matched the string "Alabama":
> grepl("Alabama", dat[, "tag"])
[1] TRUE TRUE TRUE FALSE FALSE FALSE
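That logical vector is what drives the assignment inside the loop. As a minimal sketch, here is the loop written out by hand, one match at a time. It re-creates dat from the question so the snippet runs on its own; stringsAsFactors = FALSE is an addition here, assumed only to keep the columns as character.

```r
dat <- data.frame(city = c("Chelsea", "Brent", "Bremen", "Olathe", "Lenexa", "Shawnee"),
                  tag = c(rep("AlabamaCity", 3), rep("KansasCity", 3)),
                  stringsAsFactors = FALSE)

# TRUE for the rows whose tag contains "Alabama"
want <- grepl("Alabama", dat[, "tag"])

# Logical row indexing: only the TRUE rows receive "South".
# tag2 does not exist yet, so it is created and the other rows are NA.
dat[want, "tag2"] <- "South"

# The same step for the second match fills in the remaining rows.
want <- grepl("Kansas", dat[, "tag"])
dat[want, "tag2"] <- "Midwest"

dat
```

Any row whose tag matches neither word would be left as NA in tag2.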