Adding a column based on other values
I have a data frame with millions of rows and three columns labeled Keywords, Impressions, and Clicks. I'd like to add a column whose values depend on the result of this function:
isType <- function(Impressions, Clicks)
{
if (Impressions >= 1 & Clicks >= 1){return("HasClicks")} else if (Impressions >=1 & Clicks == 0){return("NoClicks")} else {return("ZeroImp")}
}
So far so good. I then try the following to create the column, but 1) it takes forever and 2) it marks every row as "HasClicks", even the ones where it shouldn't.
# Creates a dataframe
Type <- data.frame()
# Loops until last row and store it in data.frame
for (i in c(1:dim(Mydf)[1])) {Type <- rbind(Type,isType(Mydf$Impressions[i], Mydf$Clicks[i]))}
# Add the column to Mydf
Mydf <- transform(Mydf, Type = Type)
Input data:
Keywords,Impressions,Clicks
"Hello",0,0
"World",1,0
"R",34,23
Wanted output:
Keywords,Impressions,Clicks,Type
"Hello",0,0,"ZeroImp"
"World",1,0,"NoClicks"
"R",34,23,"HasClicks"
Building on Joshua's solution, I find it cleaner to generate Type in a single shot (note, however, that this presumes Clicks >= 0):
Mydf$Type = ifelse(Mydf$Impressions >= 1,
ifelse(Mydf$Clicks >= 1, 'HasClicks', 'NoClicks'), 'ZeroImp')
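As a quick check, here is a minimal sketch that runs the nested ifelse on the three sample rows from the question. The data frame is rebuilt by hand here, so its construction is an assumption of this example rather than the asker's real data.

```r
# Sample rows copied from the question's input data
Mydf <- data.frame(
  Keywords    = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks      = c(0, 0, 23),
  stringsAsFactors = FALSE
)

# ifelse() evaluates its test element-wise, so the whole column is
# classified in one call instead of one row at a time
Mydf$Type <- ifelse(Mydf$Impressions >= 1,
                    ifelse(Mydf$Clicks >= 1, "HasClicks", "NoClicks"),
                    "ZeroImp")

print(Mydf)
```

Because the outer test only looks at Impressions, any row with zero impressions ends up as "ZeroImp" regardless of its Clicks value.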
First, the if/else block in your function will return the warning:
Warning message:
In if (1:2 > 2:3) TRUE else FALSE :
the condition has length > 1 and only the first element will be used
which explains why all the rows are the same.
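The underlying point is that if tests a single condition, while a comparison on a whole column yields a vector (R 4.2.0 and later raise an error here instead of a warning). If you want to keep a scalar function, one option is to apply it element-wise with mapply. The following is only a sketch under that assumption, using the question's three sample rows as plain vectors:

```r
# Scalar classifier: each argument is expected to be a single number
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 && Clicks >= 1) {
    "HasClicks"
  } else if (Impressions >= 1 && Clicks == 0) {
    "NoClicks"
  } else {
    "ZeroImp"
  }
}

# Sample values copied from the question's input data
Impressions <- c(0, 1, 34)
Clicks      <- c(0, 0, 23)

# mapply() calls isType() once per pair of elements and
# simplifies the results into a character vector
types <- mapply(isType, Impressions, Clicks, USE.NAMES = FALSE)
print(types)
```

This still makes one function call per row, so on millions of rows a vectorized ifelse is likely to be considerably faster.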
Second, you should allocate your data.frame up front and fill in the elements rather than repeatedly combining objects together. I imagine this is what causes your long run-times.
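A minimal sketch of what preallocating could look like, assuming the question's sample rows and a scalar isType like the one in the question: the result vector is created once at its final length and filled by index, so nothing is copied and regrown on each iteration the way rbind does.

```r
# Sample rows copied from the question's input data
Mydf <- data.frame(
  Keywords    = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks      = c(0, 0, 23),
  stringsAsFactors = FALSE
)

# Scalar classifier equivalent to the one in the question
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 && Clicks >= 1) {
    "HasClicks"
  } else if (Impressions >= 1 && Clicks == 0) {
    "NoClicks"
  } else {
    "ZeroImp"
  }
}

n <- nrow(Mydf)
Type <- character(n)          # allocate the full result once
for (i in seq_len(n)) {       # seq_len() is safe even when n is 0
  Type[i] <- isType(Mydf$Impressions[i], Mydf$Clicks[i])
}
Mydf$Type <- Type
print(Mydf)
```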
EDIT: My shared code. I'd love for someone to provide a more elegant solution.
Mydf <- data.frame(
Keywords = sample(c("Hello","World","R"),20,TRUE),
Impressions = sample(0:3,20,TRUE),
Clicks = sample(0:3,20,TRUE) )
Mydf$Type <- "ZeroImp"
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks >= 1,
"HasClicks", Mydf$Type)
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks == 0,
"NoClicks", Mydf$Type)
This is a case where arithmetic can be cleaner and most likely faster than nested ifelse statements. Again building on Joshua's solution:
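# Score is 1 for zero impressions, 2 for impressions without clicks, 3 for impressions with clicks
Mydf$Type <- factor(with(Mydf, (Impressions>=1)*(1 + (Clicks>=1)) + 1),
levels=1:3, labels=c("ZeroImp","NoClicks","HasClicks"))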
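An arithmetic encoding like this is worth checking on a few rows, because any score that falls outside levels = 1:3 silently becomes NA. The sketch below assumes a score of 1 for zero impressions, 2 for impressions without clicks, and 3 for impressions with clicks. It uses the question's sample rows plus one hypothetical edge row (zero impressions but nonzero clicks) that is not in the original data.

```r
# First three rows come from the question; the "Edge" row is hypothetical
Mydf <- data.frame(
  Keywords    = c("Hello", "World", "R", "Edge"),
  Impressions = c(0, 1, 34, 0),
  Clicks      = c(0, 0, 23, 2),
  stringsAsFactors = FALSE
)

# Logical values act as 0/1 in arithmetic:
#   no impressions              -> 0 * (...) + 1 = 1
#   impressions, no clicks      -> 1 * 1     + 1 = 2
#   impressions and clicks      -> 1 * 2     + 1 = 3
score <- with(Mydf, (Impressions >= 1) * (1 + (Clicks >= 1)) + 1)

Mydf$Type <- factor(score, levels = 1:3,
                    labels = c("ZeroImp", "NoClicks", "HasClicks"))
print(Mydf)
```

Note that the result is a factor rather than a character vector, which saves memory on millions of rows but may need as.character() if later code expects strings.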