Add a column based on other values

Question

I have a data frame with millions of rows and three columns labeled Keywords, Impressions, and Clicks. I'd like to add a column whose values depend on the evaluation of this function:

isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 & Clicks >= 1) {
    return("HasClicks")
  } else if (Impressions >= 1 & Clicks == 0) {
    return("NoClicks")
  } else {
    return("ZeroImp")
  }
}

So far so good. I then try this to create the column, but 1) it takes forever and 2) it marks all the rows as "HasClicks", even the ones where it shouldn't.

# Create an empty data.frame
Type <- data.frame()
# Loop over every row and append the result to the data.frame
for (i in c(1:dim(Mydf)[1])) {Type <- rbind(Type,isType(Mydf$Impressions[i], Mydf$Clicks[i]))}
# Add the column to Mydf
Mydf <- transform(Mydf, Type = Type)

Input data:

Keywords,Impressions,Clicks
"Hello",0,0
"World",1,0
"R",34,23

Desired output:

Keywords,Impressions,Clicks,Type
"Hello",0,0,"ZeroImp"
"World",1,0,"NoClicks"
"R",34,23,"HasClicks"

Answer 1

Building on Joshua's solution, I find it cleaner to generate Type in a single shot (note, however, that this presumes Clicks >= 0):

Mydf$Type = ifelse(Mydf$Impressions >= 1,
    ifelse(Mydf$Clicks >= 1, 'HasClicks', 'NoClicks'), 'ZeroImp')
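A minimal check of this one-liner on the three sample rows from the question. The data frame below is rebuilt by hand from the input shown there; nothing else is assumed.

```r
# Rebuild the question's sample input (values copied from the post)
Mydf <- data.frame(
  Keywords    = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks      = c(0, 0, 23),
  stringsAsFactors = FALSE
)

# ifelse() is vectorised: each test is evaluated element-wise over the
# columns, so a single call labels every row at once
Mydf$Type <- ifelse(Mydf$Impressions >= 1,
                    ifelse(Mydf$Clicks >= 1, "HasClicks", "NoClicks"),
                    "ZeroImp")

print(Mydf)
```

The Type column should read ZeroImp, NoClicks, HasClicks, matching the desired output. The Clicks >= 0 caveat matters for rows with Impressions >= 1 and a negative Clicks value: this version labels them NoClicks, whereas the question's isType() returns ZeroImp for them.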

Answer 2

First, the if/else block in your function will produce this warning when it is called on whole columns:

Warning message:
In if (1:2 > 2:3) TRUE else FALSE :
  the condition has length > 1 and only the first element will be used

which explains why all the rows are the same. (Since R 4.2.0, this condition is an error rather than a warning.)
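The sketch below reproduces the problem with the question's three sample rows. It then shows one possible workaround, base R's mapply(), which calls the scalar function once per element; mapply() is a swapped-in technique, not something this answer uses.

```r
# The question's function, written for one row at a time
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 & Clicks >= 1) {
    "HasClicks"
  } else if (Impressions >= 1 & Clicks == 0) {
    "NoClicks"
  } else {
    "ZeroImp"
  }
}

imp <- c(0, 1, 34)
clk <- c(0, 0, 23)

# Passing whole columns makes the `if` condition length 3.
# R < 4.2.0 signals a warning (and uses only the first element);
# R >= 4.2.0 signals an error. Capture the message either way.
whole <- tryCatch(isType(imp, clk),
                  warning = function(w) conditionMessage(w),
                  error   = function(e) conditionMessage(e))
print(whole)

# Calling the scalar function once per element gives one label per row
per_row <- mapply(isType, imp, clk)
print(per_row)
```

mapply() fixes the wrong labels but still makes one R-level function call per row, so on millions of rows it is likely to be much slower than a vectorised ifelse().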

Second, you should preallocate your data.frame and fill in its elements rather than repeatedly combining objects with rbind(). I suspect this is what causes your long run times.
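A minimal sketch of the preallocate-and-fill pattern. It assumes a character vector is an acceptable stand-in for the preallocated data.frame mentioned above, and the random test data is made up for illustration. Growing an object with rbind() copies it on every pass, so the total work grows roughly quadratically with the number of rows; filling a vector of the final length avoids those copies.

```r
# The question's scalar function
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 & Clicks >= 1) {
    "HasClicks"
  } else if (Impressions >= 1 & Clicks == 0) {
    "NoClicks"
  } else {
    "ZeroImp"
  }
}

# Illustrative random data (hypothetical, not from the question)
set.seed(1)
n <- 1000
Mydf <- data.frame(Impressions = sample(0:3, n, TRUE),
                   Clicks      = sample(0:3, n, TRUE))

# Allocate the result once, at its final length...
Type <- character(nrow(Mydf))
# ...then fill it in place instead of growing it with rbind() on every pass
for (i in seq_len(nrow(Mydf))) {
  Type[i] <- isType(Mydf$Impressions[i], Mydf$Clicks[i])
}
Mydf$Type <- Type

head(Mydf)
```

The loop still calls isType() once per row, so it removes the rbind() overhead but will likely remain slower than the vectorised ifelse() approaches on millions of rows.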

EDIT: Here is my code. I'd love for someone to provide a more elegant solution.

# Example data: 20 random rows
Mydf <- data.frame(
  Keywords = sample(c("Hello","World","R"),20,TRUE),
  Impressions = sample(0:3,20,TRUE),
  Clicks = sample(0:3,20,TRUE) )

# Start every row as "ZeroImp", then overwrite the rows matching each condition
Mydf$Type <- "ZeroImp"
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks >= 1,
  "HasClicks", Mydf$Type)
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks == 0,
  "NoClicks", Mydf$Type)
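The same default-then-overwrite idea can also be written with logical subsetting instead of ifelse(), so each assignment touches only the rows that match. This is a variant, not the answer's own code; it is sketched here on the question's three sample rows.

```r
# The question's sample input
Mydf <- data.frame(
  Keywords    = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks      = c(0, 0, 23),
  stringsAsFactors = FALSE
)

# Default label for every row
Mydf$Type <- "ZeroImp"
# Overwrite only the rows selected by each logical condition
Mydf$Type[Mydf$Impressions >= 1 & Mydf$Clicks >= 1] <- "HasClicks"
Mydf$Type[Mydf$Impressions >= 1 & Mydf$Clicks == 0] <- "NoClicks"

print(Mydf)
```

Because the conditions mirror isType() exactly, rows with Impressions >= 1 and negative Clicks keep the "ZeroImp" default, just as in the question's function.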

Answer 3

This is a case where arithmetic can be cleaner, and most likely faster, than nested ifelse statements.

Again building on Joshua's solution:

# Code 0 = no impressions, 1 = impressions but no clicks, 2 = impressions and clicks
Mydf$Type <- factor(with(Mydf, (Impressions >= 1) * (1 + (Clicks >= 1))),
                    levels = 0:2, labels = c("ZeroImp", "NoClicks", "HasClicks"))
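An arithmetic encoding is easy to get subtly wrong, so it is worth checking against the nested ifelse() over every combination of small inputs. The sketch assumes the encoding (Impressions >= 1) * (1 + (Clicks >= 1)) with levels 0:2; any other encoding can be dropped into `code` and checked the same way. For comparison, it also shows that (Impressions >= 1) * 2 + (Clicks >= 1) with levels 1:3 leaves the Impressions = 0, Clicks = 0 case as NA.

```r
# Every combination of small Impressions/Clicks values
grid <- expand.grid(Impressions = 0:3, Clicks = 0:3)

# Arithmetic encoding: 0 when there are no impressions,
# otherwise 1 (no clicks) or 2 (some clicks)
code  <- with(grid, (Impressions >= 1) * (1 + (Clicks >= 1)))
arith <- factor(code, levels = 0:2,
                labels = c("ZeroImp", "NoClicks", "HasClicks"))

# Reference labels from the nested ifelse() approach
nested <- with(grid, ifelse(Impressions >= 1,
                            ifelse(Clicks >= 1, "HasClicks", "NoClicks"),
                            "ZeroImp"))

# TRUE if the two approaches agree on every combination
print(all(as.character(arith) == nested))

# For comparison: this encoding maps Impressions = 0, Clicks = 0 to code 0,
# which is not among levels 1:3, so that row becomes NA
old <- factor(with(grid, (Impressions >= 1) * 2 + (Clicks >= 1) * 1),
              levels = 1:3, labels = c("ZeroImp", "NoClicks", "HasClicks"))
print(anyNA(old))
```

Like Answer 1, this encoding assumes Clicks >= 0.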

3 answers

Answer 1: score 10, accepted, 2010-10-13 00:03:04
Answer 2: score 3, 2010-10-12 23:23:02
Answer 3: score 0, 2011-03-08 18:40:43
