
Adding a column based on other values

I have a data frame with millions of rows and three columns labeled Keywords, Impressions and Clicks. I'd like to add a column whose values depend on the evaluation of this function:

isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 & Clicks >= 1) {return("HasClicks")}
  else if (Impressions >= 1 & Clicks == 0) {return("NoClicks")}
  else {return("ZeroImp")}
}

So far so good. I then tried the code below to create the column, but 1) it takes forever and 2) it marks all the rows as "HasClicks", even the ones where it shouldn't.

# Creates a dataframe
Type <- data.frame()
# Loops until last row and store it in data.frame
for (i in c(1:dim(Mydf)[1])) {Type <- rbind(Type,isType(Mydf$Impressions[i], Mydf$Clicks[i]))}
# Add the column to Mydf
Mydf <- transform(Mydf, Type = Type)

Input data:

Keywords,Impressions,Clicks
"Hello",0,0
"World",1,0
"R",34,23

Wanted output:

Keywords,Impressions,Clicks,Type
"Hello",0,0,"ZeroImp"
"World",1,0,"NoClicks"
"R",34,23,"HasClicks"

Building on Joshua's solution, I find it cleaner to generate Type in a single shot with nested ifelse() calls (note, however, that this presumes Clicks >= 0).

Mydf$Type = ifelse(Mydf$Impressions >= 1,
    ifelse(Mydf$Clicks >= 1, 'HasClicks', 'NoClicks'), 'ZeroImp')
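A quick way to check this one-liner is to run it on the three-row sample from the question. The sketch below rebuilds that sample by hand (an assumption about how the data was read in) and prints the labelled result.

```r
# Rebuild the three-row sample input from the question
Mydf <- data.frame(
  Keywords = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks = c(0, 0, 23),
  stringsAsFactors = FALSE
)

# Outer ifelse() splits on impressions, inner ifelse() on clicks;
# both work on whole columns at once, so there is no R-level loop
Mydf$Type <- ifelse(Mydf$Impressions >= 1,
  ifelse(Mydf$Clicks >= 1, "HasClicks", "NoClicks"), "ZeroImp")

print(Mydf)
```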

First, when called on whole columns, the if/else block in your function returns the warning:

Warning message:
In if (1:2 > 2:3) TRUE else FALSE :
  the condition has length > 1 and only the first element will be used

which explains why all the rows are the same. (Since R 4.2.0, a condition of length greater than one is an error rather than a warning.)

Second, you should preallocate your data.frame and fill in its elements rather than repeatedly combining objects with rbind(). I imagine this is what causes your long run-times.
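A minimal sketch of that advice, assuming the three-row sample input from the question: the result vector is allocated once with `character(n)` and filled in place. It uses `&&` inside `isType()` because each call now sees single values. This removes the cost of growing an object with `rbind()`, but it still calls an R function once per row, so on millions of rows a vectorized `ifelse()` is generally faster.

```r
# Row-wise classifier; && is appropriate because each call gets scalars
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 && Clicks >= 1) return("HasClicks")
  else if (Impressions >= 1 && Clicks == 0) return("NoClicks")
  else return("ZeroImp")
}

# Three-row sample input from the question
Mydf <- data.frame(
  Keywords = c("Hello", "World", "R"),
  Impressions = c(0, 1, 34),
  Clicks = c(0, 0, 23),
  stringsAsFactors = FALSE
)

n <- nrow(Mydf)
Type <- character(n)  # allocate the whole result once
for (i in seq_len(n)) {
  # Fill slot i in place instead of rbind()-ing a new row each time
  Type[i] <- isType(Mydf$Impressions[i], Mydf$Clicks[i])
}
Mydf$Type <- Type

print(Mydf)
```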

EDIT: Here is my code. I'd love for someone to provide a more elegant solution.

# Example data: 20 random rows
Mydf <- data.frame(
  Keywords = sample(c("Hello","World","R"),20,TRUE),
  Impressions = sample(0:3,20,TRUE),
  Clicks = sample(0:3,20,TRUE) )

# Start with the default label, then overwrite the two other cases
Mydf$Type <- "ZeroImp"
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks >= 1,
  "HasClicks", Mydf$Type)
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks == 0,
  "NoClicks", Mydf$Type)
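To gain some confidence that the three vectorized `ifelse()` steps agree with the original row-wise function, one option is to compare them on random data. This sketch assumes non-negative integer counts; the seed is arbitrary and only there to make the run reproducible, and `reference` is a hypothetical name used for illustration.

```r
# Row-wise classifier from the question (&& since arguments are scalars)
isType <- function(Impressions, Clicks) {
  if (Impressions >= 1 && Clicks >= 1) return("HasClicks")
  else if (Impressions >= 1 && Clicks == 0) return("NoClicks")
  else return("ZeroImp")
}

set.seed(42)  # arbitrary seed, for reproducibility only
Mydf <- data.frame(
  Keywords = sample(c("Hello", "World", "R"), 20, TRUE),
  Impressions = sample(0:3, 20, TRUE),
  Clicks = sample(0:3, 20, TRUE)
)

# Vectorized labelling, as in the answer above
Mydf$Type <- "ZeroImp"
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks >= 1,
  "HasClicks", Mydf$Type)
Mydf$Type <- ifelse(Mydf$Impressions >= 1 & Mydf$Clicks == 0,
  "NoClicks", Mydf$Type)

# Reference labels: call isType() once per row with scalar arguments
reference <- mapply(isType, Mydf$Impressions, Mydf$Clicks, USE.NAMES = FALSE)

# Stops with an error if any row disagrees
stopifnot(identical(Mydf$Type, reference))
```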

This is a case where arithmetic can be cleaner, and most likely faster, than nested ifelse statements.

Again building on Joshua's solution:

# Code 0 = no impressions, 1 = impressions but no clicks, 2 = impressions and clicks
Mydf$Type <- factor(with(Mydf, (Impressions>=1) * (1 + (Clicks>=1))),
                    levels=0:2, labels=c("ZeroImp","NoClicks","HasClicks"))
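Whether arithmetic actually beats nested `ifelse()` depends on the data and the R version, so it is worth measuring on something closer to millions of rows. The sketch below times both approaches on 1e6 simulated rows and checks that they agree. The arithmetic variant uses the encoding `(Impressions >= 1) * (1 + (Clicks >= 1))`, which gives 0 for ZeroImp, 1 for NoClicks and 2 for HasClicks, and it assumes non-negative counts. The seed and row count are arbitrary, and timings will differ from machine to machine.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 1e6
Impressions <- sample(0:3, n, replace = TRUE)
Clicks <- sample(0:3, n, replace = TRUE)

# Nested ifelse() version
t_ifelse <- system.time(
  type_ifelse <- ifelse(Impressions >= 1,
                        ifelse(Clicks >= 1, "HasClicks", "NoClicks"),
                        "ZeroImp")
)

# Arithmetic version: 0 = ZeroImp, 1 = NoClicks, 2 = HasClicks
t_arith <- system.time(
  type_arith <- factor((Impressions >= 1) * (1 + (Clicks >= 1)),
                       levels = 0:2,
                       labels = c("ZeroImp", "NoClicks", "HasClicks"))
)

# Both approaches should label every row the same way
stopifnot(identical(type_ifelse, as.character(type_arith)))

# Elapsed wall-clock seconds for each approach
print(c(ifelse = t_ifelse[["elapsed"]], arithmetic = t_arith[["elapsed"]]))
```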
