Replacing a value in column X based on column Y with R
I've gone through several answers and tried the following, but each attempt yields either an error or an unwanted result.
Here's the data:
Network Campaign
Moburst_Chartboost Test Campaign
Moburst_Chartboost Test Campaign
Moburst_Appnext unknown
Moburst_Appnext 1065
I'd like to replace "Test Campaign" with "1055" whenever "Network" == "Moburst_Chartboost". I realize this should be very simple, but I tried these:
dataset = read.csv('C:/Users/User/Downloads/example.csv')
for( i in 1:nrow(dataset)){
if(dataset$Network == 'Moburst_Chartboost') dataset$Campaign <- '1055'
}
This yields warnings:
Warning messages:
1: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
2: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
etc.
Then I tried:
within(dataset, {
dataset$Campaign <- ifelse(dataset$Network == 'Moburst_Chartboost', '1055', dataset$Campaign)
})
This turned all 4 values in the "Campaign" column into "1055", overwriting what was there even when the condition isn't met.
Also this:
dataset$Campaign[which(dataset$Network == 'Moburst_Chartboost')] <- 1055
This yields the following warning and replaces the values in the first two rows of "Campaign" with NA:
Warning message:
In `[<-.factor`(`*tmp*`, which(dataset$Network == "Moburst_Chartboost"), :
invalid factor level, NA generated
Scratching my head here. I'm new to R, but this shouldn't be so hard :(
Try the following:
dataset = read.csv('C:/Users/User/Downloads/example.csv', stringsAsFactors = F)
for( i in 1:nrow(dataset)){
if(dataset$Network[i] == 'Moburst_Chartboost') dataset$Campaign[i] <- '1055'
}
It seems you forgot the index variable. Without [i] you work on the whole vector of the data frame, resulting in the error/warning you mentioned.
Note that I added stringsAsFactors = F to the read.csv() function to make sure the strings are indeed interpreted as strings and not factors. With factors, the assignment would produce a warning like this:
In `[<-.factor`(`*tmp*`, i, value = c(NA, 2L, 3L, 1L)) :
invalid factor level, NA generated
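If the data has already been loaded with factor columns, one possible fix is to convert the column to character before assigning. The following is a minimal sketch under that assumption; the vector name campaign is made up for illustration and stands in for dataset$Campaign.

```r
# Hypothetical stand-in for dataset$Campaign when it was read in as a factor.
campaign <- factor(c("Test Campaign", "Test Campaign", "unknown", "1065"))

# "1055" is not one of the factor's levels, so assigning it directly
# would give NA plus the "invalid factor level" warning.
# Converting to character first avoids that.
campaign <- as.character(campaign)
campaign[1:2] <- "1055"

print(campaign)
```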
Alternatively, you can do the following without using a for loop:
idx <- which(dataset$Network == 'Moburst_Chartboost')
dataset$Campaign[idx] <- '1055'
Here, idx is a vector containing the positions where Network has the value 'Moburst_Chartboost'.
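To make the role of idx concrete, here is a small self-contained sketch of the indexing approach. It assumes the four sample rows from the question are built inline rather than read from the CSV file.

```r
# Assumption: the question's sample data, built inline instead of read.csv().
dataset <- data.frame(
  Network  = c("Moburst_Chartboost", "Moburst_Chartboost",
               "Moburst_Appnext", "Moburst_Appnext"),
  Campaign = c("Test Campaign", "Test Campaign", "unknown", "1065"),
  stringsAsFactors = FALSE
)

# which() turns the logical comparison into row positions.
idx <- which(dataset$Network == "Moburst_Chartboost")
print(idx)

# Only the rows at those positions are overwritten.
dataset$Campaign[idx] <- "1055"
print(dataset)
```

The logical vector could also be used directly as the index; which() mainly helps when the comparison might contain NA values, since it drops them.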
In your first attempt, the loop never uses i, so every iteration operates on whole columns, when you only want to change particular rows of the 2nd column.
In your second, you're trying to assign the value "1055" to all of the 2nd column.
The way to think about it is as an if/else: if the condition in column 1 is met, column 2 is changed; otherwise it remains the same.
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
                                  "Moburst_Appnext", "Moburst_Appnext"),
                      Campaign = c("Test Campaign", "Test Campaign",
                                   "unknown", "1065"),
                      # keep strings as character (only the default from R 4.0.0 on)
                      stringsAsFactors = FALSE)
dataset$Campaign <- ifelse(dataset$Network == "Moburst_Chartboost",
"1055",
dataset$Campaign)
head(dataset)
Network Campaign
1 Moburst_Chartboost 1055
2 Moburst_Chartboost 1055
3 Moburst_Appnext unknown
4 Moburst_Appnext 1065
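The same ifelse() call can also be combined with within(), which the question attempted. This is a sketch of how that might look, assuming character columns: within() returns a modified copy, so the result has to be assigned back, and columns are referred to by bare name inside the expression.

```r
# Assumption: the question's sample data with character (not factor) columns.
dataset <- data.frame(
  Network  = c("Moburst_Chartboost", "Moburst_Chartboost",
               "Moburst_Appnext", "Moburst_Appnext"),
  Campaign = c("Test Campaign", "Test Campaign", "unknown", "1065"),
  stringsAsFactors = FALSE
)

# Assign the result back; within() does not modify dataset in place.
dataset <- within(dataset, {
  Campaign <- ifelse(Network == "Moburst_Chartboost", "1055", Campaign)
})

print(dataset)
```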
You may also try dataset$Campaign[dataset$Campaign=="Test Campaign"]<-1055 to avoid the use of loops and ifelse statements.
Here, dataset is:
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
"Moburst_Appnext", "Moburst_Appnext"),
Campaign = c("Test Campaign", "Test Campaign",
"unknown", 1065))
Thank you for the help! Not elegant, but since this lingered with me when going to sleep last night, I decided to try to bludgeon it with some ugly code, and it worked too, just as a workaround: I separated the data into two data frames, replaced all values in one, and then bound them back together.
# subsetting only chartboost
chartboost <- subset(dataset, dataset$Network=='Moburst_Chartboost')
# replace all values in Campaign
chartboost$Campaign <-sub("^.*", "1055",chartboost$Campaign)
#subsetting only "not chartboost"
notChartboost <-subset(dataset, dataset$Network!='Moburst_Chartboost')
# binding back to single dataframe
newSet <- rbind(chartboost, notChartboost)
Ugly as a duckling, but it worked :)