
Replacing a value in column X based on column Y with R

I've gone through several answers and tried the following, but each attempt either yields an error or an unwanted result.

Here's the data:

Network                 Campaign
Moburst_Chartboost      Test Campaign
Moburst_Chartboost      Test Campaign 
Moburst_Appnext         unknown
Moburst_Appnext         1065

I'd like to replace "Test Campaign" with "1055" whenever "Network" == "Moburst_Chartboost". I realize this should be very simple, but I tried this:

dataset = read.csv('C:/Users/User/Downloads/example.csv')
for( i in 1:nrow(dataset)){
  if(dataset$Network == 'Moburst_Chartboost') dataset$Campaign <- '1055'
}

This yields the following warnings:

Warning messages:

1: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
  the condition has length > 1 and only the first element will be used
2: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
  the condition has length > 1 and only the first element will be used
etc.

Then I tried:

within(dataset, {
  dataset$Campaign <- ifelse(dataset$Network == 'Moburst_Chartboost', '1055', dataset$Campaign)
})

This turned ALL 4 values in the "Campaign" column into "1055", overwriting what was there even when the condition isn't met.

Also this:

dataset$Campaign[which(dataset$Network == 'Moburst_Chartboost')] <- 1055

This yields the following warning and replaces the values in the first two rows of "Campaign" with NA:

Warning message:
In `[<-.factor`(`*tmp*`, which(dataset$Network == "Moburst_Chartboost"),  :
  invalid factor level, NA generated

Scratching my head here. I'm new to R, but this shouldn't be so hard :(

Try the following:

dataset <- read.csv('C:/Users/User/Downloads/example.csv', stringsAsFactors = FALSE)
# Loop over row positions and test/assign one element at a time
for (i in seq_len(nrow(dataset))) {
  if (dataset$Network[i] == 'Moburst_Chartboost') dataset$Campaign[i] <- '1055'
}

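It seems you forgot the index variable. Without [i], both the comparison and the assignment work on the whole column vector of the data frame, which is what triggers the warnings you saw (in R 4.2.0 and later, a condition of length > 1 in if() is an error rather than a warning). Note that I set stringsAsFactors to FALSE in the read.csv() call so that the strings are read as character vectors and not as factors; this has been the default since R 4.0.0. With factors, assigning a value that is not an existing level produces a warning like this: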

In `[<-.factor`(`*tmp*`, i, value = c(NA, 2L, 3L, 1L)) :
invalid factor level, NA generated
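To see why the factor version fails, here is a small base-R sketch that uses a made-up three-value factor rather than the real CSV: assigning a value that is not already a level produces NA, while converting to character first, or adding the level before assigning, avoids it.

```r
# Hypothetical toy data: a factor column like the one read.csv() produced
# before R 4.0.0, when stringsAsFactors = TRUE was the default.
campaign <- factor(c("Test Campaign", "unknown", "1065"))

# Assigning a value that is not an existing level gives NA plus a warning.
bad <- campaign
suppressWarnings(bad[1] <- "1055")
print(bad)  # first element is NA

# Fix 1: work on a character copy instead of the factor.
as_chr <- as.character(campaign)
as_chr[1] <- "1055"
print(as_chr)

# Fix 2: keep the factor but add the new level before assigning.
with_level <- campaign
levels(with_level) <- c(levels(with_level), "1055")
with_level[1] <- "1055"
print(with_level)
```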

Alternatively, you can do the same without a for loop:

idx <- which(dataset$Network == 'Moburst_Chartboost')
dataset$Campaign[idx] <- '1055'

Here, idx is a vector containing the row positions where Network has the value 'Moburst_Chartboost'.
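For comparison, a minimal sketch of what the comparison itself returns and of the equivalent logical-index form (base R; the data frame is typed in by hand from the question rather than read from the CSV):

```r
dataset <- data.frame(
  Network  = c("Moburst_Chartboost", "Moburst_Chartboost",
               "Moburst_Appnext", "Moburst_Appnext"),
  Campaign = c("Test Campaign", "Test Campaign", "unknown", "1065"),
  stringsAsFactors = FALSE  # keep Campaign as character on any R version
)

# The comparison is vectorised: one TRUE/FALSE per row.
is_cb <- dataset$Network == "Moburst_Chartboost"
print(is_cb)         # TRUE TRUE FALSE FALSE
print(which(is_cb))  # 1 2, the row positions that which() turns into idx

# A logical vector can index directly, so which() is optional here.
dataset$Campaign[is_cb] <- "1055"
print(dataset)
```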

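In your first attempt, the loop runs over the rows, but the body compares and assigns whole columns instead of the i-th element, so if() only looks at the first comparison and the assignment overwrites all of Campaign.

In your second, the ifelse() logic is right, but inside within() the columns should be referred to by bare name (Network, Campaign) rather than through dataset$, and the data frame that within() returns has to be assigned back to dataset; as written, dataset itself is never changed.

Think of it as an if/else: if the condition on column 1 is met, column 2 is changed; otherwise it stays the same. ifelse() does this for every row at once.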

dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
                                  "Moburst_Appnext", "Moburst_Appnext"),
                      Campaign = c("Test Campaign", "Test Campaign",
                                   "unknown", "1065"),
                      stringsAsFactors = FALSE)

# Where Network matches, use "1055"; otherwise keep the existing Campaign value
dataset$Campaign <- ifelse(dataset$Network == "Moburst_Chartboost",
                           "1055",
                           dataset$Campaign)

head(dataset)
             Network Campaign
1 Moburst_Chartboost     1055
2 Moburst_Chartboost     1055
3    Moburst_Appnext  unknown
4    Moburst_Appnext     1065
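The same ifelse() can also be written with within(), as the question attempted. A sketch, assuming character columns: inside within() the columns are referred to by bare name, and the returned data frame has to be assigned back.

```r
dataset <- data.frame(
  Network  = c("Moburst_Chartboost", "Moburst_Chartboost",
               "Moburst_Appnext", "Moburst_Appnext"),
  Campaign = c("Test Campaign", "Test Campaign", "unknown", "1065"),
  stringsAsFactors = FALSE
)

# Refer to Network and Campaign directly (no dataset$ prefix) and keep
# the result: within() returns a modified copy, it does not edit in place.
dataset <- within(dataset, {
  Campaign <- ifelse(Network == "Moburst_Chartboost", "1055", Campaign)
})
print(dataset)
```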

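You may also try dataset$Campaign[dataset$Campaign == "Test Campaign"] <- 1055 to avoid loops and ifelse() altogether. Note that this condition tests Campaign rather than Network, so it replaces every "Test Campaign" value regardless of the network; as long as Campaign is a character column, the number 1055 is stored as the string "1055".

where dataset is: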

dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
                                  "Moburst_Appnext", "Moburst_Appnext"),
                      Campaign = c("Test Campaign", "Test Campaign",
                                   "unknown", 1065),  # 1065 is coerced to "1065"
                      stringsAsFactors = FALSE)
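Because that one-liner tests Campaign rather than Network, it would also rewrite a "Test Campaign" row belonging to another network. A sketch that requires both conditions, combining the two logical vectors with & (the extra Appnext row carrying "Test Campaign" is made up for illustration):

```r
dataset <- data.frame(
  Network  = c("Moburst_Chartboost", "Moburst_Chartboost",
               "Moburst_Appnext", "Moburst_Appnext", "Moburst_Appnext"),
  Campaign = c("Test Campaign", "Test Campaign", "unknown", "1065",
               "Test Campaign"),  # hypothetical extra row
  stringsAsFactors = FALSE
)

# Element-wise AND: TRUE only where both conditions hold on the same row.
hit <- dataset$Network == "Moburst_Chartboost" &
       dataset$Campaign == "Test Campaign"
dataset$Campaign[hit] <- "1055"
print(dataset)
```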

Thank you for the help! It's not elegant, but since this lingered with me when I went to sleep last night, I decided to bludgeon it with some ugly code, and it worked too, just as a workaround: I split the data into two data frames, replaced all the values in one of them, and then bound them back together:

# subset only the Chartboost rows
chartboost <- subset(dataset, dataset$Network == 'Moburst_Chartboost')
# replace every value in Campaign ("^.*" matches the whole string)
chartboost$Campaign <- sub("^.*", "1055", chartboost$Campaign)
# subset only the non-Chartboost rows
notChartboost <- subset(dataset, dataset$Network != 'Moburst_Chartboost')
# bind back into a single data frame (rows end up grouped by network)
newSet <- rbind(chartboost, notChartboost)

Ugly as a duckling, but it worked :)
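One side effect of the subset-and-rbind workaround is that rows come back grouped by network rather than in their original order. Since subset() and rbind() keep the original row names here, a sketch that restores the input order, assuming the default integer row names from read.csv(). The rows are interleaved by hand for illustration, and a plain assignment stands in for the sub() call:

```r
# Hypothetical input with the networks interleaved, so reordering is visible.
dataset <- data.frame(
  Network  = c("Moburst_Appnext", "Moburst_Chartboost",
               "Moburst_Appnext", "Moburst_Chartboost"),
  Campaign = c("unknown", "Test Campaign", "1065", "Test Campaign"),
  stringsAsFactors = FALSE
)

chartboost <- subset(dataset, Network == "Moburst_Chartboost")
chartboost$Campaign <- "1055"  # a scalar is recycled over every row
notChartboost <- subset(dataset, Network != "Moburst_Chartboost")
newSet <- rbind(chartboost, notChartboost)
print(rownames(newSet))  # grouped by network, not in input order

# Sort by the original row names to restore the input order.
newSet <- newSet[order(as.integer(rownames(newSet))), ]
print(newSet)
```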

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM