I want to split the values of one column based on another column, as explained below.
Here is part of my data:
brand products
APPLE IPHONE6SPlus_16G
APPLE IPHONE6S_64G
APPLE IPHONE6S_16G
APPLE IPhone6_32G
APPLE iPadAir2_64G
APPLE iPadmini2_16G
APPLE iPadmini4_64G
HTC ONEX
Samsung SamsungGalaxy
I want to split brand based on products. Here is what I actually want:
brand products
iPhone6S IPHONE6SPlus_16G
iPhone6S IPHONE6S_64G
iPhone6S IPHONE6S_16G
iPhone6 IPhone6_32G
APPLE iPadAir2_64G
APPLE iPadmini2_16G
APPLE iPadmini4_64G
HTC ONEX
Samsung SamsungGalaxy
I just want to split APPLE into three new values (APPLE, iPhone6S, iPhone6) based on products. If the name in products contains IPHONE6SPlus or IPHONE6S, change brand to iPhone6S. If the name in products contains IPhone6, change brand to iPhone6. The remaining rows do not change.
I think I can use ifelse to do this, but the names in products also include sizes (i.e. 16G, 64G, etc.).
How can I ignore these sizes and split the data?
We can do this in a couple of ways. Here is one with sub and ==:
v1 <- sub("^(.)(.)(.{5})(.).*", "\\L\\1\\U\\2\\L\\3\\U\\4", df1$products, perl = TRUE)
df1$brand[v1=="iPhone6S"] <- v1[v1 == "iPhone6S"]
df1
# brand products
#1 iPhone6S IPHONE6SPlus_16G
#2 iPhone6S IPHONE6S_64G
#3 iPhone6S IPHONE6S_16G
#4 APPLE IPhone6_32G
#5 APPLE iPadAir2_64G
#6 APPLE iPadmini2_16G
#7 APPLE iPadmini4_64G
#8 HTC ONEX
#9 Samsung SamsungGalaxy
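The sub call matches the following pattern: the first character captured as a group ((.)) from the beginning of the string (^), followed by the next character as another group, the next 5 characters as a third group ((.{5})), followed by one more character as a fourth group, and then the rest of the string (.*), which is dropped. In the replacement, we change the case to lower (\\L) or upper (\\U) for the backreferences to those groups (\\1, \\2, \\3, \\4); these case modifiers need perl = TRUE. Only products beginning with IPHONE6S turn into exactly "iPhone6S", so only those rows get the new brand; row 4 (IPhone6_32G) keeps APPLE.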
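To see why only the first three rows change, it can help to run the same sub call on a few of the product names from the question. This is a minimal sketch; the products vector is a hand-typed stand-in for df1$products, not part of the original answer.

```r
# Sample product names copied from the question (stand-in for df1$products)
products <- c("IPHONE6SPlus_16G", "IPHONE6S_64G", "IPHONE6S_16G",
              "IPhone6_32G", "iPadAir2_64G", "ONEX")

# Same pattern and replacement as in the answer above:
# groups 1 and 3 are lower-cased, groups 2 and 4 are upper-cased
v1 <- sub("^(.)(.)(.{5})(.).*", "\\L\\1\\U\\2\\L\\3\\U\\4",
          products, perl = TRUE)

# Pair each input with its transformed value for inspection
data.frame(products, v1)

# Only exact matches are used to update the brand
v1 == "iPhone6S"
```

Here IPhone6_32G becomes iPhone6_ rather than iPhone6S, and ONEX is too short to match the pattern and comes back unchanged, so neither is equal to "iPhone6S".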
Or an easier option is with grepl, which matches on a substring and so ignores the size suffix:
df1$brand[grepl("IPHONE6S", df1$products)] <- "iPhone6S"
If the column has both lower- and upper-case characters, it can be converted to a single case with tolower or toupper before matching:
df1$brand[grepl("IPHONE6S", toupper(df1$products))] <- "iPhone6S"
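Since the question mentions ifelse, here is a minimal sketch of how the same grepl test could be nested to produce both iPhone6S and iPhone6 in one step. It assumes brand is a character column (not a factor), and df1 is rebuilt here from the data in the question so the example runs on its own.

```r
# Data copied from the question; stringsAsFactors = FALSE keeps brand as character
df1 <- data.frame(
  brand = c(rep("APPLE", 7), "HTC", "Samsung"),
  products = c("IPHONE6SPlus_16G", "IPHONE6S_64G", "IPHONE6S_16G",
               "IPhone6_32G", "iPadAir2_64G", "iPadmini2_16G",
               "iPadmini4_64G", "ONEX", "SamsungGalaxy"),
  stringsAsFactors = FALSE
)

# Upper-case once so the match ignores case; the size suffix is
# irrelevant because grepl only looks for the substring
p <- toupper(df1$products)

# Test the more specific pattern first, then the broader one,
# and fall back to the existing brand
df1$brand <- ifelse(grepl("IPHONE6S", p), "iPhone6S",
                    ifelse(grepl("IPHONE6", p), "iPhone6", df1$brand))
df1
```

The order of the tests matters: IPHONE6 also matches every IPHONE6S name, so the more specific pattern has to be checked first.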
Suppose we want to change multiple elements; this can be done with a loop:
nm1 <- c("IPAD", "IPHONE", "SAMSUNG")
for(j in nm1) df1$brand[grepl(j, toupper(df1$products))] <- j
df1
# brand products
#1 IPHONE IPHONE6SPlus_16G
#2 IPHONE IPHONE6S_64G
#3 IPHONE IPHONE6S_16G
#4 IPHONE IPhone6_32G
#5 IPAD iPadAir2_64G
#6 IPAD iPadmini2_16G
#7 IPAD iPadmini4_64G
#8 HTC ONEX
#9 SAMSUNG SamsungGalaxy
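The loop above reuses each pattern as the new brand. When the labels should differ from the patterns, as with iPhone6S and iPhone6 in the question, one possible variation is a named vector that maps each pattern to its label. The lookup vector below is a hypothetical helper, not part of the original answer, and df1 is rebuilt from the question's data so the sketch is self-contained.

```r
# Data copied from the question
df1 <- data.frame(
  brand = c(rep("APPLE", 7), "HTC", "Samsung"),
  products = c("IPHONE6SPlus_16G", "IPHONE6S_64G", "IPHONE6S_16G",
               "IPhone6_32G", "iPadAir2_64G", "iPadmini2_16G",
               "iPadmini4_64G", "ONEX", "SamsungGalaxy"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are upper-case patterns, values are new brands.
# The broader pattern comes first so the more specific one can overwrite it.
lookup <- c("IPHONE6" = "iPhone6", "IPHONE6S" = "iPhone6S")

for (p in names(lookup)) {
  # fixed = TRUE treats the pattern as a literal string, not a regex
  hit <- grepl(p, toupper(df1$products), fixed = TRUE)
  df1$brand[hit] <- lookup[[p]]
}
df1
```

Rows that match no pattern keep their original brand, which is the "remaining rows do not change" behaviour asked for in the question.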
'Dirty' solution but I hope it helps :)
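# Flag rows whose product name contains either iPhone 6S pattern
x <- c('IPHONE6SPlus','IPHONE6S')
b$new <- grepl(paste(x, collapse = "|"), b$products)
b$brand[b$new] <- "Iphone6S"
b$new <- NULL  # drop the temporary helper column
# Repeat for iPhone 6; grepl is case-sensitive, so the IPHONE6S rows do not match
y <- c('IPhone6')
b$new <- grepl(paste(y, collapse = "|"), b$products)
b$brand[b$new] <- "Iphone6"
b$new <- NULL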
brand products
1 Iphone6S IPHONE6SPlus_16G
2 Iphone6S IPHONE6S_64G
3 Iphone6S IPHONE6S_16G
4 Iphone6 IPhone6_32G
5 APPLE iPadAir2_64G
6 APPLE iPadmini2_16G
7 APPLE iPadmini4_64G
8 HTC ONEX
9 Samsung SamsungGalaxy