R split one column based on another
I want to split one column based on another, as explained below.
Here is part of my data:
brand products
APPLE IPHONE6SPlus_16G
APPLE IPHONE6S_64G
APPLE IPHONE6S_16G
APPLE IPhone6_32G
APPLE iPadAir2_64G
APPLE iPadmini2_16G
APPLE iPadmini4_64G
HTC ONEX
Samsung SamsungGalaxy
I want to split brand based on products. Here is what I actually want:
brand products
iPhone6S IPHONE6SPlus_16G
iPhone6S IPHONE6S_64G
iPhone6S IPHONE6S_16G
iPhone6 IPhone6_32G
APPLE iPadAir2_64G
APPLE iPadmini2_16G
APPLE iPadmini4_64G
HTC ONEX
Samsung SamsungGalaxy
I just want to split APPLE into three new brands (APPLE, iPhone6S, iPhone6) based on products. If the name in products contains IPHONE6SPlus or IPHONE6S, change brand to iPhone6S. If the name in products contains IPhone6, change brand to iPhone6. The remaining rows do not change.
I think I can use ifelse to do this, but the names in products include sizes (i.e. 16G, 64G, etc.).
How can I ignore these sizes and split the data?
We can do this using a couple of methods. Here is one with sub and ==:
v1 <- sub("^(.)(.)(.{5})(.).*", "\\L\\1\\U\\2\\L\\3\\U\\4", df1$products, perl = TRUE)
df1$brand[v1=="iPhone6S"] <- v1[v1 == "iPhone6S"]
df1
# brand products
#1 iPhone6S IPHONE6SPlus_16G
#2 iPhone6S IPHONE6S_64G
#3 iPhone6S IPHONE6S_16G
#4 APPLE IPhone6_32G
#5 APPLE iPadAir2_64G
#6 APPLE iPadmini2_16G
#7 APPLE iPadmini4_64G
#8 HTC ONEX
#9 Samsung SamsungGalaxy
The sub call matches a pattern that captures the first character as a group ( (.) ) from the beginning of the string ( ^ ), then the next character as another group, the next 5 characters as a third group ( (.{5}) ), then one more character as a fourth group, followed by the rest of the string ( .* ). In the replacement, we change the case to lower ( \\L ) or upper ( \\U ) for the backreferences of those groups ( \\1 , \\2 , \\3 , \\4 ). The case operators are only available with perl = TRUE.
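To make the effect of the pattern easier to see, here is a small sketch that applies the same sub call to a few product names from the question. The vector name products is only an illustration and is not part of the original answer.

```r
# Sample product names taken from the question
products <- c("IPHONE6SPlus_16G", "IPHONE6S_64G", "IPhone6_32G")

# Groups: 1 character, 1 character, 5 characters, 1 character; the rest (.*) is dropped.
# \\L lower-cases and \\U upper-cases the backreference that follows (perl = TRUE only).
v1 <- sub("^(.)(.)(.{5})(.).*", "\\L\\1\\U\\2\\L\\3\\U\\4", products, perl = TRUE)
print(v1)
```

The first two names come out as "iPhone6S", while "IPhone6_32G" becomes "iPhone6_" because its eighth character is an underscore. That is why the comparison with "iPhone6S" leaves row 4 as APPLE in the output above.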
Or an easier option is with grepl:
df1$brand[grepl("IPHONE6S", df1$products)] <- "iPhone6S"
If the column has both lower-case and upper-case characters, it can be converted to a single case with tolower or toupper before processing:
df1$brand[grepl("IPHONE6S", toupper(df1$products))] <- "iPhone6S"
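As a quick illustration of why the case conversion matters, the sketch below compares a case-sensitive match with one done after toupper. The names products, strict and loose are assumptions made for this example only.

```r
# Three product names from the question, with mixed case
products <- c("IPHONE6S_64G", "IPhone6_32G", "iPadAir2_64G")

# grepl() looks for the pattern anywhere in the string,
# so the size suffix (e.g. "_64G") does not need to be removed first.
strict <- grepl("IPHONE6", products)           # case-sensitive match
loose  <- grepl("IPHONE6", toupper(products))  # match after upper-casing

print(strict)
print(loose)
```

Only the upper-cased version picks up "IPhone6_32G", which is the situation the toupper call is meant to handle.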
Suppose we want to change multiple elements; this can be done with a loop:
nm1 <- c("IPAD", "IPHONE", "SAMSUNG")
for(j in nm1) df1$brand[grepl(j, toupper(df1$products))] <- j
df1
# brand products
#1 IPHONE IPHONE6SPlus_16G
#2 IPHONE IPHONE6S_64G
#3 IPHONE IPHONE6S_16G
#4 IPHONE IPhone6_32G
#5 IPAD iPadAir2_64G
#6 IPAD iPadmini2_16G
#7 IPAD iPadmini4_64G
#8 HTC ONEX
#9 SAMSUNG SamsungGalaxy
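The loop above writes the pattern itself into brand. If the labels should differ from the patterns, as in the question, one possible variation is a named vector that maps each pattern to its label. This is only a sketch: the lookup vector and the rebuilt df1 are assumptions for illustration, not part of the original answer.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  brand = c(rep("APPLE", 7), "HTC", "Samsung"),
  products = c("IPHONE6SPlus_16G", "IPHONE6S_64G", "IPHONE6S_16G", "IPhone6_32G",
               "iPadAir2_64G", "iPadmini2_16G", "iPadmini4_64G", "ONEX", "SamsungGalaxy"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are upper-case patterns, values are the new labels.
# Order matters: "IPHONE6" also matches the 6S rows, so "IPHONE6S" must come last.
lookup <- c(IPHONE6 = "iPhone6", IPHONE6S = "iPhone6S")

up <- toupper(df1$products)
for (pat in names(lookup)) {
  df1$brand[grepl(pat, up, fixed = TRUE)] <- lookup[[pat]]
}
print(df1)
```

Rows that match no pattern keep their original brand, so the iPad, HTC and Samsung rows are left as they were.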
'Dirty' solution, but I hope it helps :)
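# b is the data frame; flag the rows whose products match either 6S pattern
x <- c('IPHONE6SPlus','IPHONE6S')
b$new <- grepl(paste(x, collapse = "|"), b$products)
b$brand[b$new] <- "Iphone6S"
b$new <- NULL  # drop the helper column
# same steps for the 6; the match is case-sensitive, so the IPHONE6S rows are not touched
y <- c('IPhone6')
b$new <- grepl(paste(y, collapse = "|"), b$products)
b$brand[b$new] <- "Iphone6"
b$new <- NULL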
brand products
1 Iphone6S IPHONE6SPlus_16G
2 Iphone6S IPHONE6S_64G
3 Iphone6S IPHONE6S_16G
4 Iphone6 IPhone6_32G
5 APPLE iPadAir2_64G
6 APPLE iPadmini2_16G
7 APPLE iPadmini4_64G
8 HTC ONEX
9 Samsung SamsungGalaxy
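Since the question mentions ifelse, here is a rough sketch of how that route could look when combined with grepl. It works on plain vectors named brand and products, which are assumptions for this example; with a data frame the same expression could be assigned back to the brand column.

```r
# Example data from the question, as plain vectors
brand <- c(rep("APPLE", 7), "HTC", "Samsung")
products <- c("IPHONE6SPlus_16G", "IPHONE6S_64G", "IPHONE6S_16G", "IPhone6_32G",
              "iPadAir2_64G", "iPadmini2_16G", "iPadmini4_64G", "ONEX", "SamsungGalaxy")

up <- toupper(products)  # normalise case once

# Test the more specific pattern first, then the general one,
# and fall back to the existing brand when neither matches.
brand <- ifelse(grepl("IPHONE6S", up), "iPhone6S",
         ifelse(grepl("IPHONE6", up), "iPhone6", brand))
print(brand)
```

Because grepl matches substrings, the size suffix never has to be stripped; the only care needed is checking IPHONE6S before IPHONE6.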