R regex to remove anything after second occurrence of / in data frame column
I have data stored in a data frame column that looks like this:
/travel
/food and drink/restaurants
/food and drink
/sports/outdoors/climbing
/news
/family
Each row has some number of "/", but every non-blank row starts with a "/". Some lines are also blank. I just need to convert this data to only include the text after the first "/" but before the second "/". I also want to capitalize the first letter of each word of the result. So I'd hope the result would look like this:
Travel
Food And Drink
Food And Drink
Sports
News
Family
x <- c('/travel',
'/food and drink/restaurants',
'/food and drink',
'/sports/outdoors/climbing',
'/news',
'/family')
Upcase every word:
gsub('(?<=\\b)([a-z])', '\\U\\1', x, perl = TRUE)
# [1] "/Travel" "/Food And Drink/Restaurants" "/Food And Drink"
# [4] "/Sports/Outdoors/Climbing" "/News" "/Family"
Extract the first /.. group:
gsub('^/([^/]+)|.', '\\1', x)
# [1] "travel" "food and drink" "food and drink" "sports" "news"
# [6] "family"
Combine the two:
gsub('(?<=\\b)([a-z])', '\\U\\1', gsub('^/([^/]+)|.', '\\1', x), perl = TRUE)
# [1] "Travel" "Food And Drink" "Food And Drink" "Sports" "News"
# [6] "Family"
If you don't need "and" capitalized, you can use the second gsub with tools::toTitleCase, which leaves it in lowercase:
tools::toTitleCase(gsub('^/([^/]+)|.', '\\1', x))
# [1] "Travel" "Food and Drink" "Food and Drink" "Sports" "News"
# [6] "Family"
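Since the question is about a data frame column that also contains blank rows, here is a minimal sketch of how the two gsub calls above could be wrapped and applied to a column. The data frame `df`, the column name `category`, and the helper name `first_segment_title` are made up for illustration, and the lookbehind is simplified to a plain `\\b`.

```r
# Hypothetical data frame; the column name `category` is an assumption.
df <- data.frame(
  category = c("/travel", "/food and drink/restaurants", "",
               "/sports/outdoors/climbing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper combining the two gsub calls shown above.
first_segment_title <- function(path) {
  # Keep only the text between the first and second "/".
  segment <- gsub('^/([^/]+)|.', '\\1', path)
  # Uppercase the first letter of every word.
  gsub('\\b([a-z])', '\\U\\1', segment, perl = TRUE)
}

df$category_clean <- first_segment_title(df$category)
df$category_clean
# [1] "Travel" "Food And Drink" "" "Sports"
```

Blank rows stay blank, so the new column has the same length as the original one.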
library(magrittr)
txt <- c("/travel", "/food and drink/restaurants", "/food and drink", "/sports/outdoors/climbing", "", "/news", "/family")
strsplit(txt, "/") %>% sapply('[', 2)  # per Frank's suggestion
## [1] "travel" "food and drink" "food and drink" "sports"
## [5] NA "news" "family"
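If you would rather get an empty string than NA for the blank rows, and don't want the magrittr dependency, a base R variant of the same strsplit idea might look like the sketch below. The helper name `second_piece` is hypothetical.

```r
txt <- c("/travel", "/food and drink/restaurants", "/food and drink",
         "/sports/outdoors/climbing", "", "/news", "/family")

# Hypothetical helper: return the piece after the first "/",
# or "" when the row is blank and there is nothing to return.
second_piece <- function(pieces) {
  if (length(pieces) >= 2) pieces[2] else ""
}

# fixed = TRUE treats "/" as a literal string rather than a regex.
result <- vapply(strsplit(txt, "/", fixed = TRUE), second_piece, character(1))
result
## [1] "travel" "food and drink" "food and drink" "sports"
## [5] "" "news" "family"
```

vapply is used instead of sapply so the result is guaranteed to be a character vector of the same length as the input.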
A quick way would be the following. I'm assuming that there are only word characters (\\w) and whitespace (\\s) in the part you want to collect.
char <- c("/travel", "/food and drink/restaurants", "/food and drink", "/sports/outdoors/climbing", "", "/news", "/family")
match <- regexpr("[\\w\\s]+", char, perl = TRUE)
regmatches(char, match)
## [1] "travel" "food and drink" "food and drink" "sports"
## [5] "news" "family"
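Note that regmatches() drops elements with no match, so the result above has six elements for seven inputs, which won't fit back into a data frame column. A minimal sketch of one way to keep the original length, assuming blank rows should stay as empty strings (the names `m` and `out` are just for illustration):

```r
char <- c("/travel", "/food and drink/restaurants", "/food and drink",
          "/sports/outdoors/climbing", "", "/news", "/family")

# regexpr() returns -1 for elements without a match (here the blank row).
m <- regexpr("[\\w\\s]+", char, perl = TRUE)

# Start with empty strings, then fill in only the positions that matched.
out <- rep("", length(char))
out[m != -1] <- regmatches(char, m)
out
## [1] "travel" "food and drink" "food and drink" "sports"
## [5] "" "news" "family"
```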
You would need to install the stringi package (and you should probably have it anyway :) but the following should do the trick:
stringi::stri_trans_totitle(sub("^/([^/]*).*", "\\1", data))
The sub call picks out the text after the first / up until either the second / or the end of the string, and drops everything else. stringi::stri_trans_totitle then does the case conversion for you.
> s <- c("/food and drink/restaurants", "/beer and wine", "", "/news")
> stringi::stri_trans_totitle(sub("^/([^/]*).*", "\\1", s))
[1] "Food And Drink" "Beer And Wine" "" "News"