How to subset row values by string label in a single column in R?
I have a column whose row values I would like to subset based on the first and last 'string' label in R. The level values are as follows:
[1] "60022 (Location; 9TH FLOOR; Snacks)"
[3] "60024 (Location; 9TH FLOOR; Lg Snacks)"
[5] "60027 (Location; 9TH FLOOR; Sml Snacks)"
I would like to pull the number and the last string separated by ';'. Is there a function or syntax in R to do this? In other words, remove "Location; 9TH FLOOR" and keep only the string after the last ';'.
I have tried this to pull just the first value, but I am unable to keep the "Snacks" part as well with this code:
#updated_df_2020$Machine <- sub("([A-Za-z]+).*", "\\1", updated_df_2020$Machine)
The end result for each row should be the number followed by the last label (60022 and then Snacks), like so:
[1] "60022 (Snacks)"
[1] "60024 (Lg Snacks)"
[1] "60027 (Sml Snacks)"
If we need to remove the substring, capture the digits ( \\d+ ) at the start ( ^ ) of the string as the first group. Then match up to the last ; and zero or more spaces ( \\s* ), and capture the non-whitespace character ( \\S ) that follows, together with the remaining characters ( .* ) up to the ) at the end ( $ ) of the string, as the second capture group. In the replacement, specify the backreferences ( \\1 , \\2 ) of the captured groups and add the opening ( before the second one:
updated_df_2020$Machine <- sub("^(\\d+)\\b.*;\\s*\\b(\\S.*\\))$",
"\\1 (\\2", updated_df_2020$Machine)
updated_df_2020$Machine
#[1] "60022 (Snacks)" "60024 (Lg Snacks)" "60027 (Sml Snacks)"
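To see what each capture group actually holds, the same pattern can be passed to regexec() and regmatches(). This is only an inspection sketch: the vector x and the variable names are introduced here for illustration and are not part of the original answer.

```r
# Illustrative input (same strings as in the question)
x <- c("60022 (Location; 9TH FLOOR; Snacks)",
       "60024 (Location; 9TH FLOOR; Lg Snacks)")

pattern <- "^(\\d+)\\b.*;\\s*\\b(\\S.*\\))$"

# regexec() records the positions of the full match and of each capture group;
# regmatches() turns those positions into substrings.
m <- regmatches(x, regexec(pattern, x))

# For each string: element 1 is the full match, 2 is group 1, 3 is group 2
m[[1]][2]   # group 1: the leading number
m[[1]][3]   # group 2: the last label, including the closing ")"
m[[2]][3]
```

Because the second group already contains the closing ), the replacement only has to add the opening ( in front of \\2.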
If the string does not start with digits and we still want to extract the leading token, replace (\\d+) with (\\w+) in the pattern.
The example data:
updated_df_2020 <- data.frame(Machine = c("60022 (Location; 9TH FLOOR; Snacks)",
"60024 (Location; 9TH FLOOR; Lg Snacks)", "60027 (Location; 9TH FLOOR; Sml Snacks)"),
stringsAsFactors = FALSE)
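A short sketch of the (\\w+) variant mentioned above. The machine ID starting with a letter ("A60022") is a made-up value used only to show the difference; it does not come from the question's data.

```r
# Hypothetical IDs: the first one starts with a letter, so (\\d+) at the
# start of the string would not match it, while (\\w+) does.
ids <- c("A60022 (Location; 9TH FLOOR; Snacks)",
         "60024 (Location; 9TH FLOOR; Lg Snacks)")

res <- sub("^(\\w+)\\b.*;\\s*\\b(\\S.*\\))$", "\\1 (\\2", ids)
res
#[1] "A60022 (Snacks)"   "60024 (Lg Snacks)"
```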
You could do
> a <- c("60022 (Location; 9TH FLOOR; Snacks)", "60024 (Location; 9TH FLOOR; Snacks)", "60027 (Location; 9TH FLOOR; Snacks)")
> strs <- strsplit(a, split = " ")
> sapply(strs, function(s) paste(s[1], paste0("(", s[length(s)])))
#
# "60022 (Snacks)" "60024 (Snacks)" "60027 (Snacks)"
#
which is uglier, but I guess a bit easier to understand.
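Note that splitting on a space keeps only the last word, so a label such as "Lg Snacks" would be reduced to "Snacks". A possible variant, sketched here under the assumption that the label is always the text after the last ';', splits on the semicolon instead. The variable names are illustrative.

```r
a <- c("60022 (Location; 9TH FLOOR; Snacks)",
       "60024 (Location; 9TH FLOOR; Lg Snacks)",
       "60027 (Location; 9TH FLOOR; Sml Snacks)")

# Leading number: drop everything from the first space onwards
ids <- sub(" .*", "", a)

# Split on the literal ";" and keep the last piece of each string
parts <- strsplit(a, split = ";", fixed = TRUE)
last  <- trimws(sapply(parts, function(p) p[length(p)]))  # e.g. "Lg Snacks)"

# The last piece still ends in ")", so only the opening "(" is added
out <- paste0(ids, " (", last)
out
#[1] "60022 (Snacks)"     "60024 (Lg Snacks)"  "60027 (Sml Snacks)"
```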
We can extract the number at the beginning and everything after the last semicolon using sub:
sub("(\\d+).*;(.*)", "\\1 (\\2", x)
#[1] "60022 ( Snacks)" "60024 ( Lg Snacks)" "60027 ( Sml Snacks)"
where x is其中 x 是
x <- c("60022 (Location; 9TH FLOOR; Snacks)",
"60024 (Location; 9TH FLOOR; Lg Snacks)",
"60027 (Location; 9TH FLOOR; Sml Snacks)")
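The sub() call above leaves a space after the opening parenthesis ("60022 ( Snacks)"), which differs from the output requested in the question. One way to drop it, offered as a small adjustment rather than as part of the original answer, is to consume the whitespace after the last semicolon with \\s* before the second capture group:

```r
x <- c("60022 (Location; 9TH FLOOR; Snacks)",
       "60024 (Location; 9TH FLOOR; Lg Snacks)",
       "60027 (Location; 9TH FLOOR; Sml Snacks)")

# \\s* absorbs the space that follows the last ";", so group 2 starts at the label
res <- sub("(\\d+).*;\\s*(.*)", "\\1 (\\2", x)
res
#[1] "60022 (Snacks)"     "60024 (Lg Snacks)"  "60027 (Sml Snacks)"
```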