Extract specific elements from a character vector

Question

I have a character vector:

a=c("Mom", "mother", "Alex", "Betty", "Prime Minister")

I want to extract only the words that start with "M", in either upper or lower case.

How can I do this?

I have tried grep(), sub(), and other variants of these functions, but I can't get it right.

I expect the output to be a character vector containing "Mom" and "mother".

Answer 1

a[startsWith(toupper(a), "M")]
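Answer 1 comes without an explanation, so here is a minimal breakdown of what it does on the question's vector. The intermediate names upper and keep are only for illustration. Upper-casing every element first means a single prefix test against "M" covers both "M" and "m".

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# Upper-case every element so that "mother" becomes "MOTHER"
upper <- toupper(a)

# startsWith() is vectorised over its first argument:
# one logical value per element
keep <- startsWith(upper, "M")
keep
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Index the ORIGINAL vector, so the case of the result is preserved
a[keep]
# [1] "Mom"    "mother"
```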

Answer 2

Plain grep() will also do just fine:

grep( "^m", a, ignore.case = TRUE, value = TRUE )
#[1] "Mom"    "mother"
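By default grep() returns the positions of the matching elements; value = TRUE is what makes it return the strings themselves. A quick illustration on the question's vector:

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# Default: integer indices of the elements that match the pattern
grep("^m", a, ignore.case = TRUE)
# [1] 1 2

# value = TRUE: the matching elements themselves
grep("^m", a, ignore.case = TRUE, value = TRUE)
# [1] "Mom"    "mother"
```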

Benchmarks
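Tom's answer (startsWith) is the winner, but there is some room for improvement (see the code for startsWith2). Be aware that startsWith2 only returns the right result for this particular vector: startsWith() recycles c("M", "m") element-wise, so "M" is tested against the 1st, 3rd and 5th elements and "m" against the 2nd and 4th. The timings are in nanoseconds on a five-element vector, so the absolute differences are tiny.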

microbenchmark::microbenchmark(
  substr      = a[substr(a, 1, 1) %in% c("M", "m")],
  grepl       = a[grepl("^[Mm]", a)],
  grep        = grep("^m", a, ignore.case = TRUE, value = TRUE),
  # regex() must also be namespaced when stringr is not attached
  stringr     = unlist(stringr::str_extract_all(a, stringr::regex("^M.*", ignore_case = TRUE))),
  startsWith1 = a[startsWith(toupper(a), "M")],
  # note: c("M", "m") is recycled element-wise, not tried for every element
  startsWith2 = a[startsWith(a, c("M", "m"))]
)


# Unit: nanoseconds
#        expr   min      lq     mean median    uq    max neval
#      substr  1808  2411.0  3323.19   3314  3917   8435   100
#       grepl  3916  4218.0  5438.06   4820  6930   8436   100
#        grep  3615  4368.5  5450.10   4820  6929  19582   100
#     stringr 50913 53023.0 55764.10  54529 55132 174432   100
# startsWith1  1506  2109.0  2814.11   2711  3013  17474   100
# startsWith2   602  1205.0  1410.17   1206  1507   3013   100
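To see why startsWith2 is not a general solution, here is a small check on a hypothetical vector b (the question's words in a different order, not from the original post). OR-ing two single-prefix tests is one way to keep the two-prefix idea while testing every element against both prefixes.

```r
# Hypothetical vector: "mother" now comes first
b <- c("mother", "Mom", "Alex", "Betty")

# The prefixes are recycled: "mother" is tested against "M",
# "Mom" against "m", "Alex" against "M", "Betty" against "m"
startsWith(b, c("M", "m"))
# [1] FALSE FALSE FALSE FALSE

# Test each prefix against every element, then combine with |
b[startsWith(b, "M") | startsWith(b, "m")]
# [1] "mother" "Mom"
```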

Answer 3

Use grepl() with the pattern ^[Mm]:

a[grepl("^[Mm]", a)]

[1] "Mom"    "mother"

Here is what the pattern ^[Mm] means:

^      from the start of the string
[Mm]   match either a lowercase or uppercase letter M

grepl() only checks whether the pattern matches somewhere in each string (at least once) and returns TRUE or FALSE for each element, so we don't need to worry about the rest of the string.
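Because grepl() is satisfied by a match anywhere in the string, the ^ anchor is what restricts the test to the first character. Comparing the anchored and unanchored patterns on the question's vector makes this visible: without ^, "Prime Minister" is also kept because of its capital M.

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# Anchored: M or m must be the first character
grepl("^[Mm]", a)
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Unanchored: an M or m anywhere in the string counts
grepl("[Mm]", a)
# [1]  TRUE  TRUE FALSE FALSE  TRUE
```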

Answer 4

Using stringr:

library(stringr)
unlist(str_extract_all(a, regex("^M.*", ignore_case = TRUE)))

[1] "Mom"    "mother"
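str_extract_all() returns a list with one entry per input string, and non-matching strings give character(0); that is why unlist() is needed above. As an alternative (not part of the original answer), stringr's str_subset() keeps the whole matching strings directly. This sketch assumes stringr is installed.

```r
library(stringr)

a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# A list of length 5; "Alex", "Betty" and "Prime Minister" give character(0)
extracted <- str_extract_all(a, regex("^M.*", ignore_case = TRUE))

# str_subset() returns the matching elements as a character vector
str_subset(a, regex("^m", ignore_case = TRUE))
# [1] "Mom"    "mother"
```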

Answer 5

substr() is a handy base R function for this:

a[substr(a, 1, 1) %in% c("M", "m")]

# [1] "Mom"    "mother"

And since you mentioned sub(), you could also do the following (though it is not necessarily recommended):

a[sub("(.).*", "\\1", a) %in% c("M", "m")]
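The sub() version works by rewriting each string down to its first character, which yields the same values as substr(a, 1, 1). Here is the intermediate step made explicit; the name first is only for illustration.

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# "(.)" captures the first character, ".*" consumes the rest of the string,
# and "\\1" replaces the whole match with just the captured character
first <- sub("(.).*", "\\1", a)
first
# [1] "M" "m" "A" "B" "P"

# Keep the elements whose first character is M or m
a[first %in% c("M", "m")]
# [1] "Mom"    "mother"
```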

Question

5 answers

Answer 1: 2 votes, 2019-01-16 06:14:56
Answer 2: 2 votes, 2019-01-16 06:28:21
Answer 3: 1 vote, 2019-01-16 06:19:38
Answer 4: 1 vote, 2019-01-16 06:20:22
Answer 5: 0 votes, 2019-01-16 07:45:16
