Extract words between numbers

Question

I'm trying to write a regex in R that extracts the words between the numbers in each string of a character vector. Unfortunately, my regex skills aren't up to the challenge.
Here's an example of the problem and my initial attempt:

x <- c("1 Singleword 1,234 342", "2 randword & thirdword 1,545 323", 
      "3 Anotherword wordagain Newword. 3,234 556")

m <- regexpr("[a-zA-Z]+\\s+", x, perl = TRUE)

regmatches(x, m)

This approach only produces

"Singleword ", "randword ", "Anotherword "

What I need is

"Singleword", "randword & thirdword", "Anotherword wordagain Newword."

I believe it needs to be a regex pattern that starts at a letter (as my current one does) and then captures everything up to the next number.

Answer 1

x <- c("1 Singleword 1,234 342", "2 randword & thirdword 1,545 323", 
       "3 Anotherword wordagain Newword. 3,234 556")

m <- regexpr("[a-zA-Z].(\\D)+", x, perl = TRUE)

regmatches(x, m)

[1] "Singleword "                     "randword & thirdword "
[3] "Anotherword wordagain Newword. "

I used https://regexr.com/ and its cheatsheet to figure out how to compose the regex.
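
Note that the matches above still end in a space, which the question did not ask for. A minimal sketch of one way to drop it, assuming base R's trimws() is acceptable and that the wanted text always starts at the first letter (the simplified pattern and the name `result` are illustrative, not part of the original answer):

```r
x <- c("1 Singleword 1,234 342", "2 randword & thirdword 1,545 323",
       "3 Anotherword wordagain Newword. 3,234 556")

# Match from the first letter up to (not including) the next digit
m <- regexpr("[a-zA-Z]\\D*", x, perl = TRUE)

# Strip the space that is left in front of the next number
result <- trimws(regmatches(x, m))
print(result)
```

regmatches() drops elements with no match, so this only lines up with x when every string contains at least one letter.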

Answer 2

Using sub

> sub(".\\s(\\D+).*", "\\1", x)
[1] "Singleword "   "randword & thirdword "  "Anotherword wordagain Newword. "

Using str_extract

> library(stringr)
> str_extract(x, pattern = "\\D+")
[1] " Singleword "  " randword & thirdword "  " Anotherword wordagain Newword. "
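
Both calls above keep the whitespace around the words. As a sketch only, assuming every string has the shape "number, words, number", an anchored pattern with a lazy capture group can leave the spaces out of the result (the pattern and the name `words` are illustrative):

```r
x <- c("1 Singleword 1,234 342", "2 randword & thirdword 1,545 323",
       "3 Anotherword wordagain Newword. 3,234 556")

# ^\\d+\\s+    the leading number and the whitespace after it
# (\\D*?)      lazily capture non-digit text
# \\s+\\d.*$   the whitespace before the next number, then the rest
words <- sub("^\\d+\\s+(\\D*?)\\s+\\d.*$", "\\1", x, perl = TRUE)
print(words)
```

The lazy quantifier needs perl = TRUE here. If a string does not fit the assumed shape, sub() returns it unchanged, so it may be worth checking for that case.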

Answer 3

Sample data

x <- c("1 Singleword 1,234 342", "2 randword & thirdword 1,545 323", 
   "3 Anotherword wordagain Newword. 3,234 556")

Base R

# replace all digits and commas with "" (nothing),
# then trim the surrounding whitespace (thanks Markus!)
trimws( gsub( "[0-9,]", "", x ) )

[1] "Singleword" "randword & thirdword" "Anotherword wordagain Newword."

stringr

library(stringr)
str_extract(x, pattern = "(?<=\\d )[^0-9]+(?= \\d)")

[1] "Singleword" "randword & thirdword" "Anotherword wordagain Newword."

If you would like to learn more about how the regex patterns in the code above (and in the other answers) work, you can try them out, with explanations, at https://regex101.com/

Explanation of the last regex pattern: https://regex101.com/r/QgERuZ/2
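
The same lookaround pattern should also work without stringr. A hedged sketch in base R, assuming perl = TRUE (needed for the lookbehind and lookahead) and that every string contains a match; the name `between` is illustrative:

```r
x <- c("1 Singleword 1,234 342", "2 randword & thirdword 1,545 323",
       "3 Anotherword wordagain Newword. 3,234 556")

# (?<=\\d )   preceded by a digit and a space (not part of the match)
# [^0-9]+     one or more non-digit characters
# (?= \\d)    followed by a space and a digit (not part of the match)
m <- regexpr("(?<=\\d )[^0-9]+(?= \\d)", x, perl = TRUE)
between <- regmatches(x, m)
print(between)
```

Because the lookarounds consume the neighbouring spaces, no trimws() call is needed afterwards.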
