R - Regular Expression - Match the following pattern: WhitespaceHyphenWhitespaceSingledigit

Consider the following data structure (df):

ID Text
1 Example
2 Example - 1
3 Example - 2
4 Example - 3
5 Example - 4
6 Example - 5
7 Example - NA
8 Text
9 Text - 10
10 Text - 20
11 Text - 30
12 Text - 40
13 Text - 50
14 Text - 60
15 Text - 70
16 Text - 80
17 Text - 90
18 Text - 100

In the column "Text", I want to find all rows that contain the following pattern: whitespace, hyphen, whitespace, then a single digit (not a multi-digit number).

In other words, I want to extract the following rows:

ID Text
2 Example - 1
3 Example - 2
4 Example - 3
5 Example - 4
6 Example - 5

Currently I use the grepl() function in combination with regular expressions. However, none of my attempts, such as

  • df[which(grepl("s{1}-\s{1}\d{1}$", df$Text)),]
  • df[which(grepl("\b\s{1}-\s{1}\d{1}\b$", df$Text)),]

have worked. Since I am a beginner in programming, I would be grateful for any advice. Thanks in advance.
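A likely reason the attempts above fail is R's own string escaping rather than the regex itself: inside an ordinary R string literal, "\s" is rejected at parse time as an unrecognized escape, so every regex backslash has to be doubled (the first attempt as shown is also missing the backslash before the first s, and `{1}` is redundant). The sketch below assumes R >= 4.0.0 for the raw-string form `r"(...)"`; the sample strings are taken from the question's data.

```r
# In a normal R string literal each regex backslash must be written as "\\".
escaped <- "\\s-\\s\\d$"
# In R >= 4.0.0 a raw string keeps backslashes literally, giving the same pattern.
raw <- r"(\s-\s\d$)"

text <- c("Example - 1", "Text - 10", "Example - NA")
# The default (TRE) engine understands \s and \d, so perl = TRUE is not needed here.
matches <- grepl(escaped, text)
print(matches)
```

With the backslashes escaped, the end-anchored idea from the first attempt already selects "Example - 1" and rejects "Text - 10" and "Example - NA".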

I would use the following regex pattern:

\s-\s\d(?!\d)

This matches a hyphen surrounded by whitespace, followed by a single digit that is itself followed by either a non-digit character or the end of the input. Because `(?!\d)` is a negative lookahead, it needs the PCRE engine, hence `perl=TRUE` in R.
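To see what the lookahead buys compared with anchoring the digit at the end with `$`, here is a small comparison. The string "Example - 1 (copy)" is a hypothetical input, not part of the question's data; it is included only to show the difference.

```r
text <- c("Example - 1", "Example - 1 (copy)", "Text - 10", "Example - NA", "Example")

# Lookahead version: the digit must not be followed by another digit,
# but other text may follow it. Requires perl = TRUE.
lookahead <- grepl("\\s-\\s\\d(?!\\d)", text, perl = TRUE)

# End-anchored version: the digit must be the last character of the string.
anchored <- grepl("\\s-\\s\\d$", text)

print(lookahead)
print(anchored)
```

Both reject "Text - 10", but only the lookahead version also accepts a single digit that is followed by further non-digit text.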

Full R code:

df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl=TRUE), ]
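A self-contained run on the question's data might look like the sketch below. It assumes `Text` is a character column; the second, lookahead-free pattern `\\s-\\s\\d(\\D|$)` is my own alternative, not part of the answer, for cases where the default regex engine is preferred.

```r
# Rebuild the example data from the question.
df <- data.frame(
  ID = 1:18,
  Text = c("Example", paste("Example -", c(1:5, "NA")),
           "Text", paste("Text -", seq(10, 100, by = 10))),
  stringsAsFactors = FALSE
)

# Pattern from the answer; perl = TRUE enables the (?!\d) lookahead.
hits_perl <- df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl = TRUE), ]

# Alternative without a lookahead: after the digit require a non-digit
# or the end of the string; this works with the default engine.
hits_default <- df[grepl("\\s-\\s\\d(\\D|$)", df$Text), ]

print(hits_perl)  # rows with ID 2 to 6
```

Both patterns select the rows with ID 2 to 6, matching the expected output in the question.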
