R - Regular Expression - Match the following pattern: WhitespaceHyphenWhitespaceSingledigit

Question

Consider the following data structure (df):

ID	Text
1	Example
2	Example - 1
3	Example - 2
4	Example - 3
5	Example - 4
6	Example - 5
7	Example - NA
8	Text
9	Text - 10
10	Text - 20
11	Text - 30
12	Text - 40
13	Text - 50
14	Text - 60
15	Text - 70
16	Text - 80
17	Text - 90
18	Text - 100

In the column "Text", I want to find all rows that contain the following pattern: WhitespaceHyphenWhitespaceSingledigit

Or in other words, I want to extract the following rows:

ID	Text
2	Example - 1
3	Example - 2
4	Example - 3
5	Example - 4
6	Example - 5

Currently I use the grepl() function in combination with regular expressions. However, none of my attempts, such as

df[which(grepl("\s{1}-\s{1}\d{1}$", df$Text)),]
df[which(grepl("\b\s{1}-\s{1}\d{1}\b$", df$Text)),]

have worked. Since I am a beginner in programming, I would be grateful for any advice. Thanks in advance.

Answer 1

I would use the following regex pattern:

\s-\s\d(?!\d)

This matches a hyphen between two whitespace characters, followed by a single digit that is itself followed by either a non-digit character or the end of the input.
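To see what the negative lookahead does, here is a small sketch that tests the pattern against a few strings copied from the question. The vector name x and the variable names pattern and hits are only illustrative.

```r
# Sample strings taken from the Text column in the question
x <- c("Example", "Example - 1", "Example - NA", "Text - 10", "Text - 100")

# Backslashes are doubled inside an R string literal.
# perl = TRUE switches to the PCRE engine, which supports (?!...) lookaheads.
pattern <- "\\s-\\s\\d(?!\\d)"

# TRUE only where " - " is followed by exactly one digit
hits <- grepl(pattern, x, perl = TRUE)
print(setNames(hits, x))
```

Only "Example - 1" is reported as a match here: "Text - 10" and "Text - 100" fail because the first digit is followed by another digit, and "Example - NA" fails because no digit follows the hyphen.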

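Full R code (each backslash is doubled inside the R string literal, and perl=TRUE is required for the negative lookahead):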

df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl=TRUE), ]
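For a reproducible check, the following sketch rebuilds the data frame from the question and applies the filter. The way df is constructed here is my own assumption about how the data is stored (character column, no factors). The end-anchored variant at the bottom is an alternative that is not part of the original answer; it only works when the digit is the last character of the string, as it is in this data.

```r
# Rebuild the example data from the question (assumed construction)
df <- data.frame(
  ID = 1:18,
  Text = c(
    "Example",
    paste("Example -", c(1:5, NA)),          # "Example - 1" ... "Example - NA"
    "Text",
    paste("Text -", seq(10, 100, by = 10))   # "Text - 10" ... "Text - 100"
  ),
  stringsAsFactors = FALSE
)

# Lookahead version from the answer; needs perl = TRUE
result <- df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl = TRUE), ]
print(result)

# Alternative without a lookahead: anchor the single digit to the end
# of the string. This works with R's default regex engine.
alt <- df[grepl("\\s-\\s\\d$", df$Text), ]
print(alt)
```

Both calls return the rows with ID 2 to 6. Note that which() is not needed around grepl(), since a logical vector can index rows directly.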

solution1
0 2022-07-29 08:14:44
