
How to match regular expression exactly in R and pull out pattern

I want to extract a pattern from each element of a character vector:

string <- c(
  "P10000101 - Przychody netto ze sprzedazy produktów",
  "P10000102_PL - Przychody nettozy uslug",
  "P1000010201_PL - Handlowych, marketingowych, szkoleniowych",
  "P100001020101 - - Handlowych,, szkoleniowych - refaktury",
  "- Handlowych, marketingowych,P100001020102, - pozostale"
)

As a result, I want only the part of each string that matches the regular expression:

result <- c(
  "P10000101",
  "P10000102_PL",
  "P1000010201_PL",
  "P100001020101",
  "P100001020102"
)

I tried pattern = "([PLA]\\d+)" with different combinations of value = T, fixed = T, and perl = T, for example:

grep(x = string, pattern = "([PLA]\\d+(_PL)?)", fixed = T)

We can use str_extract from the stringr package:

library(stringr)
str_extract(string, "P\\d+(_[A-Z]+)*")
#[1] "P10000101"      "P10000102_PL"   "P1000010201_PL" "P100001020101"  "P100001020102" 

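grep only reports whether the pattern is present in each string (as indices, or as the whole matching elements with value = T); it does not return the matched substring. Note also that fixed = T makes grep treat the pattern as a literal string rather than a regular expression, so the attempt above finds nothing. For extraction, use sub, regexpr/gregexpr with regmatches, or str_extract.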
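The difference between detecting and extracting can be seen in a minimal base R sketch. The shortened input vector and the variable names (has_match, whole, literal, codes) are only illustrative, and the sub call is one possible way to write the extraction, not the only one.

```r
string <- c(
  "P10000102_PL - Przychody nettozy uslug",
  "P100001020101 - - Handlowych,, szkoleniowych - refaktury",
  "- Handlowych, marketingowych,P100001020102, - pozostale"
)
pattern <- "P\\d+(_[A-Z]+)*"

# grepl() only reports whether each element contains a match.
has_match <- grepl(pattern, string)

# grep(value = TRUE) returns the whole matching elements, not the matched part.
whole <- grep(pattern, string, value = TRUE)

# fixed = TRUE treats the pattern as a literal string, so nothing matches.
literal <- grep(pattern, string, fixed = TRUE)

# sub() can extract by capturing the code and discarding everything around it.
# The lazy ".*?" stops at the first position where the code can start.
# Caveat: an element without a match is returned unchanged.
codes <- sub(".*?(P\\d+(_[A-Z]+)*).*", "\\1", string, perl = TRUE)
print(codes)
```

Because sub returns non-matching elements unchanged, this approach is safest when every element is known to contain a code.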

Using base R (regexpr/regmatches):

regmatches(string, regexpr("P\\d+(_[A-Z]+)*", string))
#[1] "P10000101"      "P10000102_PL"   "P1000010201_PL" "P100001020101"  "P100001020102" 
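One caveat with the base R approach: regmatches drops elements that have no match, so the result can be shorter than the input, whereas str_extract returns NA for those elements. The sketch below shows one way to keep the result aligned with the input; the vector x and the names m, hit, found, and aligned are made up for illustration.

```r
x <- c("P10000101 - netto", "no code here", "P10000102_PL - uslug")
pattern <- "P\\d+(_[A-Z]+)*"

# regexpr() returns the start of the first match per element, or -1 if none.
m <- regexpr(pattern, x)
hit <- as.vector(m) != -1L

# regmatches() silently drops elements without a match: length 2, not 3.
found <- regmatches(x, m)

# Keep one result per input element by filling non-matches with NA.
aligned <- rep(NA_character_, length(x))
aligned[hit] <- found
print(aligned)
```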

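Basically, the pattern matches a P followed by one or more digits ( \\d+ ), followed by zero or more repetitions ( * ) of an underscore and one or more uppercase letters ( _[A-Z]+ ). Because * is greedy, it takes as many such suffixes as are present.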
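To see how each part of the pattern behaves, here is a small sketch on made-up inputs (the samples vector is hypothetical and not taken from the question):

```r
pattern <- "P\\d+(_[A-Z]+)*"

# Hypothetical inputs: no suffix, one suffix, two suffixes.
samples <- c("P123 - a", "P123_PL - b", "P123_PL_EN - c")

# The greedy * consumes every "_LETTERS" suffix that directly follows the digits.
matched <- regmatches(samples, regexpr(pattern, samples))
print(matched)

# \\d+ requires at least one digit right after the P, so these do not match.
no_digits <- grepl(pattern, c("P_PL", "X123"))
print(no_digits)
```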

