String Pattern in R

I have a list of strings like the following: "/home/ricardo/MultiClass/data//F10/1036.txt"

>     library(stringr)   
>     strsplit(cls[1], split= "/")

This gives me:

#> [[1]]
#> [1] ""           "home"       "ricardo"    "MultiClass" "data"
#> [6] ""           "F10"        "1036.txt"

How can I keep only the element in the 7th position?

#> "F10"

To extract one or more characters after // up to the next / or the end of the string, use str_extract:

> library(stringr) 
> s <- "/home/ricardo/MultiClass/data//F10/1036.txt"
> str_extract(s, "(?<=//)[^/]+")
[1] "F10"

The (?<=//)[^/]+ regex pattern finds a position that is preceded by two slashes (the lookbehind (?<=//) ) and then matches one or more characters other than / (the [^/]+ part).
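
Because str_extract is vectorized, the same call works on the whole list of strings. The following is a minimal sketch of that; the second path (with a single slash) is a made-up variant added only to show what happens when the lookbehind finds no //.

```r
library(stringr)

# First path is from the question; the second is a hypothetical variant
# without the double slash, added only to show the no-match case.
paths <- c(
  "/home/ricardo/MultiClass/data//F10/1036.txt",
  "/home/ricardo/MultiClass/data/F10/1036.txt"
)

# One result per input element; NA where the pattern does not match.
folders <- str_extract(paths, "(?<=//)[^/]+")
folders
#> [1] "F10" NA
```

The NA makes it easy to spot paths that do not follow the expected layout.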

A base R solution with sub looks like this:

> sub("^.*/([^/]*)/[^/]*$", "\\1", s)
[1] "F10"

Details :

  • ^ - start of string
  • .* - any 0+ chars, as many as possible
  • / - a slash (the second-to-last one in the string, because the preceding .* is greedy)
  • ([^/]*) - capturing group #1 matching any 0+ chars other than /
  • / - last slash
  • [^/]* - any 0+ chars other than /
  • $ - end of string.
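
Since the question asks for the 7th element specifically, indexing the split result directly is another option, and both approaches extend to a whole vector of paths. The sketch below compares the two; the second path and the variable names (paths, by_regex, pieces, by_position) are assumptions made for illustration.

```r
# First path is from the question; the second is hypothetical.
paths <- c(
  "/home/ricardo/MultiClass/data//F10/1036.txt",
  "/home/ricardo/MultiClass/data//F11/2001.txt"
)

# sub() is vectorized: capture the component before the last slash.
by_regex <- sub("^.*/([^/]*)/[^/]*$", "\\1", paths)

# Positional alternative: split on "/" and take the 7th piece of each path.
# This relies on every path having the same depth.
pieces <- strsplit(paths, "/", fixed = TRUE)
by_position <- vapply(pieces, function(p) p[7], character(1))

by_regex
#> [1] "F10" "F11"
identical(by_regex, by_position)
#> [1] TRUE
```

Note that sub returns its input unchanged when the pattern does not match, so a string with fewer than two slashes comes back as-is rather than as NA.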

This can also be done in base R. I have defined a function gret that extracts the first match of a pattern from a string:

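gret <- function(pattern, text, ignore.case = TRUE) {
    # regexpr() locates the first match; regmatches() returns the matched text
    regmatches(text, regexpr(pattern, text, ignore.case = ignore.case, perl = TRUE))
}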

Then:

# gret() returns "data//F10/"; gsub() then strips "data" and the slashes
gsub("data|/*", "", gret("(?=data/).*(?<=/)", "/home/ricardo/MultiClass/data//F10/1036.txt"))
#> [1] "F10"

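Using the word function from stringr:

library(stringr)
s <- '/home/ricardo/MultiClass/data//F10/1036.txt'
# sub() drops everything up to and including the "//";
# word() then takes the first "/"-separated piece
word(sub('.*//', '', s), 1, sep = '/')
#[1] "F10"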
