
Use gsub in R to extract the characters between two slashes

I have a file name captured by R like the following:

"0097_abcdef/0097_0/0097_0_04_bed.dbf"

I need to extract the term between the two slashes (i.e. 0097_0). I tried gsub(".*/","",dbf.files[1]), but it gives me "0097_0_04_bed.dbf", which is not quite what I want.

Can anyone help? Thanks.

You can try the pattern

 .*/(.*)/.* 

and use the first capture group, e.g. \\1:

> x = "0097_abcdef/0097_0/0097_0_04_bed.dbf"
> sub(".*/(.*)/.*","\\1",x)
[1] "0097_0"
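
Because `.*` is greedy, the pattern above captures the component just before the file name, however deep the path is. The following is a minimal sketch of that behaviour, assuming a hypothetical vector `paths` (only its first element comes from the question), together with an anchored variant that counts from the left instead:

```r
# Hypothetical example paths; only the first comes from the question.
paths <- c("0097_abcdef/0097_0/0097_0_04_bed.dbf",
           "a/b/c/d.dbf")

# The leading greedy ".*" consumes as much as it can, so the captured
# group is the component directly before the file name.
last_dir <- sub(".*/(.*)/.*", "\\1", paths)
last_dir
# [1] "0097_0" "c"

# Anchored variant: "[^/]*" cannot cross a slash, so this captures the
# second component counted from the left.
second <- sub("^[^/]*/([^/]*)/.*$", "\\1", paths)
second
# [1] "0097_0" "b"
```

For the two-slash path in the question both patterns give the same answer; they only differ once the path has more components.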

A different approach is to use the file path manipulation functions. In my opinion, it is a bit clearer than a regular expression, and on Windows it also handles backslash-separated paths correctly:

# On a Linux path
x <- "0097_abcdef/0097_0/0097_0_04_bed.dbf"
basename( dirname(x) )
# [1] "0097_0"

# On a Windows path (run on Windows, where "\\" is accepted as a path separator)
y <- "c:\\0097_abcdef\\0097_0\\0097_0_04_bed.dbf"
basename( dirname(y) )
# [1] "0097_0"

These functions are vectorized, so you can pass them a vector of paths. For completeness, there is also file.path to stitch the parts together again.
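
As a rough illustration of the vectorized use and of file.path, here is a short sketch. The vector `paths` and its second element are made up for the example:

```r
# Hypothetical vector of relative paths; the first is from the question.
paths <- c("0097_abcdef/0097_0/0097_0_04_bed.dbf",
           "0098_abcdef/0098_1/0098_1_04_bed.dbf")

# dirname() drops the file name, basename() keeps the last remaining part.
parent <- basename(dirname(paths))
parent
# [1] "0097_0" "0098_1"

# file.path() joins components with "/" and is vectorized as well,
# so the original paths can be reassembled from their parts.
rebuilt <- file.path(dirname(dirname(paths)), parent, basename(paths))
identical(rebuilt, paths)
# [1] TRUE
```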

You can easily use strsplit instead. For example,

R> x = "0097_abcdef/0097_0/0097_0_04_bed.dbf"
R> strsplit(x, "/")
[[1]]
[1] "0097_abcdef"       "0097_0"            "0097_0_04_bed.dbf"

R> strsplit(x, "/")[[1]][2]
[1] "0097_0"
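
If there are several file names, the same idea can be applied to the whole vector. This is only a sketch, assuming a hypothetical vector `paths` and that every path has at least two components:

```r
# Hypothetical vector of paths; the first is from the question.
paths <- c("0097_abcdef/0097_0/0097_0_04_bed.dbf",
           "0098_abcdef/0098_1/0098_1_04_bed.dbf")

# fixed = TRUE treats "/" as a literal string rather than a regex.
parts <- strsplit(paths, "/", fixed = TRUE)

# Take the second piece of each path; vapply() checks that every
# result is a single character string.
second <- vapply(parts, function(p) p[2], character(1))
second
# [1] "0097_0" "0098_1"
```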

You can use read.table:

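# Read the paths as "/"-separated fields from a text connection
tc <- textConnection(dbf.files)
# [2] keeps the second field as a one-column data frame; use [[2]] for a plain vector
y <- read.table(tc, sep = "/", as.is = TRUE)[2]
close(tc)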
