Using regex to extract a portion of text from a file

I am trying to use the following code:

x <- scan("myfile.txt", what="", sep="\n")

b <- grep('/^one/(.*?)/^four/', x, ignore.case = TRUE, perl = TRUE, value = TRUE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)

to extract a portion of text from myfile.txt, which contains:

zero
one
two
three
four
five

The output I'm expecting is:

one
two
three
four

I want to include "one" and "four"; I don't want to ditch them :)

But somehow the regex is not working. The console gives no error, but no output text either.

I am using print(b) to display the result.
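A sketch of why `b` comes back empty, assuming myfile.txt holds the six lines shown above (the temporary file below is a hypothetical stand-in for it). `grep()` tests each element of `x`, that is each line, on its own, so a pattern meant to span several lines can never match. Also, R regexes have no `/.../` delimiters, so the slashes are matched as literal characters. Joining the lines with newlines and using the PCRE flags `(?s)` (`.` also matches newlines) and `(?m)` (`^` and `$` match at line boundaries) lets one pattern cover the whole block:

```r
# Hypothetical stand-in for myfile.txt
path <- tempfile(fileext = ".txt")
writeLines(c("zero", "one", "two", "three", "four", "five"), path)

# Read one element per line, as in the question
x <- scan(path, what = "", sep = "\n", quiet = TRUE)

# Original pattern: '/' is a literal character in R regexes, and
# grep() tests each line separately, so nothing can match
b <- grep('/^one/(.*?)/^four/', x, ignore.case = TRUE, perl = TRUE, value = TRUE)
print(b)  # character(0)

# Join the lines so a single pattern can span them.
# (?s): '.' matches '\n' too; (?m): '^' and '$' match at line boundaries
txt <- paste(x, collapse = "\n")
hit <- regmatches(txt, regexpr("(?sm)^one$.*?^four$", txt, perl = TRUE))

# Split the matched block back into lines
result <- strsplit(hit, "\n", fixed = TRUE)[[1]]
print(result)  # "one" "two" "three" "four"
```

This sketch assumes both markers are present; if either is missing, `hit` is `character(0)` and the final `[[1]]` fails.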

I'm not quite clear on what you're looking for, but just for fun...

R> x
[1] "zero"  "one"   "two"   "three" "four"  "five" 

R> grep("one|four", x) # get the position of "one" and "four"
[1] 2 5
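A caveat on this approach: `"one|four"` matches those words anywhere inside a line, so a line such as "someone" would also be counted. A sketch with anchored alternatives; the extra "someone" element is made up for illustration:

```r
x <- c("zero", "someone", "one", "two", "three", "four", "five")

# Unanchored: "someone" also contains "one"
loose <- grep("one|four", x)
loose  # 2 3 6

# Anchored with ^...$: only lines that are exactly "one" or "four"
idx <- grep("^(one|four)$", x)
idx    # 3 6
```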

Subset x to include only the elements from "one" through "four":

R> x[do.call(seq, as.list(grep("one|four", x)))]
[1] "one"   "two"   "three" "four" 
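For readers unfamiliar with `do.call()`: it calls `seq()` with the two positions as separate arguments, so the expression is equivalent to indexing with `idx[1]:idx[2]`. A minimal sketch, assuming each marker occurs exactly once:

```r
x <- c("zero", "one", "two", "three", "four", "five")
idx <- grep("one|four", x)  # positions 2 and 5

# do.call(seq, list(2L, 5L)) is the same call as seq(2L, 5L)
via_do_call <- x[do.call(seq, as.list(idx))]
via_colon   <- x[idx[1]:idx[2]]

identical(via_do_call, via_colon)  # TRUE
```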

gsub('one(.*)four','\\1',paste(x,collapse=''))
[1] "zerotwothreefive"

Or, to keep spaces between the words:

gsub('one(.*)four','\\1',paste(x,collapse=' '))
[1] "zero  two three  five"
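Note that `(.*)` is greedy: if the markers occur more than once, the match runs from the first "one" to the last "four". A sketch contrasting it with the lazy `.*?` that the question was reaching for; the sample string is made up:

```r
s <- "one a four one b four"  # made-up text with repeated markers

# Greedy (.*): runs from the first "one" to the last "four"
greedy <- regmatches(s, regexpr("one.*four", s))

# Lazy (.*?, PCRE): stops at the first "four"
lazy <- regmatches(s, regexpr("one.*?four", s, perl = TRUE))

# Every non-overlapping lazy match
all_lazy <- regmatches(s, gregexpr("one.*?four", s, perl = TRUE))[[1]]

greedy    # "one a four one b four"
lazy      # "one a four"
all_lazy  # "one a four" "one b four"
```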

Edit after Gsee's comment (keeping "one" and "four" in the result):

 gsub('.*(one.*four).*','\\1',paste(x,collapse=' '))
[1] "one two three four"
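The edited `gsub()` returns a single space-separated string. If a character vector like the expected output is wanted, here is a sketch that splits it back apart, assuming no line itself contains a space. Note also that when either marker is missing, `gsub()` returns its input unchanged rather than an empty result:

```r
x <- c("zero", "one", "two", "three", "four", "five")

s <- gsub(".*(one.*four).*", "\\1", paste(x, collapse = " "))
result <- strsplit(s, " ", fixed = TRUE)[[1]]
result  # "one" "two" "three" "four"

# No markers: the pattern does not match, so the input comes back untouched
unchanged <- gsub(".*(one.*four).*", "\\1", "zero two five")
unchanged  # "zero two five"
```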

But I don't think a regular expression is needed here:

 x[seq(which(x == 'one'),which(x == 'four'))]
[1] "one"   "two"   "three" "four" 

Of course, if the two indices returned by which() are not in increasing order, you can use min() (and max()) to put them in order.
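Following that suggestion, a sketch that orders the two positions with `min()` and `max()` and returns an empty vector when a marker is missing. `extract_between()` is a hypothetical helper name, and only the first occurrence of each marker is used:

```r
# Hypothetical helper: elements from `start` through `end` (inclusive),
# whichever order the two markers appear in
extract_between <- function(x, start, end) {
  i <- which(x == start)[1]  # first exact match, NA if absent
  j <- which(x == end)[1]
  if (is.na(i) || is.na(j)) return(character(0))
  x[seq(min(i, j), max(i, j))]
}

x <- c("zero", "one", "two", "three", "four", "five")
extract_between(x, "one", "four")  # "one" "two" "three" "four"
extract_between(x, "four", "one")  # same elements; indices reordered
extract_between(x, "one", "six")   # character(0)
```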

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.