Using grep, grepl and regexpr within loops in R

I want to automate the extraction of certain information from text files using grep, grepl and regexpr. I have code that works when I run it on each file individually, but I cannot get the loop to work so that it processes every file in my working directory.

I am reading the txt files in as strings because of the structure of the data. The loop seems to process the first file repeatedly, once for each file in the directory, which I assume is because of the length(txtfiles) call in the for statement.

txtfiles = list.files(pattern="*.txt")

for (i in 1:length(txtfiles)){
all_data <- readLines(txtfiles[i])

#select hours of operation 
hours_op[i] <- all_data[hours_of_operation <- grep("Annual Hours of Operation:",all_data)]
hours_op[i] <-regmatches(hours_op, regexpr("[0-9]{1,9}.[0-9]{1,9}",hours_op))

}

I would be grateful if someone could point me in the right direction so that this routine runs once for each file, rather than on the same file multiple times over. I want to end up with a list of the file names and the corresponding hours_op.

You need to either add an index ([i]) to every one of your references to hours_op (without it, regmatches() runs on the whole vector and only its first match, the one from the first file, ends up in hours_op[i]), as in:

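hours_op <- character(length(txtfiles))  # one slot per file
for (i in seq_along(txtfiles)){
    all_data <- readLines(txtfiles[i])
    # store the line containing the label for file i
    hours_op[i] <- all_data[hours_of_operation <- grep("Annual Hours of Operation:",all_data)]
    # extract the number from that same element, not from the whole vector
    hours_op[i] <- regmatches(hours_op[i], regexpr("[0-9]{1,9}.[0-9]{1,9}",hours_op[i]))
}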
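
To see why the missing index matters, here is a minimal sketch that runs both versions of the loop on in-memory data. The list `files` and its contents are made up for illustration and stand in for what readLines() would return; the pattern is the one from the question.

```r
# Hypothetical file contents, held in memory instead of read from disk.
files <- list(
  a = c("Site A", "Annual Hours of Operation: 8760.0"),
  b = c("Site B", "Annual Hours of Operation: 4380.5")
)
pattern <- "[0-9]{1,9}.[0-9]{1,9}"  # same pattern as in the question

# Version from the question: hours_bad is used without an index.
hours_bad <- character(0)
for (i in seq_along(files)) {
  all_data <- files[[i]]
  hours_bad[i] <- all_data[grep("Annual Hours of Operation:", all_data)]
  # regmatches() returns one match per element of hours_bad; the assignment
  # keeps only the first one. R warns about the length mismatch, which is
  # suppressed here only to keep the output clean.
  suppressWarnings(
    hours_bad[i] <- regmatches(hours_bad, regexpr(pattern, hours_bad))
  )
}

# Indexed version: only element i is searched and replaced.
hours_ok <- character(0)
for (i in seq_along(files)) {
  all_data <- files[[i]]
  hours_ok[i] <- all_data[grep("Annual Hours of Operation:", all_data)]
  hours_ok[i] <- regmatches(hours_ok[i], regexpr(pattern, hours_ok[i]))
}

print(hours_bad)  # the first file's value appears in every slot
print(hours_ok)   # one value per file
```

So the loop does visit every file; it is the unindexed regmatches() call that keeps writing the first file's value back.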

Or better yet, use a temporary variable:

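hours_op <- character(length(txtfiles))  # one slot per file
for (i in seq_along(txtfiles)){
    all_data <- readLines(txtfiles[i])
    # temp holds the line containing the label for the current file only
    temp <- all_data[hours_of_operation <- grep("Annual Hours of Operation:",all_data)]
    # extract the number from that line
    hours_op[i] <- regmatches(temp, regexpr("[0-9]{1,9}.[0-9]{1,9}",temp))
}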
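
Since the goal is a list of file names with the corresponding hours_op, the temporary-variable approach could also be wrapped in a function and collected into a data frame. This is only a sketch under some assumptions: `extract_hours` and `collect_hours` are hypothetical helper names, the pattern `[0-9]+(\\.[0-9]+)?` is my guess that the value is a plain integer or decimal number, and a file without the label yields NA rather than an error.

```r
# Hypothetical helper: return the hours value from one file as a string,
# or NA if the label or a number is not found.
extract_hours <- function(path) {
  all_data <- readLines(path, warn = FALSE)
  hit <- grep("Annual Hours of Operation:", all_data, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  # Assumed number format: digits, optionally followed by a decimal part.
  m <- regmatches(hit[1], regexpr("[0-9]+(\\.[0-9]+)?", hit[1]))
  if (length(m) == 0) NA_character_ else m
}

# Hypothetical helper: apply extract_hours() to every .txt file in `dir`.
collect_hours <- function(dir = ".") {
  paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  data.frame(
    file = basename(paths),
    hours_op = as.numeric(vapply(paths, extract_hours, character(1))),
    stringsAsFactors = FALSE
  )
}

# Usage: result <- collect_hours(".")
```

Note that list.files() takes a regular expression, so `"\\.txt$"` is used here rather than the glob-style `"*.txt"`.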
