
Using grep, grepl and regexpr within loops in R

I want to automate the extraction of certain information from text files using grep, grepl and regexpr. My code works when I run it on each file individually, but I cannot get the loop to work so that it processes every file in my working directory.

I am reading the txt files in as strings because of the structure of the data. The loop seems to process the first file repeatedly, once for each file in the directory (the count comes from length(txtfiles) in the for statement), instead of moving on to the next file.

txtfiles = list.files(pattern="*.txt")

for (i in 1:length(txtfiles)){
    all_data <- readLines(txtfiles[i])

    # select hours of operation
    hours_op[i] <- all_data[hours_of_operation <- grep("Annual Hours of Operation:",all_data)]
    hours_op[i] <- regmatches(hours_op, regexpr("[0-9]{1,9}.[0-9]{1,9}",hours_op))
}

I would be grateful if someone could point me in the right direction to repeat this routine for each file, rather than for the same file multiple times over. I want to end up with a list of the file names and the corresponding hours_op.

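The second assignment calls regmatches on the whole hours_op vector rather than on the element for the current file, so the first match, which comes from the first file, is what gets stored in hours_op[i]. You need to either add an index ([i]) to every reference to hours_op, as in: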

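hours_op <- character(length(txtfiles))  # one slot per file
for (i in seq_along(txtfiles)){
    all_data <- readLines(txtfiles[i])
    # keep the line that contains the label
    hours_op[i] <- all_data[hours_of_operation <- grep("Annual Hours of Operation:",all_data)]
    # extract the number from this file's line only
    hours_op[i] <- regmatches(hours_op[i], regexpr("[0-9]{1,9}.[0-9]{1,9}",hours_op[i]))
}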

or better yet, use a temporary variable:

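hours_op <- character(length(txtfiles))  # one slot per file
for (i in seq_along(txtfiles)){
    all_data <- readLines(txtfiles[i])
    # temp holds the matching line for the current file only
    temp <- all_data[hours_of_operation <- grep("Annual Hours of Operation:",all_data)]
    hours_op[i] <- regmatches(temp, regexpr("[0-9]{1,9}.[0-9]{1,9}",temp))
}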
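
Since the goal is a list of file names with the corresponding hours_op, the same idea can also be written without an explicit index. The following is only a sketch, not the approach from the question: extract_hours is a hypothetical helper name, and it assumes each file has at most one relevant line and that the value is written as plain digits with an optional decimal part. It also swaps in a slightly different pattern, with the dot escaped so that it matches a literal decimal point.

```r
# Hypothetical helper: returns the hours value from one file, or NA if none is found.
extract_hours <- function(path) {
  all_data <- readLines(path, warn = FALSE)
  # Lines containing the label; fixed = TRUE treats the label as plain text.
  hit <- grep("Annual Hours of Operation:", all_data, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  # Assumption: digits with an optional decimal part; only the first hit is used.
  m <- regmatches(hit[1], regexpr("[0-9]+(\\.[0-9]+)?", hit[1]))
  if (length(m) == 0) NA_real_ else as.numeric(m)
}

# pattern is a regular expression, so "\\.txt$" matches names ending in .txt
txtfiles <- list.files(pattern = "\\.txt$")
hours_op <- vapply(txtfiles, extract_hours, numeric(1), USE.NAMES = FALSE)

# One row per file: the file name and its extracted value.
result <- data.frame(file = txtfiles, hours_op = hours_op, stringsAsFactors = FALSE)
```

Because each call only ever sees one file's lines, there is no shared vector to index incorrectly, and files without the label produce NA instead of stopping the loop.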

