
grep in R: extract matching lines from a list

I am trying to extract a few lines from a list of 300 lines built from a set of PDF files in a directory.

The contents of all the PDF files are in this 300-line list. Now I want to extract the lines that contain a matching word.

library(stringr)
library(pdftools)
library(tm)
library(tidyverse)
library(rex)

#Directory with multiple pdf files
files <- list.files(pattern = "\\.pdf$")

#Extract every page of every file; pdf_text() returns one string per page,
#so unlist() keeps the lines of all pages, not just the first
result <- lapply(files, function(x) unlist(strsplit(pdf_text(x), "\n")))

#Flatten into a single character vector of lines
mylist <- unlist(result)

#Trim and collapse repeated whitespace in each line (the result must be assigned)
mylist <- str_squish(mylist)

#Find lines that match the given string (e.g. "Table" in t)
t  <- grep("Table",   mylist)
t1 <- grep("T[0-9]",  mylist)
f  <- grep("Figure",  mylist)
f1 <- grep("F[0-9]",  mylist)
l  <- grep("Listing", mylist)
l1 <- grep("L[0-9]",  mylist)
s  <- grep("Source",  mylist)

# Output of t with indices where there is a match for string "Table"
> t
[1]  46  71  95 124 153 250 278

#Now, how do I put the lines at these indices into a new list? Or do I go back to mylist, pass in the indices and extract the lines from it? What is the best way to do this?
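A note on the extraction step: pdftools::pdf_text() returns one string per page, so taking [[1]] of the strsplit() result keeps only the lines of each file's first page. The sketch below uses made-up page strings in place of real PDFs (an assumption, so it runs without any files) to show the difference and to flatten and squish all lines in one step.

```r
library(stringr)

# Made-up stand-in for pdf_text(): one string per page, lines separated by "\n"
pages <- c("Table 1  Results\nSome   text", "Figure 2  Plot\nSource: survey")

# strsplit() returns one vector per page; [[1]] keeps only the first page
first_page_only <- strsplit(pages, "\n")[[1]]

# unlist() keeps the lines of every page; str_squish() trims and collapses spaces
all_lines <- str_squish(unlist(strsplit(pages, "\n")))

# With real files (untested here):
# all_lines <- str_squish(unlist(lapply(files, function(x) strsplit(pdf_text(x), "\n"))))

print(first_page_only)
print(all_lines)
```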
----------------------------

When I run these lines of code (t, t1, f, f1, l, l1, s), I get the indices of the lines that match each pattern.

Below is an image of the output showing the lines that match.

Now I just need to copy those lines into another list. How do I do that? Please advise.
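grep() without value = TRUE returns positions, and those positions can index mylist directly; grep(..., value = TRUE) returns the same lines in one call. A minimal sketch using a made-up vector in place of the real mylist. If mylist is still a list rather than a character vector, mylist[t] returns a sub-list, so unlist() it first.

```r
# Made-up lines standing in for mylist (the real lines come from the PDFs)
mylist <- c("Introduction", "Table 1 Sales", "Figure 3 Trend",
            "Table 2 Costs", "Source: ABS")

t <- grep("Table", mylist)                           # positions of matching lines
table_lines <- mylist[t]                             # subset mylist by those positions
table_lines2 <- grep("Table", mylist, value = TRUE)  # same lines, one step

print(t)
print(table_lines)
```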

Without test data it's difficult to say; the code below is untested.

Put the patterns in a list and apply grep to each one with lapply, passing value = TRUE. This returns a list in which each element is a character vector of the lines matching the corresponding pattern.

search_list <- list("Table", "T[0-9]", "Figure", "F[0-9]", "Listing", "L[0-9]", "Source")
matches_list <- lapply(search_list, grep, x = mylist, value = TRUE)
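Because the answer's code is untested, here is a small sketch of the same lapply()/grep(value = TRUE) approach run on a made-up vector standing in for mylist. Naming the pattern list with setNames() is an addition not in the answer; it makes each result easy to look up. The last step, also an addition, collects the lines that match any pattern while keeping their original order.

```r
# Made-up lines standing in for mylist
mylist <- c("Table 1 Sales", "T1 summary", "Figure 3 Trend", "F2 chart",
            "Listing 4 Code", "L5 notes", "Source: ABS", "Plain text")

patterns <- c("Table", "T[0-9]", "Figure", "F[0-9]", "Listing", "L[0-9]", "Source")
search_list <- setNames(as.list(patterns), patterns)  # list element names = patterns

# One element per pattern, each a character vector of matching lines
matches_list <- lapply(search_list, grep, x = mylist, value = TRUE)

# Lines matching at least one pattern, in their original order
any_match <- Reduce(`|`, lapply(search_list, grepl, x = mylist))
matched_lines <- mylist[any_match]

print(matches_list[["Figure"]])
print(matched_lines)
```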
