Is it possible to subset a list of strings, e.g. List[1:3], using grepl? I want the subset to begin at the element whose first word matches one label and end at the element whose first word matches another.
I don't want to use a numeric index because I plan to subset multiple financial statement PDFs, and the contents of the list may differ from one to the next.
Here is the data I have:
list(c("CASH $99,999,999.00 $99,999,999.00 0.00"),
c("CASH SLIPS 1,000,000.00 1,000,000.00 0.00"),
c("BONDS 500,000.00 (500,000.00)"),
c("ACCOUNTS RECEIVABLE 1,000,000.00 2,000,000.00 (1,000,000.00)"))
How would I subset beginning at CASH (i.e. the exact match, not CASH SLIPS) and ending at BONDS?
Desired output:
list(c("CASH $99,999,999.00 $99,999,999.00 0.00"),
c("CASH SLIPS 1,000,000.00 1,000,000.00 0.00"),
c("BONDS 500,000.00 (500,000.00)"))
We can use Filter from base R (lst1 is the list from the question):
Filter(function(x) grepl("^(CASH|BONDS)", x), lst1)
#[[1]]
#[1] "CASH $99,999,999.00 $99,999,999.00 0.00"
#[[2]]
#[1] "CASH SLIPS 1,000,000.00 1,000,000.00 0.00"
#[[3]]
#[1] "BONDS 500,000.00 (500,000.00)"
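Note that Filter keeps every element that starts with CASH or BONDS, wherever it sits, rather than everything between the two labels. The two happen to coincide for the question's data. A small sketch of the difference, using a made-up list (lst2 and its INVENTORY row are hypothetical, not from the question):

```r
# Hypothetical data: an unrelated row sits between CASH and BONDS
lst2 <- list("CASH 10.00 10.00 0.00",
             "INVENTORY 5.00 5.00 0.00",
             "BONDS 3.00 (3.00)",
             "ACCOUNTS RECEIVABLE 1.00 2.00 (1.00)")

# Pattern filter: keeps only the elements that start with CASH or BONDS
by_pattern <- Filter(function(x) grepl("^(CASH|BONDS)", x), lst2)
length(by_pattern)  # 2, the INVENTORY row is dropped

# Range subset: keeps everything from the first match to the last match
hits <- grep("^(CASH|BONDS)", unlist(lst2))
by_range <- lst2[min(hits):max(hits)]
length(by_range)    # 3, the INVENTORY row is kept
```

If the goal is a contiguous block of the statement, the range form is probably the safer choice.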
Or another option, if we want to subset from the index of 'CASH' to the index of 'BONDS':
i1 <- sub("\\s+[^A-Z]+", "", unlist(lst1)) %in% c("CASH", "BONDS")
lst1[Reduce(`:`, as.list(range(which(i1))))]
#[[1]]
#[1] "CASH $99,999,999.00 $99,999,999.00 0.00"
#[[2]]
#[1] "CASH SLIPS 1,000,000.00 1,000,000.00 0.00"
#[[3]]
#[1] "BONDS 500,000.00 (500,000.00)"
Or using grep
lst1[Reduce(`:`, as.list(range(grep("^(CASH|BONDS)\\s+([^A-Z])", unlist(lst1)))))]
#[[1]]
#[1] "CASH $99,999,999.00 $99,999,999.00 0.00"
#[[2]]
#[1] "CASH SLIPS 1,000,000.00 1,000,000.00 0.00"
#[[3]]
#[1] "BONDS 500,000.00 (500,000.00)"
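The call above uses grep, which returns the positions of the matches, whereas grepl returns a logical vector as long as its input. Either can drive the range. A minimal sketch on the question's data, assuming lst1 holds the list from the question (the names pat, idx, flag and rng are only for illustration):

```r
lst1 <- list("CASH $99,999,999.00 $99,999,999.00 0.00",
             "CASH SLIPS 1,000,000.00 1,000,000.00 0.00",
             "BONDS 500,000.00 (500,000.00)",
             "ACCOUNTS RECEIVABLE 1,000,000.00 2,000,000.00 (1,000,000.00)")

# Label, whitespace, then a non-capital character, so "CASH SLIPS" is excluded
pat <- "^(CASH|BONDS)\\s+[^A-Z]"
x <- unlist(lst1)

idx  <- grep(pat, x)    # integer positions: 1 3
flag <- grepl(pat, x)   # TRUE FALSE TRUE FALSE

# which() turns the logical vector into positions, so both give the same range
identical(idx, which(flag))   # TRUE
rng <- range(which(flag))
res <- lst1[rng[1]:rng[2]]
length(res)                   # 3
```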
Or using keep from purrr
library(stringr)
library(purrr)
keep(lst1, ~ str_detect(.x, '^(CASH|BONDS)'))
Or with sapply and word (from stringr)
lst1[sapply(lst1, word, 1) %in% c("CASH", "BONDS")]
You can extract the words up to the first number or "$" sign in each list element (data is the list from the question).
first_word <- sapply(data, function(x) sub('(.*?)\\s(\\d+|\\$).*', '\\1', x))
first_word
#[1] "CASH" "CASH SLIPS" "BONDS" "ACCOUNTS RECEIVABLE"
and use first_word to select the elements of the list that start at "CASH" and end at "BONDS".
data[which(first_word == 'CASH'):which(first_word == 'BONDS')]
#[[1]]
#[1] "CASH $99,999,999.00 $99,999,999.00 0.00"
#[[2]]
#[1] "CASH SLIPS 1,000,000.00 1,000,000.00 0.00"
#[[3]]
#[1] "BONDS 500,000.00 (500,000.00)"
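Since the statements may differ between PDFs, a label could be missing or appear more than once. In that case which() returns zero or several positions, and the : call either errors or uses only the first position. A minimal sketch of a guarded wrapper around the same idea (subset_between is a hypothetical name, and taking the first occurrence of each label is an assumption):

```r
subset_between <- function(lst, start, end) {
  # Label = text before the first amount (a digit or a "$")
  labels <- sub("(.*?)\\s(\\d+|\\$).*", "\\1", unlist(lst))
  from <- which(labels == start)
  to   <- which(labels == end)
  if (length(from) == 0 || length(to) == 0) {
    stop("start or end label not found")
  }
  # Assumption: use the first occurrence of each label
  from <- from[1]
  to   <- to[1]
  if (to < from) stop("end label appears before start label")
  lst[from:to]
}

data <- list("CASH $99,999,999.00 $99,999,999.00 0.00",
             "CASH SLIPS 1,000,000.00 1,000,000.00 0.00",
             "BONDS 500,000.00 (500,000.00)",
             "ACCOUNTS RECEIVABLE 1,000,000.00 2,000,000.00 (1,000,000.00)")

out <- subset_between(data, "CASH", "BONDS")
length(out)  # 3
```

Stopping with an error when a label is absent is one choice; returning an empty list instead may suit a batch run over many files better.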