I am working on a problem related to a data frame: retrieving specific rows based on the indices of rows that match some criteria.
# Create dataframe
position <- c("START" , "MIDDLE", "END" ,"START" , "MIDDLE",
"MIDDLE", "MIDDLE", "MIDDLE" ,"MIDDLE" ,"MIDDLE",
"MIDDLE", "MIDDLE", "MIDDLE" ,"END", "START" ,
"START" , "START" , "MIDDLE", "MIDDLE", "END",
"START" , "START", "MIDDLE", "MIDDLE", "MIDDLE",
"END" ,"START", "MIDDLE", "MIDDLE", "MIDDLE",
"END", "START" , "MIDDLE", "MIDDLE", "MIDDLE",
"MIDDLE" ,"MIDDLE" ,"MIDDLE", "MIDDLE" ,"MIDDLE" ,
"MIDDLE" ,"MIDDLE", "MIDDLE", "MIDDLE", "MIDDLE",
"MIDDLE" ,"MIDDLE", "MIDDLE" ,"MIDDLE" ,"MIDDLE" ,
"MIDDLE", "MIDDLE", "MIDDLE", "END")
text <-c("First line", "Middle Line", "Last Line", "First line","Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Last Line", "First line",
"First line", "First line", "Middle Line", "Middle Line", "Last Line",
"First line", "First line", "Middle Line", "Middle Line", "Middle Line",
"Last Line", "First line", "Middle Line", "Middle Line", "Middle Line",
"Last Line", "First line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
"Middle Line", "Middle Line", "Middle Line", "Last Line")
# Combine the two vectors into the data frame used below
a_df <- data.frame(position, text)
Which essentially shows lines like the following:
> head(a_df, 3)
position text
1 START First line
2 MIDDLE Middle Line
3 END Last Line
Basically, I want to be able to show subsets of the overall data frame. Each subset should contain a start line, its middle lines and an end line.
Doing some reading online I am trying to generate indices as follows:
# Generate indices
index_start <- grep("START", a_df$position)
index_end <- grep("END", a_df$position)
Which gives the required output:
> index_start
[1] 1 4 15 16 17 21 22 27 32
> index_end
[1] 3 14 20 26 31 54
I realise the indices are imbalanced (I am removing these imbalances), but I am wondering how I can use the above output to seed the values in the following subset commands:
a_df[c(1:3),]
a_df[c(4:14),]
a_df[c(17:20),]
a_df[c(22:26),]
a_df[c(27:31),]
a_df[c(32:54),]
Thanks in advance, Jonathan
It is not clear how the elements of 'index_start' should be selected, but based on the code shown in the OP's post, it seems that for each element of 'index_end' we need the last element of 'index_start' that is less than it. To get that, we create a grouping variable with findInterval, which assigns each start index to the interval of 'index_end' it falls in, and then use tapply with tail to take the last element of 'index_start' in each group. Then, we get the sequence between corresponding elements of 'index_start1' and 'index_end' and subset the rows of the dataset with Map to get a list of data.frames.
index_start1 <- unname(tapply(index_start, findInterval(index_start, index_end),
FUN = tail, 1))
index_start1
#[1] 1 4 17 22 27 32
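To see why the grouping picks out the right start for each end, it may help to look at the intermediate result of findInterval. The sketch below retypes the two index vectors from the question by hand so it runs on its own; the names grp and groups are introduced here only for illustration.

```r
# Index vectors copied from the output shown in the question
index_start <- c(1, 4, 15, 16, 17, 21, 22, 27, 32)
index_end   <- c(3, 14, 20, 26, 31, 54)

# For each start index, count how many end indices are <= it.
# Starts that share a count lie between the same pair of END rows.
grp <- findInterval(index_start, index_end)
grp
#[1] 0 1 2 2 2 3 3 4 5

# Show the start indices that fall into each group
groups <- split(index_start, grp)
groups[["2"]]
#[1] 15 16 17

# Keeping the last element of each group drops the unmatched starts (15, 16, 21)
index_start1 <- unname(tapply(index_start, grp, FUN = tail, 1))
index_start1
#[1]  1  4 17 22 27 32
```

Rows 15 and 16 are START rows with no END of their own before the next START, so only 17 survives from that group; the same happens with 21 and 22.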
lst <- Map(function(x, y) a_df[x:y,], index_start1, index_end)
lst
#[[1]]
# position text
#1 START First line
#2 MIDDLE Middle Line
#3 END Last Line
#[[2]]
# position text
#4 START First line
#5 MIDDLE Middle Line
#6 MIDDLE Middle Line
#7 MIDDLE Middle Line
#8 MIDDLE Middle Line
#9 MIDDLE Middle Line
#10 MIDDLE Middle Line
#11 MIDDLE Middle Line
#12 MIDDLE Middle Line
#13 MIDDLE Middle Line
#14 END Last Line
#[[3]]
# position text
#17 START First line
#18 MIDDLE Middle Line
#19 MIDDLE Middle Line
#20 END Last Line
#[[4]]
# position text
#22 START First line
#23 MIDDLE Middle Line
#24 MIDDLE Middle Line
#25 MIDDLE Middle Line
#26 END Last Line
#[[5]]
# position text
#27 START First line
#28 MIDDLE Middle Line
#29 MIDDLE Middle Line
#30 MIDDLE Middle Line
#31 END Last Line
#[[6]]
# position text
#32 START First line
#33 MIDDLE Middle Line
#34 MIDDLE Middle Line
#35 MIDDLE Middle Line
#36 MIDDLE Middle Line
#37 MIDDLE Middle Line
#38 MIDDLE Middle Line
#39 MIDDLE Middle Line
#40 MIDDLE Middle Line
#41 MIDDLE Middle Line
#42 MIDDLE Middle Line
#43 MIDDLE Middle Line
#44 MIDDLE Middle Line
#45 MIDDLE Middle Line
#46 MIDDLE Middle Line
#47 MIDDLE Middle Line
#48 MIDDLE Middle Line
#49 MIDDLE Middle Line
#50 MIDDLE Middle Line
#51 MIDDLE Middle Line
#52 MIDDLE Middle Line
#53 MIDDLE Middle Line
#54 END Last Line
NOTE: It is better to keep the data.frames in the list, as most operations can be done within the list.
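As a rough illustration of working within the list, the sketch below rebuilds the list on a small toy data frame (hypothetical, much shorter than the one in the question) and then summarises and recombines the blocks. The names block_sizes, n_middle and combined are assumptions made for this example.

```r
# Toy data frame (hypothetical): row 4 is a START with no matching END
a_df <- data.frame(
  position = c("START", "MIDDLE", "END",
               "START", "START", "MIDDLE", "MIDDLE", "END"),
  text     = c("First line", "Middle Line", "Last Line",
               "First line", "First line", "Middle Line", "Middle Line", "Last Line")
)

index_start <- which(a_df$position == "START")
index_end   <- which(a_df$position == "END")

# Same approach as above: keep the last START before each END
index_start1 <- unname(tapply(index_start,
                              findInterval(index_start, index_end),
                              FUN = tail, 1))
lst <- Map(function(x, y) a_df[x:y, ], index_start1, index_end)

# Number of rows in each block
block_sizes <- sapply(lst, nrow)

# Number of MIDDLE rows in each block
n_middle <- sapply(lst, function(d) sum(d$position == "MIDDLE"))

# Recombine into one data frame, tagging each row with its block number
combined <- do.call(rbind,
                    Map(function(d, i) cbind(block = i, d), lst, seq_along(lst)))
```

Here sapply and Map apply the same step to every block without any manual indexing, and do.call(rbind, ...) gives a single data frame back when one is needed.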