
How can I show data frame rows based on computed indices in R?

I am working on a problem involving a data frame, where I need to retrieve specific rows based on the indices of rows that match certain criteria.

# Create dataframe

position <- c("START" , "MIDDLE", "END" ,"START" , "MIDDLE", 
          "MIDDLE", "MIDDLE", "MIDDLE" ,"MIDDLE" ,"MIDDLE",
          "MIDDLE", "MIDDLE", "MIDDLE" ,"END",    "START" , 
          "START" , "START" , "MIDDLE", "MIDDLE", "END", 
          "START" , "START",  "MIDDLE", "MIDDLE", "MIDDLE",
          "END" ,"START",  "MIDDLE", "MIDDLE", "MIDDLE",
          "END", "START" , "MIDDLE", "MIDDLE", "MIDDLE",
          "MIDDLE" ,"MIDDLE" ,"MIDDLE", "MIDDLE" ,"MIDDLE" ,
          "MIDDLE" ,"MIDDLE", "MIDDLE", "MIDDLE", "MIDDLE", 
          "MIDDLE" ,"MIDDLE", "MIDDLE" ,"MIDDLE" ,"MIDDLE" ,
          "MIDDLE", "MIDDLE", "MIDDLE", "END")

text <-c("First line", "Middle Line",  "Last Line", "First line","Middle Line",
     "Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
     "Middle Line", "Middle Line", "Middle Line", "Last Line", "First line",
     "First line", "First line", "Middle Line", "Middle Line", "Last Line",
     "First line", "First line",  "Middle Line", "Middle Line", "Middle Line",
     "Last Line", "First line",  "Middle Line", "Middle Line", "Middle Line",
     "Last Line", "First line",  "Middle Line", "Middle Line", "Middle Line",
     "Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
     "Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
     "Middle Line", "Middle Line", "Middle Line", "Middle Line", "Middle Line",
     "Middle Line", "Middle Line", "Middle Line", "Last Line")

# Combine the two vectors into the data frame used below
a_df <- data.frame(position, text)

This essentially gives rows like the following:

> head(a_df)
  position        text
1    START  First line
2   MIDDLE Middle Line
3      END   Last Line

Basically, I want to be able to show subsets of the overall data frame, where each subset contains a start line, its middle lines, and an end line.

After doing some reading online, I am trying to generate the indices as follows:

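# Generate indices: row numbers where position matches "START" / "END"
# (with() is not needed here because the column is referenced directly)
index_start <- grep("START", a_df$position)
index_end <- grep("END", a_df$position)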

This gives the required output:

> index_start
[1]  1  4 15 16 17 21 22 27 32
> index_end
[1]  3 14 20 26 31 54

I realise the indices are unbalanced (I am removing these imbalances), but I am wondering how I can use the above output to supply the values in the following subset commands:

a_df[c(1:3),]
a_df[c(4:14),]
a_df[c(17:20),]
a_df[c(22:26),]
a_df[c(27:31),]
a_df[c(32:54),]

Thanks in advance, Jonathan

It is not clear how the elements of 'index_start' should be selected, but based on the code shown in the OP's post, it seems that for each element of 'index_end' we need the last element of 'index_start' that is less than it. To get that element, we create a grouping variable with findInterval and, using tapply, take the last element of each group of 'index_start' with tail.
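To make the grouping step concrete, here is a minimal sketch that types in the two index vectors from the question by hand (so it runs without the data frame) and prints the group that findInterval assigns to each start index. The names grp and groups are only illustrative.

```r
# Index vectors copied by hand from the output shown in the question
index_start <- c(1, 4, 15, 16, 17, 21, 22, 27, 32)
index_end   <- c(3, 14, 20, 26, 31, 54)

# For each start index, findInterval counts how many end indices are <= it,
# so start indices that fall between the same pair of ends share a group
grp <- findInterval(index_start, index_end)
grp
# [1] 0 1 2 2 2 3 3 4 5

# Inspect the start indices collected in each group;
# the last value of each group is the one paired with the next end index
groups <- split(index_start, grp)
groups
```

Group 2, for example, holds 15, 16 and 17, and only 17 (the last of them) is used as the start of the block that ends at row 20.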

Then we build the sequence between corresponding elements of 'index_start1' and 'index_end', and use Map to subset the rows of the dataset with each sequence, which gives a list of data.frames.

index_start1 <- unname(tapply(index_start, findInterval(index_start, index_end),
                           FUN = tail, 1))    
index_start1
#[1]  1  4 17 22 27 32

lst <- Map(function(x, y) a_df[x:y,], index_start1, index_end)
lst
#[[1]]
#  position        text
#1    START  First line
#2   MIDDLE Middle Line
#3      END   Last Line

#[[2]]
#   position        text
#4     START  First line
#5    MIDDLE Middle Line
#6    MIDDLE Middle Line
#7    MIDDLE Middle Line
#8    MIDDLE Middle Line
#9    MIDDLE Middle Line
#10   MIDDLE Middle Line
#11   MIDDLE Middle Line
#12   MIDDLE Middle Line
#13   MIDDLE Middle Line
#14      END   Last Line

#[[3]]
#   position        text
#17    START  First line
#18   MIDDLE Middle Line
#19   MIDDLE Middle Line
#20      END   Last Line

#[[4]]
#   position        text
#22    START  First line
#23   MIDDLE Middle Line
#24   MIDDLE Middle Line
#25   MIDDLE Middle Line
#26      END   Last Line

#[[5]]
#   position        text
#27    START  First line
#28   MIDDLE Middle Line
#29   MIDDLE Middle Line
#30   MIDDLE Middle Line
#31      END   Last Line

#[[6]]
#   position        text
#32    START  First line
#33   MIDDLE Middle Line
#34   MIDDLE Middle Line
#35   MIDDLE Middle Line
#36   MIDDLE Middle Line
#37   MIDDLE Middle Line
#38   MIDDLE Middle Line
#39   MIDDLE Middle Line
#40   MIDDLE Middle Line
#41   MIDDLE Middle Line
#42   MIDDLE Middle Line
#43   MIDDLE Middle Line
#44   MIDDLE Middle Line
#45   MIDDLE Middle Line
#46   MIDDLE Middle Line
#47   MIDDLE Middle Line
#48   MIDDLE Middle Line
#49   MIDDLE Middle Line
#50   MIDDLE Middle Line
#51   MIDDLE Middle Line
#52   MIDDLE Middle Line
#53   MIDDLE Middle Line
#54      END   Last Line

NOTE: It is better to keep the data.frames in the list, as most operations can be done within the list.
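As a rough illustration of working within the list, the sketch below uses a small hand-made list of two data.frames as a stand-in for the real result (the toy data and the names n_rows, n_middle and combined are assumptions for this example only). It summarises each block with sapply and, if a single table is needed later, binds the blocks back together with a block identifier.

```r
# Toy stand-in for a list of data.frames (hypothetical data)
lst <- list(
  data.frame(position = c("START", "MIDDLE", "END"),
             text = c("First line", "Middle Line", "Last Line")),
  data.frame(position = c("START", "END"),
             text = c("First line", "Last Line"))
)

# Number of rows in each block
n_rows <- sapply(lst, nrow)

# Number of MIDDLE rows in each block
n_middle <- sapply(lst, function(d) sum(d$position == "MIDDLE"))

# Recombine into one data.frame, tagging each row with its block number
combined <- do.call(
  rbind,
  Map(function(d, i) cbind(block = i, d), lst, seq_along(lst))
)
combined
```

Keeping the blocks in one list means the same function can be applied to every block in a single call instead of repeating the code for separately named objects.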
