Returning the rows above and below specific rows in an R data frame

Consider any data frame:

            col1   col2    col3   col4
row.name11    A     23      x       y
row.name12    A     29      x       y
row.name13    B     17      x       y
row.name14    A     77      x       y

I have a list of row names that I want to return from this data frame. Let's say I have row.name12 and row.name13 in the list. I can easily return these rows from the data frame, but I also want to return the 4 rows above and the 4 rows below them. That means I want to return everything from row.name8 to row.name17. I think it is similar to grep -A -B in the shell.

Probable solution: is there any way to get a row number from a row name? If I had the row number, I could easily subtract 4 and add 4 and return those rows.

Note: the row names here are just examples. Row names could be anything, like RED, BLUE, BLACK, etc.

Try this:

extract.with.context <- function(x, rows, after = 0, before = 0) {

  match.idx  <- which(rownames(x) %in% rows)
  span       <- seq(from = -before, to = after)
  extend.idx <- c(outer(match.idx, span, `+`))
  extend.idx <- Filter(function(i) i > 0 & i <= nrow(x), extend.idx)
  extend.idx <- sort(unique(extend.idx))

  return(x[extend.idx, , drop = FALSE])
}

dat <- data.frame(x = 1:26, row.names = letters)
extract.with.context(dat, c("a", "b", "j", "y"), after = 3, before = 1)
#    x
# a  1
# b  2
# c  3
# d  4
# e  5
# i  9
# j 10
# k 11
# l 12
# m 13
# x 24
# y 25
# z 26
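
To make the outer() step in the function above easier to follow, here is a small sketch of the intermediate values for the same call (matches at positions 1, 2, 10 and 25, one row before and three after). The hard-coded positions and the variable names offsets and idx are only for illustration.

```r
# Row positions of the matches "a", "b", "j", "y" in a 26-row data frame.
match.idx <- c(1, 2, 10, 25)
# One row before through three rows after each match.
span <- seq(from = -1, to = 3)

# outer() adds every offset to every match: one matrix row per match.
offsets <- outer(match.idx, span, `+`)
offsets

# Flatten, keep only positions that exist, then de-duplicate and sort.
idx <- c(offsets)
idx <- sort(unique(idx[idx > 0 & idx <= 26]))
idx
```

The matrix contains out-of-range values such as 0, 27 and 28, which is why the function filters the positions before subsetting.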

Perhaps a combination of which() and %in% would help you:

dat[which(rownames(dat) %in% c("row.name13")) + c(-1, 1), ]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name14    A   77    x    y

In the above, which() identifies the position of the row named "row.name13" in dat, and + c(-1, 1) tells R to return the row before and the row after. If you wanted to include the matched row itself, you could do something like + c(-1:1).

To get the range of rows, switch the comma to a colon:

dat[which(rownames(dat) %in% c("row.name13")) + c(-1:1), ]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# row.name14    A   77    x    y
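
One thing to watch with the offset approach above is what happens when a match sits at the edge of the data. The sketch below rebuilds a two-column version of the example data (an assumption, since the question only shows a printout) and matches the last row. The names raw and clean are illustrative.

```r
dat <- data.frame(col1 = c("A", "A", "B", "A"),
                  col2 = c(23, 29, 17, 77),
                  row.names = paste0("row.name", 11:14))

# Matching the last row: position 4 plus the offsets -1, 0, 1 gives 3, 4, 5.
idx <- which(rownames(dat) %in% "row.name14") + c(-1:1)

# Position 5 does not exist, so the result contains a row of NAs.
raw <- dat[idx, ]
raw

# Keeping only valid positions avoids the NA row.
idx <- idx[idx >= 1 & idx <= nrow(dat)]
clean <- dat[idx, ]
clean
```

Filtering the positions before subsetting is the same guard the other answers apply.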

Update

Matching a list is a little bit trickier, but without thinking about it too much, here is a possibility:

myRows <- c("row.name12", "row.name13")
rowRanges <- lapply(which(rownames(dat) %in% myRows), function(x) x + c(-1:1))
# [[1]]
# [1] 1 2 3
# 
# [[2]]
# [1] 2 3 4
#
lapply(rowRanges, function(x) dat[x, ])
# [[1]]
#            col1 col2 col3 col4
# row.name11    A   23    x    y
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# 
# [[2]]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# row.name14    A   77    x    y

This outputs a list of data.frames, which might be handy since you might have duplicated rows (as there are in this example).
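
If a single data frame without repeated rows is preferred over a list, the positions can be pooled before subsetting. This is only a sketch: the data frame is rebuilt from the printout in the question, and allRows and combined are illustrative names.

```r
dat <- data.frame(col1 = c("A", "A", "B", "A"),
                  col2 = c(23, 29, 17, 77),
                  col3 = "x", col4 = "y",
                  row.names = paste0("row.name", 11:14))

myRows <- c("row.name12", "row.name13")
rowRanges <- lapply(which(rownames(dat) %in% myRows), function(x) x + c(-1:1))

# Pool all positions, drop repeats, and keep only positions that exist.
allRows <- sort(unique(unlist(rowRanges)))
allRows <- allRows[allRows >= 1 & allRows <= nrow(dat)]

combined <- dat[allRows, ]
combined
```

Here the two overlapping ranges 1:3 and 2:4 collapse to rows 1 through 4, each returned once.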

Update 2: Using grep if it is more appropriate

Here is a variation of your question, one which would be less convenient to solve using the which() ... %in% approach.

set.seed(1)
dat1 <- data.frame(ID = 1:25, V1 = sample(100, 25, replace = TRUE))
rownames(dat1) <- paste("rowname", sample(apply(combn(LETTERS[1:4], 2), 
                                               2, paste, collapse = ""), 
                                         25, replace = TRUE), 
                       sprintf("%02d", 1:25), sep = ".")
head(dat1)
#               ID V1
# rowname.AD.01  1 27
# rowname.AB.02  2 38
# rowname.AD.03  3 58
# rowname.CD.04  4 91
# rowname.AD.05  5 21
# rowname.AD.06  6 90

Now, imagine you wanted to identify the rows with AB and AC, but you don't have a list of the numeric suffixes.

Here's a little function that can be used in such a scenario. It borrows a little from @Spacedman to make sure that the rows returned are within the range of the data (as per @flodel's suggestion).

getMyRows <- function(data, matches, range) {
  rowMatches = lapply(unlist(lapply(matches, function(x)
    grep(x, rownames(data)))), function(y) y + range)
  rowMatches = lapply(rowMatches, function(x) x[x > 0 & x <= nrow(data)])
  lapply(rowMatches, function(x) data[x, ])
}

You can use it as follows (but I won't print the results here). First, specify the dataset, then the pattern(s) you want matched, then the range (in this example, three rows before and four rows after).

getMyRows(dat1, c("AB", "AC"), -3:4)

Applying it to the earlier example of matching row.name12 and row.name13, you can use it as follows: getMyRows(dat, c(12, 13), -1:1).

You can also modify the function to make it more general (for example, to specify matching with a column instead of row names).
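
As a rough sketch of that generalization, the variant below takes an optional column name and falls back to the row names when none is given. The function name getMyRowsBy, the column argument, and the dat2 data are all hypothetical and not part of the original answer.

```r
# Hypothetical variant of getMyRows(); `column` is an illustrative argument.
getMyRowsBy <- function(data, matches, range, column = NULL) {
  # Search the row names by default, otherwise the values of the named column.
  haystack <- if (is.null(column)) rownames(data) else as.character(data[[column]])
  hits <- unlist(lapply(matches, function(p) grep(p, haystack)))
  lapply(hits, function(h) {
    idx <- h + range
    # Keep only positions inside the data, as in getMyRows().
    data[idx[idx > 0 & idx <= nrow(data)], , drop = FALSE]
  })
}

# Small made-up data set to try it on.
dat2 <- data.frame(id = 1:6, grp = c("AB", "CD", "AC", "BD", "AB", "CD"))
res <- getMyRowsBy(dat2, "AC", -1:1, column = "grp")
res[[1]]
```

Like the original, it returns one data frame per match, so overlapping ranges still appear in separate list elements.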

Create some sample data:

> dat=data.frame(col1=letters,col2=sample(26),col3=sample(letters))
> dat
   col1 col2 col3
1     a   26    x
2     b   12    i
3     c   15    v
...

Set our target vector (note that I chose an edge case and overlapping cases), and find the matching rows:

> target=c("a","e","g","s")
> match = which(dat$col1 %in% target)

Create sequences from -2 to +2 around the matches (adjust for your needs) and merge:

> getThese = unique(as.vector(mapply(seq,match-2,match+2)))
> getThese
 [1] -1  0  1  2  3  4  5  6  7  8  9 17 18 19 20 21

Fix the edge cases:

> getThese = getThese[getThese > 0 & getThese <= nrow(dat)]
> dat[getThese,]
   col1 col2 col3
1     a   26    x
2     b   12    i
3     c   15    v
4     d   22    d
5     e    2    j
6     f    9    l
7     g    1    w
8     h   21    n
9     i   17    p
17    q   18    a
18    r   10    m
19    s   24    o
20    t   13    e
21    u    3    k
> 

Remember our targets were a, e, g and s. You've now got those plus two rows above and two rows below for each, with no duplicates.

If you are using row names, just create 'match' from those. I was using a column.
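
A minimal sketch of the row-name version, assuming a data frame whose row names are the letters a to z. The data here are made up, and the match positions are stored in hits rather than match so the base function of that name is not shadowed.

```r
# Made-up data with letters as row names.
dat <- data.frame(col2 = 1:26, row.names = letters)
target <- c("a", "e", "g", "s")

# Same steps as above, but matching on row names instead of a column.
hits <- which(rownames(dat) %in% target)
getThese <- unique(as.vector(mapply(seq, hits - 2, hits + 2)))
getThese <- getThese[getThese > 0 & getThese <= nrow(dat)]

picked <- rownames(dat[getThese, , drop = FALSE])
picked
```

The drop = FALSE matters here because the data frame has a single column and would otherwise be reduced to a plain vector.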

I'd write a bunch more tests using the testthat package if this were my problem.
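
As a dependency-free sketch of what such tests might check, the block below wraps the steps in a hypothetical helper (rows_with_context is an invented name) and uses base stopifnot(). With testthat, each check would instead be an expect_equal() call inside test_that().

```r
# Hypothetical helper wrapping the steps above; the name is illustrative.
rows_with_context <- function(n_rows, hits, before = 2, after = 2) {
  idx <- unlist(lapply(hits, function(h) seq(h - before, h + after)))
  idx <- sort(unique(idx))
  idx[idx > 0 & idx <= n_rows]
}

# Edge case at the top: there is nothing above row 1.
stopifnot(isTRUE(all.equal(rows_with_context(26, 1), 1:3)))

# Edge case at the bottom: there is nothing below the last row.
stopifnot(isTRUE(all.equal(rows_with_context(26, 26), 24:26)))

# Overlapping windows are merged without duplicates.
stopifnot(isTRUE(all.equal(rows_with_context(26, c(5, 7)), 3:9)))
```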

I would simply proceed as follows:

dat[(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4),]

grep("row.name12",row.names(dat)) gives you the number of the row that has "row.name12" as its name, so

(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4)

gives you a series of row numbers ranging from the 4th row preceding the row named "row.name12" to the 4th row after the one named "row.name13".
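
Two caveats apply to this one-liner. grep() does pattern matching, so "row.name12" would also match a row named "row.name120", and the computed range can run past either end of the data frame. A sketch that uses exact matching with match() and clamps the range is shown below. The data frame is rebuilt from the question's printout, and first, last, from, to and block are illustrative names.

```r
dat <- data.frame(col1 = c("A", "A", "B", "A"),
                  col2 = c(23, 29, 17, 77),
                  row.names = paste0("row.name", 11:14))

# match() returns the position of the exact row name (NA if it is absent).
first <- match("row.name12", rownames(dat))
last  <- match("row.name13", rownames(dat))

# Clamp the window so it never extends beyond the first or last row.
from <- max(1, first - 4)
to   <- min(nrow(dat), last + 4)

block <- dat[from:to, , drop = FALSE]
block
```

With only four rows available, the window from 4 rows above to 4 rows below is cut down to rows 1 through 4.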
