
How to subset data using two other dataframes in R

I have a dataframe with 100 IDs. Each ID has a different number of rows, and all IDs have the same number of columns.

A sample dataframe looks like this:

a <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2), 
              A = c(12,12.5,15,16,18,20,25,26,29,35, 12,12.5,15,16,18,20,25,26,29,35),
              B = c(20,19,18,17,16,20,25,28,30,35, 20,19,18,17,16,20,25,28,30,35),
              C = c(2,1,5,9,10,11,13,18,25,27,2,1,5,9,10,11,13,18,25,27))

Within each ID, I want to keep the rows between two specified values. The two values are stored in two other dataframes, one in each.

The first value is the first row of each ID. The sample dataframe is as follows:

 b <- data.frame(ID = c(1,2), 
              A = c(12.0,12.0),
              B = c(20,20),
              C = c(2,2))

The second value is a specific row for each ID, chosen based on some value in a vector. The sample dataframe is as follows:

c <- data.frame(ID = c(1,2), 
               A = c(25.0,20.0),
               B = c(25,20),
               C = c(13,11))

Note that, for each ID, the rows in these dataframes have the same values as the corresponding rows in the main dataframe 'a'.

The expected dataframe is as follows:

d <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,2,2,2), 
              A = c(12,12.5,15,16,18,20,25, 12,12.5,15,16,18,20),
              B = c(20,19,18,17,16,20,25, 20,19,18,17,16,20),
              C = c(2,1,5,9,10,11,13,2,1,5,9,10,11))

To get the expected output, I tried the following code, but it failed:

for (i in 1:nrow(b)){
Azimuth[i] = (a[which(a$A == b$A[i]):which(a$A == c$A[i])])
}

Here, I am trying to use the two dataframes 'b' and 'c' to subset the data from 'a'. Is it possible to get the same output without using dataframe 'b', since each row of 'b' is simply the first row of the corresponding ID in 'a'?

A dplyr solution

library(dplyr)
a %>% 
  mutate(end = FALSE) %>% 
  rows_update(c %>% mutate(end = TRUE), by = c("ID", "A", "B", "C")) %>% 
  group_by(ID) %>% 
  slice(1:which(end)) %>% 
  select(-end)

Output:

# A tibble: 13 x 4
# Groups:   ID [2]
      ID     A     B     C
   <dbl> <dbl> <dbl> <dbl>
 1     1  12      20     2
 2     1  12.5    19     1
 3     1  15      18     5
 4     1  16      17     9
 5     1  18      16    10
 6     1  20      20    11
 7     1  25      25    13
 8     2  12      20     2
 9     2  12.5    19     1
10     2  15      18     5
11     2  16      17     9
12     2  18      16    10
13     2  20      20    11
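One way to confirm that this output matches the expected dataframe d from the question is to compare the two directly. The sketch below rebuilds the question's data (using rep() only for brevity), reruns the pipeline above, and checks dimensions and values. It assumes dplyr 1.0.0 or later, where rows_update() is available; the names res and res_df are introduced here for illustration only.

```r
library(dplyr)

# Data from the question. Naming a data frame `c` is legal: R still resolves
# calls such as c(1, 2) to the base function.
a <- data.frame(ID = rep(c(1, 2), each = 10),
                A = rep(c(12, 12.5, 15, 16, 18, 20, 25, 26, 29, 35), 2),
                B = rep(c(20, 19, 18, 17, 16, 20, 25, 28, 30, 35), 2),
                C = rep(c(2, 1, 5, 9, 10, 11, 13, 18, 25, 27), 2))
c <- data.frame(ID = c(1, 2), A = c(25, 20), B = c(25, 20), C = c(13, 11))
d <- data.frame(ID = c(rep(1, 7), rep(2, 6)),
                A = c(12, 12.5, 15, 16, 18, 20, 25, 12, 12.5, 15, 16, 18, 20),
                B = c(20, 19, 18, 17, 16, 20, 25, 20, 19, 18, 17, 16, 20),
                C = c(2, 1, 5, 9, 10, 11, 13, 2, 1, 5, 9, 10, 11))

# The pipeline from the answer, unchanged
res <- a %>%
  mutate(end = FALSE) %>%
  rows_update(c %>% mutate(end = TRUE), by = c("ID", "A", "B", "C")) %>%
  group_by(ID) %>%
  slice(1:which(end)) %>%
  select(-end)

# Drop the grouping and tibble class before comparing with the plain data.frame
res_df <- as.data.frame(ungroup(res))
stopifnot(identical(dim(res_df), dim(d)), all(res_df == d))
```

Note that dataframe b is never used, which addresses the last part of the question.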

Explanation:

I guess you want to use one dataframe to subset another because you want to subset a only where certain combinations of ID, A, B and C exist, namely the ones you specify in your dataframe c?

If that is the case, you can achieve your goal with the following steps:

  1. We create a new logical variable in a. Call it end and default it to FALSE.
  2. We create the same variable in c, but set it to TRUE.
  3. We use end in c to update end in a for each row identified by a combination of ID, A, B and C. This way, end becomes TRUE only where a and c fully match on the other four variables. If a row of c has no full match in a, you get this error: Error: Attempting to update missing rows.
  4. We group_by(ID) and, for each group, select from the first row up to the row where end is TRUE.
  5. We drop the end variable, since it is no longer needed.
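The same steps can also be sketched with a join instead of rows_update(). This is an alternative, not the method used above: left_join() does not raise an error when a row of c has no match in a, so an ID without a matching end row is silently dropped rather than stopping the pipeline. The names flagged and result are introduced here for illustration only.

```r
library(dplyr)

# Data from the question (rep() is used only for brevity)
a <- data.frame(ID = rep(c(1, 2), each = 10),
                A = rep(c(12, 12.5, 15, 16, 18, 20, 25, 26, 29, 35), 2),
                B = rep(c(20, 19, 18, 17, 16, 20, 25, 28, 30, 35), 2),
                C = rep(c(2, 1, 5, 9, 10, 11, 13, 18, 25, 27), 2))
c <- data.frame(ID = c(1, 2), A = c(25, 20), B = c(25, 20), C = c(13, 11))

# Steps 1-3: flag the rows of `a` that fully match a row of `c`.
# Unmatched rows get NA from the join, which coalesce() turns into FALSE.
flagged <- a %>%
  left_join(c %>% mutate(end = TRUE), by = c("ID", "A", "B", "C")) %>%
  mutate(end = coalesce(end, FALSE))

# Steps 4-5: within each ID keep rows up to the first flagged row, then
# drop the helper column. If a group has no flagged row, match() returns NA
# and filter() removes the whole group.
result <- flagged %>%
  group_by(ID) %>%
  filter(row_number() <= match(TRUE, end)) %>%
  ungroup() %>%
  select(-end)
```

Both versions rely on exact equality of the numeric columns, which holds here because the values in c are copied from a. With computed floating-point values, matching on a row index or a rounded key would be safer.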
