
R: subsetting a data frame into multiple data frames based on multiple column values

I am trying to subset a data frame so that I get multiple data frames based on the values of multiple columns. Here is my example:

>df
  v1   v2   v3   v4   v5
   A    Z    1    10   12
   D    Y    10   12    8
   E    X    2    12   15
   A    Z    1    10   12
   E    X    2    14   16

The expected output is something like this, where the data frame is split into multiple data frames based on columns v1 and v2:

>df1
 v3   v4   v5
  1   10   12
  1   10   12
>df2
 v3   v4   v5
 10   12    8
>df3
 v3   v4   v5
 2    12   15
 2    14   16

I have written code that works right now, but I don't think it is the best way to do this; there must be a better one. Assuming tab is the data.frame holding the initial data, here is my code:

v1Factors<-levels(factor(tab$v1))
v2Factors<-levels(factor(tab$v2))

for(i in 1:length(v1Factors)){
  for(j in 1:length(v2Factors)){
    subsetTab<-subset(tab, v1==v1Factors[i] & v2==v2Factors[j], select=c("v3", "v4", "v5"))
    print(subsetTab)
  }
}

Can someone suggest a better method to do the above?

You are looking for split:

split(df, with(df, interaction(v1,v2)), drop = TRUE)
$E.X
  v1 v2 v3 v4 v5
3  E  X  2 12 15
5  E  X  2 14 16

$D.Y
  v1 v2 v3 v4 v5
2  D  Y 10 12  8

$A.Z
  v1 v2 v3 v4 v5
1  A  Z  1 10 12
4  A  Z  1 10 12
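The expected output in the question keeps only v3, v4 and v5 in each piece. A minimal sketch of one way to get that with split, assuming the example data is rebuilt as below (the name pieces is only illustrative):

```r
# Rebuild the example data from the question
df <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v3 = c(1, 10, 2, 1, 2),
  v4 = c(10, 12, 12, 10, 14),
  v5 = c(12, 8, 15, 12, 16)
)

# Split only the value columns; the grouping still comes from v1 and v2
pieces <- split(df[c("v3", "v4", "v5")],
                interaction(df$v1, df$v2),
                drop = TRUE)

names(pieces)    # group labels built as "<v1>.<v2>"
pieces[["A.Z"]]  # rows 1 and 4 of the original data
```

Each element of pieces is an ordinary data frame, so it can be pulled out by name or by position.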

As noted in the comments, any of the following would work:

library(microbenchmark)
microbenchmark(
  split(df, list(df$v1, df$v2), drop = TRUE),
  split(df, interaction(df$v1, df$v2), drop = TRUE),
  split(df, with(df, interaction(v1, v2)), drop = TRUE)
)


Unit: microseconds
                                                  expr      min        lq    median       uq      max neval
            split(df, list(df$v1, df$v2), drop = TRUE) 1119.845 1129.3750 1145.8815 1182.119 3910.249   100
     split(df, interaction(df$v1, df$v2), drop = TRUE)  893.749  900.5720  909.8035  936.414 3617.038   100
 split(df, with(df, interaction(v1, v2)), drop = TRUE)  895.150  902.5705  909.8505  927.128 1399.284   100

It appears interaction is slightly faster (probably because a list passed as f is simply converted to an interaction inside split).
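A quick way to check that the list and interaction forms produce the same pieces is to compare their results directly. This is only a sketch on the small example data; the names by_list and by_interaction are illustrative:

```r
# Rebuild the example data from the question
df <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v3 = c(1, 10, 2, 1, 2),
  v4 = c(10, 12, 12, 10, 14),
  v5 = c(12, 8, 15, 12, 16)
)

# Same grouping expressed two ways
by_list        <- split(df, list(df$v1, df$v2), drop = TRUE)
by_interaction <- split(df, interaction(df$v1, df$v2), drop = TRUE)

# Compare group labels and contents
same_names  <- identical(names(by_list), names(by_interaction))
same_pieces <- isTRUE(all.equal(by_list, by_interaction))
```

If both checks come back TRUE, the choice between the forms is a matter of timing and readability rather than of results.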


Edit

If you just want to use the subset data.frames, then I would suggest using data.table for ease of coding:

library(data.table)

dt <- data.table(df)
dt[, plot(v4, v5), by = list(v1, v2)]
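The same "do something with each subset" idea can also be sketched in base R by applying a function over the result of split. This is not the data.table approach above, just an alternative under the assumption that a per-group summary is what is wanted; groups and group_means are illustrative names:

```r
# Rebuild the example data from the question
df <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v3 = c(1, 10, 2, 1, 2),
  v4 = c(10, 12, 12, 10, 14),
  v5 = c(12, 8, 15, 12, 16)
)

# One data frame per (v1, v2) combination that actually occurs
groups <- split(df, interaction(df$v1, df$v2), drop = TRUE)

# Apply the same summary to every subset; here, the column means of v3..v5
group_means <- lapply(groups, function(d) colMeans(d[c("v3", "v4", "v5")]))

group_means[["E.X"]]
```

Any function that takes a data frame can replace colMeans here, for example a plotting call as in the data.table line.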

There's now also nest() from tidyr, which is rather nice.

library(tidyr)
# tidyr >= 1.0.0 expects a named argument; older versions used nest(v3:v5)
nestdf <- df %>% nest(data = c(v3, v4, v5))
nestdf$data

> nestdf$data
[[1]]
# A tibble: 2 × 3
     v3    v4    v5
  <int> <int> <int>
1     1    10    12
2     1    10    12

[[2]]
# A tibble: 1 × 3
     v3    v4    v5
  <int> <int> <int>
1    10    12     8

[[3]]
# A tibble: 2 × 3
     v3    v4    v5
  <int> <int> <int>
1     2    12    15
2     2    14    16

Access individual tibbles with nestdf$data[[1]] and so on.

