
Filtering and subsetting R dataframe based on selected variable/column names

I am trying to subset a large data set with many variables/column names, say ax1, ax2, ax3, ax4, ax5, ..., ax20, bx1, ..., bx20, ..., zx1, ..., zx20. For example, suppose the subset I want consists of the variables ax3, ax5, ax11, ax19, bx3, bx5, bx11, bx19, ..., zx3, zx5, zx11, zx19.

I have tried the following code in R, but it is becoming very lengthy and cumbersome.

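setwd("")  # working directory path left blank in the original post
abc <- read.table("abc.txt", header = TRUE)
# Select each wanted column by hand; this is the part that grows unwieldy
new.abc <- data.frame(abc$ax3, abc$ax5, abc$ax11, abc$ax19,
                      abc$bx3, abc$bx5, abc$bx11, abc$bx19)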

The code keeps growing because I need to continue with cx3, cx5, cx11, cx19, ..., zx3, zx5, zx11, zx19. I am looking for an alternative approach that avoids this lengthy coding. Your help is greatly appreciated.

You could create the column names programmatically. If they follow the structure described in the question, we can do

cols <- c(outer(paste0(letters, "x"), c(3, 5, 11, 19), paste0))
cols
#[1] "ax3"  "bx3"  "cx3"  "dx3"  "ex3"  "fx3"  "gx3"  "hx3"  "ix3"  "jx3"  "kx3"...

and then use it to subset the dataframe

new.abc <- abc[, cols]
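To try the idea without the real file, here is a minimal sketch on a made-up data frame. The names `all_names`, `toy`, `missing_cols` and `subset_toy` are hypothetical and only stand in for the data read from abc.txt; the check with `setdiff` is an added precaution, not part of the original answer.

```r
# Hypothetical stand-in for the real data: 2 rows, columns ax1..ax20, ..., zx1..zx20
all_names <- paste0(rep(letters, each = 20), "x", 1:20)
toy <- as.data.frame(
  matrix(seq_len(2 * length(all_names)), nrow = 2,
         dimnames = list(NULL, all_names))
)

# Column names built as in the answer: every prefix combined with 3, 5, 11, 19
cols <- c(outer(paste0(letters, "x"), c(3, 5, 11, 19), paste0))

# Fail early with a clear message if any requested column is absent
missing_cols <- setdiff(cols, names(toy))
if (length(missing_cols) > 0) {
  stop("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# drop = FALSE keeps a data frame even if only one column were selected
subset_toy <- toy[, cols, drop = FALSE]
dim(subset_toy)
```

With 26 prefixes and 4 suffixes, the result should have 104 columns, in the order given by `cols`.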

If we also want the columns ordered by prefix and then by number (ax3, ax5, ax11, ax19, bx3, ...), we can use gtools::mixedsort

new.abc <- abc[, gtools::mixedsort(cols)]
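If gtools is not available, the same ordering can probably be reached in base R. The sketch below shows two options under the assumption that every name is a lowercase prefix followed by a number: sorting the generated names by prefix and numeric suffix, or matching the names of the data with a regular expression, which keeps them in the order they appear in the data. `all_names` is a hypothetical stand-in for `names(abc)`.

```r
cols <- c(outer(paste0(letters, "x"), c(3, 5, 11, 19), paste0))

# Option 1: split each name into its text prefix and numeric suffix, then order
prefix <- sub("[0-9]+$", "", cols)               # "ax", "bx", ...
suffix <- as.integer(sub("^[^0-9]*", "", cols))  # 3, 5, 11, 19
# method = "radix" sorts strings in C-locale order, independent of the session locale
cols_sorted <- cols[order(prefix, suffix, method = "radix")]

# Option 2: pick matching names directly from the data, keeping their existing order
all_names <- paste0(rep(letters, each = 20), "x", 1:20)  # stands in for names(abc)
cols_in_data_order <- grep("^[a-z]x(3|5|11|19)$", all_names, value = TRUE)

head(cols_sorted, 5)
```

Either vector can then be used as the column index, for example `abc[, cols_sorted]`.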

