
Subset a dataframe by a variable number of specific columns in R

This one has been bugging me for a couple of days now, and I haven't had any luck on Stack Exchange yet. Essentially, I have two tables: one table defines which columns (by column number) to select from the second table. My initial plan was to string the column numbers together and pass that string into a subset statement; however, when I define the string with as.character, R is not happy, i.e.:

# Data Sets, Variable_Selection: Table of Columns to Select from Variable_Table

VARIABLE_SELECTION <- data.frame(Set.1 = c(3,1,1,1,1), Set.2 = c(0,3,2,2,2), Set.3 = c(0,0,3,4,3),
                                 Set.4 = c(0,0,0,5,4), Set.5 = c(0,0,0,0,5))

VARIABLE_TABLE <- data.frame(Var.1 = runif(100,0,10), Var.2 = runif(100,-100,100), Var.3 = runif(100,0,1),
                             Var.4 = runif(100,-1000,1000), Var.5 = runif(100,-1,1), Var.6 = runif(100,-10,10))

# String rows into a character string of columns to select

VARIABLE_STRING <- apply(VARIABLE_SELECTION,1,paste,sep = ",",collapse = " ")
VARIABLE_STRING <- gsub(" ",",",VARIABLE_STRING)
VARIABLE_STRING <- data.frame(VAR_STRING = gsub(",0","",VARIABLE_STRING))

# Will actually be part of lapply function but, one line selection for demonstration:

VARIABLE_SINGLE_SET <- as.character(VARIABLE_STRING[4,])

# Subset table for selected columns

VARIABLE_TABLE_SUB_SELECT <- VARIABLE_TABLE[,c(VARIABLE_SINGLE_SET)]

#  Error Returned:
#  Error in `[.data.frame`(VARIABLE_TABLE, , c(VARIABLE_SINGLE_SET)) : 
#  undefined columns selected

I know the text formatting is the problem, but I can't find a workaround. Any suggestions?

You should avoid subsetting by column number and work with variable names instead, or at least keep your indices as an integer vector (there is no need to coerce them to a string).

First, to stay with your approach, this corrects your code. Basically, I split your string on the commas and coerce the pieces to a numeric vector:

VARIABLE_TABLE[,as.numeric(unlist(strsplit(
        VARIABLE_SINGLE_SET,',')))]
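
To see why the original call fails, it may help to look at what is actually handed to `[`: a single string such as "1,2,4,5" is looked up as one column name rather than four positions. Below is a minimal sketch of that behaviour and of the split-and-convert fix. The data frame `df` and the names `sel`, `failed`, `idx` and `sub` are illustrative assumptions, not part of the question.

```r
# Small stand-in for VARIABLE_TABLE (illustrative data only)
df <- data.frame(Var.1 = 1:3, Var.2 = 4:6, Var.3 = 7:9,
                 Var.4 = 10:12, Var.5 = 13:15)

sel <- "1,2,4,5"   # same shape as VARIABLE_SINGLE_SET in the question

# A length-one character vector is treated as a column *name*;
# no column is called "1,2,4,5", so the subset raises an error.
failed <- inherits(try(df[, sel], silent = TRUE), "try-error")

# Splitting on the comma and converting yields real column positions.
idx <- as.integer(unlist(strsplit(sel, ",", fixed = TRUE)))
sub <- df[, idx, drop = FALSE]

names(sub)
```

Here `drop = FALSE` is only a precaution so that a selection of a single column still comes back as a data frame.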

Does this give the desired result?

lapply(VARIABLE_SELECTION, function(x) VARIABLE_TABLE[ , x[x != 0], drop = FALSE])

This produces a list in which each element is the subset of 'VARIABLE_TABLE' selected by one column of 'VARIABLE_SELECTION' (shown here using a 'VARIABLE_TABLE' with fewer rows).

# $Set.1
#       Var.3    Var.1  Var.1.1  Var.1.2  Var.1.3
# 1 0.09536403 5.593292 5.593292 5.593292 5.593292
# 2 0.09086404 6.339074 6.339074 6.339074 6.339074
# 
# $Set.2
#        Var.3    Var.2  Var.2.1  Var.2.2
# 1 0.09536403 65.81870 65.81870 65.81870
# 2 0.09086404 66.79157 66.79157 66.79157
# 
# $Set.3
#        Var.3     Var.4    Var.3.1
# 1 0.09536403 -674.6672 0.09536403
# 2 0.09086404 -576.7986 0.09086404
# 
# $Set.4
#        Var.5     Var.4
# 1  0.5155411 -674.6672
# 2 -0.9593219 -576.7986
# 
# $Set.5
#        Var.5
# 1  0.5155411
# 2 -0.9593219
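
If instead each row of 'VARIABLE_SELECTION' is meant to define a set, as the question's row 4 example suggests, a row-wise variant might look like the sketch below. It assumes the zeros are only padding and can be dropped. `sel`, `tbl` and `by_row` are illustrative names, and `tbl` is a small made-up stand-in for 'VARIABLE_TABLE'.

```r
# Selection table from the question: each row lists column positions, 0 = unused
sel <- data.frame(Set.1 = c(3,1,1,1,1), Set.2 = c(0,3,2,2,2), Set.3 = c(0,0,3,4,3),
                  Set.4 = c(0,0,0,5,4), Set.5 = c(0,0,0,0,5))

# Small stand-in for VARIABLE_TABLE (illustrative data only)
tbl <- data.frame(Var.1 = 1:2, Var.2 = 3:4, Var.3 = 5:6,
                  Var.4 = 7:8, Var.5 = 9:10, Var.6 = 11:12)

# Loop over row numbers so the indices stay numeric throughout
by_row <- lapply(seq_len(nrow(sel)), function(i) {
  idx <- unlist(sel[i, ], use.names = FALSE)   # one row as a plain numeric vector
  tbl[, idx[idx != 0], drop = FALSE]           # drop the zero padding, keep a data frame
})

names(by_row[[4]])
```

Because the indices never pass through a string, no parsing step is needed. If name-based selection is preferred, `names(tbl)[idx]` converts the positions to column names once, and those names can then be used for the subset.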

