How to create a subset of a DataFrame by giving a condition regarding the column names

Sorry if it is a very simple question, I'm new at programming. I want to create a subset of a DataFrame (the eclipse dataset) by using specific column names. However, since there are 212 columns in total and I need 41 of them, writing out every one of the column names as a list would be too long (and not a nice way to code, I suppose). So instead I decided to get the columns by specifying the beginning of the column names (which reduces the list to 15 elements). I have column names that start with specific letters such as "NOF", "NOM", "NSF", etc., and I want to extract the columns starting with these strings to create my new subset. I tried to run the code below:

eclipse_train <- subset(eclipse, select = starts_with(predictors))

Here, predictors is the list of strings that I want the column names to start with. But of course, it gave the error:

Error in starts_with(predictors): is_string(match) is not TRUE

I could not come up with any other way to select the columns that start with the specific strings so that I can create the subset. How can I implement such a thing?

Assuming the eclipse data frame in the Note, use grep to find the indices of the names that start with the indicated strings, then index the data frame by those column indices. No packages are used.

eclipse[ grep("^(NOF|NOM|NSF)", names(eclipse)) ]

giving:

  NOFX NOMX NSFX
1    2    3    4
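The regular expression above hard-codes three prefixes, while the question keeps its 15 prefixes in a vector called predictors. A minimal sketch, assuming predictors is a character vector of prefixes, builds the selection from that vector instead. It uses base R's startsWith(), which compares literally, so prefixes containing regex metacharacters need no escaping; the helper name cols_starting_with is hypothetical. Because subset() also accepts a character vector for select, the asker's original subset() call can be kept.

```r
# Hypothetical helper: names of the columns of `df` whose name starts
# with any of the strings in `prefixes` (literal match, no regex).
cols_starting_with <- function(df, prefixes) {
  nms <- names(df)
  # One logical vector per prefix, combined with OR; all FALSE if no prefixes
  hits <- Reduce(`|`, lapply(prefixes, function(p) startsWith(nms, p)),
                 init = logical(length(nms)))
  nms[hits]
}

# Example data built the same way as in the Note
nms <- c("A", paste0(c("NOF", "NOM", "NSF"), "X"), "B")
eclipse <- as.data.frame.list(setNames(seq_along(nms), nms))

# Assumed: `predictors` holds the prefixes from the question
predictors <- c("NOF", "NOM", "NSF")

# subset() accepts a character vector of column names for `select`
eclipse_train <- subset(eclipse, select = cols_starting_with(eclipse, predictors))
eclipse_train

# Equivalent regex version of the grep() above, built from `predictors`
# (only safe if the prefixes contain no regex metacharacters)
pattern <- paste0("^(", paste(predictors, collapse = "|"), ")")
eclipse[grep(pattern, names(eclipse))]
```

Both calls return the same three columns, NOFX, NOMX and NSFX, for the example data.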


If the desired columns were contiguous, as in the example in the Note, then this would also work, where we specify the names of the first and last columns of the range.

subset(eclipse, select = NOFX:NSFX)

giving the same result.
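NOFX:NSFX works inside subset() because select is evaluated with each column name bound to its column position, so the colon expands to the positions from the first column through the last. A minimal sketch of the same idea with plain indexing, assuming both names exist and the first one comes before the last, looks the positions up with match():

```r
# Example data built the same way as in the Note
nms <- c("A", paste0(c("NOF", "NOM", "NSF"), "X"), "B")
eclipse <- as.data.frame.list(setNames(seq_along(nms), nms))

# Positions of the first and last desired columns
first <- match("NOFX", names(eclipse))
last  <- match("NSFX", names(eclipse))

# Every column from `first` through `last`; list-style indexing keeps a data frame
eclipse_range <- eclipse[first:last]
eclipse_range
```

This only makes sense when the wanted columns really are adjacent; for scattered columns the prefix-based selection is the safer choice.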

Note

nms <- c("A", paste0(c("NOF", "NOM", "NSF"), "X"), "B")
eclipse <- as.data.frame.list(setNames(seq_along(nms), nms))

which looks like this:

> eclipse
  A NOFX NOMX NSFX B
1 1    2    3    4 5
