Referring to a range of columns by name in R

Question

I need help with something that is probably fairly simple in R. I want to refer to a range of columns in a data frame (e.g., to extract a few selected variables). However, I don't know their column numbers. Normally, if I wanted to extract columns 4-10, I would write mydata[,4:10].

Since I don't know the column numbers, I would like to refer to the columns by name. Is there an easy way to do this? In SAS or SPSS it is fairly easy to refer to a range of variables by name. Alternatively, is there an easy way to find out which column number corresponds to a variable name in R?

Answer 1

Getting a range of columns can be done in several ways. subset(data.frame, select = name4:name10) works, but it is quite long. I used that until I got annoyed writing long commands for such a simple thing. I then wrote a function to tackle naming columns / not remembering column numbers in large data frames:

# Return a one-row matrix mapping each column name to its position
coln <- function(X) {
  y <- rbind(seq_len(ncol(X)))
  colnames(y) <- colnames(X)
  rownames(y) <- "col.number"
  return(y)
}

Here is how it works:

df <- data.frame(a = 1:10, b = 10:1, c = 1:10)
coln(df)
           a b c
col.number 1 2 3

Now you can index the columns by number while still seeing their names.
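
As a minimal sketch of the subset() form mentioned at the start of this answer, here it is applied to the same toy df. Only the column names a, b, and c from the example are assumed.

```r
# Toy data frame from the example above
df <- data.frame(a = 1:10, b = 10:1, c = 1:10)

# Inside `select`, column names stand for their positions, so a:b means 1:2
sub <- subset(df, select = a:b)
names(sub)

# A negated range drops those columns instead
rest <- subset(df, select = -(a:b))
names(rest)
```

This avoids looking up the numbers at all, at the cost of the longer call the answer complains about.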

Answer 2

A column number can be identified from a column name within a data frame as follows:

which(colnames(mydf)=="a")

where mydf is a data frame and a is the name of the column whose number is required.

(Source)

This can be used to create a column range:

firstcol <- which(colnames(mydf) == "a")
lastcol <- which(colnames(mydf) == "b")

mydf[firstcol:lastcol]
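
One thing to watch with this approach is that which() returns an empty vector, not an error, when the name does not exist, so the range fails later with a less obvious message. The sketch below wraps the lookup in a hypothetical helper, col_range (not part of the original answer), and swaps which() for match() so that a missing name shows up as NA and can be reported early. The toy mydf is also an assumption.

```r
# Toy data frame for illustration
mydf <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# which() gives integer(0) for a name that is not present
which(colnames(mydf) == "z")

# Hypothetical helper: look up both endpoints, stop early if either is missing
col_range <- function(df, first, last) {
  idx <- match(c(first, last), colnames(df))
  if (anyNA(idx)) {
    stop("column not found: ",
         paste(c(first, last)[is.na(idx)], collapse = ", "))
  }
  df[idx[1]:idx[2]]
}

names(col_range(mydf, "a", "b"))
```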

Answer 3

Use %in% in combination with names(). It's useful for grabbing a group of columns from a data frame. You can negate the expression when you want to keep just a subset and drop the rest. Type ?"%in%" at the R console prompt for more details.

set.seed(1234)
mydf <- data.frame(A = runif(5, 1, 2),
                   B = runif(5, 3, 4),
                   C = runif(5, 5, 6),
                   D = runif(5, 7, 8),
                   E = runif(5, 9, 10))
mydf

keep.cols <- c('A','D','E')
mydf[, names(mydf) %in% keep.cols]
drop.cols <- c('A','B','C')
mydf[, !names(mydf) %in% drop.cols]

The data frame:

> mydf
         A        B        C        D        E
1 1.113703 3.640311 5.693591 7.837296 9.316612
2 1.622299 3.009496 5.544975 7.286223 9.302693
3 1.609275 3.232551 5.282734 7.266821 9.159046
4 1.623379 3.666084 5.923433 7.186723 9.039996
5 1.860915 3.514251 5.292316 7.232226 9.218800

A subset of columns:

> mydf[, names(mydf) %in% keep.cols]
         A        D        E
1 1.113703 7.837296 9.316612
2 1.622299 7.286223 9.302693
3 1.609275 7.266821 9.159046
4 1.623379 7.186723 9.039996
5 1.860915 7.232226 9.218800

Keeping a subset of columns and dropping the rest:

> mydf[, !names(mydf) %in% drop.cols]
         D        E
1 7.837296 9.316612
2 7.286223 9.302693
3 7.266821 9.159046
4 7.186723 9.039996
5 7.232226 9.218800
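
A small sketch of how the %in% mask differs from indexing with the character vector directly. The data frame here is a simplified stand-in with the same column names, and the reordered keep.cols is chosen only to make the difference visible.

```r
# Simplified stand-in for mydf with the same column names
mydf <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8, E = 9:10)

keep.cols <- c("E", "A")

# Logical mask: columns come back in the data frame's own order
masked <- mydf[, names(mydf) %in% keep.cols]
names(masked)    # "A" "E"

# Character index: columns come back in the order requested
by_name <- mydf[, keep.cols]
names(by_name)   # "E" "A"

# Dropping by name without a logical mask
dropped <- mydf[, setdiff(names(mydf), c("A", "B", "C"))]
names(dropped)   # "D" "E"
```

The mask also silently ignores names that are not in the data frame, whereas indexing by a character vector stops with an error for an unknown column, so the choice depends on which behaviour you want.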

Answer 4

I think I figured it out, but it's a bit ornery. Here's an example using mtcars to get the columns between hp and vs. Needing do.call usually means there is a simpler way, though.

mtcars[do.call(seq, as.list(match(c("hp", "vs"), colnames(mtcars))))]
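
One possible simpler form, sketched here, is to keep the match() lookup but spell the range out with the colon operator. The variable names idx and res are made up for illustration.

```r
# Positions of the two endpoint columns in the built-in mtcars data
idx <- match(c("hp", "vs"), colnames(mtcars))

# Same selection as the do.call() one-liner, with the range written out
res <- mtcars[idx[1]:idx[2]]
names(res)   # "hp" "drat" "wt" "qsec" "vs"

# Check against the original expression
same <- identical(
  res,
  mtcars[do.call(seq, as.list(match(c("hp", "vs"), colnames(mtcars))))]
)
same
```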

Answer 5

Here is a fun little function that combines the ideas behind Largh's answer with a handy function call. To use it, just enter

call.cols(mydata, "firstvarname", "lastvarname")

call.cols <- function(df, startvar, endvar) {
  # Named vector mapping each column name to its position
  col.num <- function(df) {
    var.nums <- seq(1, ncol(df))
    names(var.nums) <- colnames(df)
    return(var.nums)
  }

  start.num <- as.numeric(col.num(df)[startvar])
  end.num <- as.numeric(col.num(df)[endvar])
  range.num <- start.num:end.num
  return(df[range.num])
}

I plan to expand this for use in scale creation for psychometric research.
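
A short usage sketch on the built-in mtcars data. The function body is repeated in condensed form only so the block runs on its own, and the columns hp and vs are an arbitrary choice.

```r
# Condensed copy of call.cols so this block is self-contained
call.cols <- function(df, startvar, endvar) {
  var.nums <- seq_len(ncol(df))
  names(var.nums) <- colnames(df)
  start.num <- as.numeric(var.nums[startvar])
  end.num <- as.numeric(var.nums[endvar])
  df[start.num:end.num]
}

# Forward range: hp through vs
fwd <- call.cols(mtcars, "hp", "vs")
names(fwd)   # "hp" "drat" "wt" "qsec" "vs"

# Swapping the endpoints returns the same columns in reverse order,
# because start.num:end.num then counts down
rev_cols <- call.cols(mtcars, "vs", "hp")
names(rev_cols)   # "vs" "qsec" "wt" "drat" "hp"
```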

Answer 6

You can look up the column numbers by name:

> set.seed(1234)
> mydf <- data.frame(A = runif(5, 1, 2),
+                    B = runif(5, 3, 4),
+                    C = runif(5, 5, 6),
+                    D = runif(5, 7, 8),
+                    E = runif(5, 9, 10))
> mydf[c(match("A", names(mydf)):match("B", names(mydf)))]
         A        B
1 1.113703 3.640311
2 1.622299 3.009496
3 1.609275 3.232551
4 1.623379 3.666084
5 1.860915 3.514251

Here you can see that the match() call actually gives the column numbers:

> c(match("A", names(mydf)):match("B", names(mydf)))
[1] 1 2

I hope this is also helpful; it is similar to Neal's answer.

6 answers

Answer 1
6 2013-12-04 07:12:58

Answer 2
4 accepted 2017-10-18 15:05:20

Answer 3
2 2013-12-04 07:30:31

Answer 4
1 2013-12-04 07:14:23

Answer 5
0 2013-12-05 04:35:09

Answer 6
0 2020-02-18 15:28:16
