Extracting specific columns from a data frame

Question

I have an R data frame with 6 columns, and I want to create a new data frame that has only three of them.

Assuming my data frame is df, and I want to extract columns A, B, and E, this is the only command I can figure out:

 data.frame(df$A,df$B,df$E)

Is there a more compact way of doing this?

Answer 1

You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they were object names (e.g. subset()), especially when programming in functions, packages, or applications.

# data for reproducible example
# (and to avoid confusion from trying to subset `stats::df`)
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
# subset
df[c("A","B","E")]

Note that there is no comma (i.e. it is not df[,c("A","B","E")]). That's because df[,"A"] returns a vector, not a data frame, whereas df["A"] always returns a data frame.

str(df["A"])
## 'data.frame':    1 obs. of  1 variable:
## $ A: int 1
str(df[,"A"])  # vector
##  int 1

Thanks to David Dorchies for pointing out that df[,"A"] returns a vector instead of a data.frame, and to Antoine Fabri for suggesting a better alternative (above) to my original solution (below).

# subset (original solution--not recommended)
df[,c("A","B","E")]  # returns a data.frame
df[,"A"]             # returns a vector
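As a minimal sketch of why character-vector subsetting suits code inside functions: the column names are ordinary data, so they can be stored in a variable and passed as an argument. The helper name pick_columns and its arguments are hypothetical, made up for this illustration.

```r
# Hypothetical helper (not part of the answer above): keep only the named columns.
pick_columns <- function(data, cols) {
  # List-style indexing with a character vector: the result is a data frame
  # whether `cols` holds one name or several.
  data[cols]
}

# Same one-row example data as above.
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])

wanted <- c("A", "B", "E")          # column names held in a plain variable
result <- pick_columns(df, wanted)  # data frame with columns A, B, E
single <- pick_columns(df, "A")     # still a data frame, not a vector
```

Because nothing here depends on non-standard evaluation, the same call should behave identically at the console and inside another function.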

Answer 2

Using the dplyr package, if your data.frame is called df1:

library(dplyr)

df1 %>%
  select(A, B, E)

This can also be written without the %>% pipe as:

select(df1, A, B, E)

Answer 3

This is the role of the subset() function:

> dat <- data.frame(A=c(1,2),B=c(3,4),C=c(5,6),D=c(7,7),E=c(8,8),F=c(9,9)) 
> subset(dat, select=c("A", "B"))
  A B
1 1 3
2 2 4
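For illustration, subset() also accepts unquoted column names in its select argument, along with negative selections and ranges. The sketch below reuses the same example data; the variable names keep, dropped, and first_two are made up here. Note that the R help page for subset() describes it as a convenience function intended for interactive use.

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4), C = c(5, 6),
                  D = c(7, 7), E = c(8, 8), F = c(9, 9))

# Unquoted names work because `select` is evaluated with each column name
# standing in for that column's position.
keep <- subset(dat, select = c(A, B, E))

# A negative selection drops the listed columns instead of keeping them.
dropped <- subset(dat, select = -c(C, D, F))

# A range of adjacent columns can be written with `:`.
first_two <- subset(dat, select = A:B)
```

Here keep and dropped end up with the same three columns, reached from opposite directions.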

Answer 4

There are two obvious choices: Joshua Ulrich's df[,c("A","B","E")] or

df[,c(1,2,5)]

as in

> df <- data.frame(A=c(1,2),B=c(3,4),C=c(5,6),D=c(7,7),E=c(8,8),F=c(9,9)) 
> df
  A B C D E F
1 1 3 5 7 8 9
2 2 4 6 7 8 9
> df[,c(1,2,5)]
  A B E
1 1 3 8
2 2 4 8
> df[,c("A","B","E")]
  A B E
1 1 3 8
2 2 4 8

Answer 5

Where df1 is your original data frame:

df2 <- subset(df1, select = c(1, 2, 5))

Answer 6

For some reason only

df[, (names(df) %in% c("A","B","E"))]

worked for me. All of the above syntaxes yielded "undefined columns selected".
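One plausible explanation, offered as a guess rather than a diagnosis of this particular case: "undefined columns selected" is the error that [ raises when a requested column does not exist in the data frame, for example because of a difference in case or stray whitespace in a name. The %in% form cannot fail that way because it builds a logical mask and silently ignores names that do not match. The sketch below uses hypothetical data with a lower-case "e" to reproduce the situation.

```r
# Hypothetical data: the third column is "e", not "E".
df <- data.frame(A = 1:2, B = 3:4, e = 5:6)
wanted <- c("A", "B", "E")

# Name-based indexing fails because "E" is not a column of df;
# tryCatch captures the error message instead of stopping the script.
err <- tryCatch(df[, wanted], error = function(cond) conditionMessage(cond))

# The %in% mask never fails: it keeps whichever wanted names exist.
partial <- df[, names(df) %in% wanted]

# Listing the requested names that did not match shows what went wrong.
not_found <- setdiff(wanted, names(df))
```

If not_found is non-empty, inspecting names(df) directly is usually the quickest way to spot the mismatch.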

Answer 7

You can also use the sqldf package, which runs SQL select statements against R data frames:

library(sqldf)
df1 <- sqldf("select A, B, E from df")

The output is a data frame df1 with columns A, B, and E.

Answer 8

You can use with():

with(df, data.frame(A, B, E))

Answer 9

df <- dplyr::select(df, A, B, C)

You can also assign the result to a different name:

data <- dplyr::select(df, A, B, C)

Answer 10

[ and subset() are not interchangeable:

[ returns a vector when only one column is selected with the comma form, as in df[,"a"].

df = data.frame(a="a",b="b")    

identical(
  df[,c("a")], 
  subset(df,select="a")
) 

identical(
  df[,c("a","b")],  
  subset(df,select=c("a","b"))
)
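A short sketch of how the two can be brought into line: passing drop = FALSE to [ keeps the data frame structure for a single column. The variable names as_vector, as_frame, and from_sub are made up for this illustration.

```r
df <- data.frame(a = "a", b = "b")

as_vector <- df[, "a"]                 # drop defaults to TRUE for a single column
as_frame  <- df[, "a", drop = FALSE]   # keeps the data frame structure
from_sub  <- subset(df, select = "a")  # subset() keeps the data frame as well

is.data.frame(as_vector)
is.data.frame(as_frame)
is.data.frame(from_sub)
```

The list-style form df["a"] is another way to get a one-column data frame without spelling out drop = FALSE.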

10 answers

Answer 1: 507, 2012-04-10 02:44:34
Answer 2: 239, accepted, 2015-04-19 21:19:17
Answer 3: 110, 2012-04-10 09:50:05
Answer 4: 86, 2012-04-10 06:49:54
Answer 5: 21, 2016-06-10 11:34:19
Answer 6: 21, 2017-10-12 18:12:23
Answer 7: 15, 2016-11-30 08:00:22
Answer 8: 4, 2019-05-22 09:49:02
Answer 9: 1, 2019-10-15 19:54:27
Answer 10: 0, 2016-11-09 15:32:24
