Extracting specific columns from a data frame
I have an R data frame with 6 columns, and I want to create a new data frame that only has three of the columns.
Assuming my data frame is df, and I want to extract columns A, B, and E, this is the only command I can figure out:
data.frame(df$A,df$B,df$E)
Is there a more compact way of doing this?
You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they are object names (e.g. subset()), especially when programming in functions, packages, or applications.
# data for reproducible example
# (and to avoid confusion from trying to subset `stats::df`)
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
# subset
df[c("A","B","E")]
Note there's no comma (i.e. it's not df[,c("A","B","E")]). That's because the comma form drops to a vector when only one column is selected: df[,"A"] returns a vector, not a data frame. But df["A"] will always return a data frame.
str(df["A"])
## 'data.frame': 1 obs. of 1 variable:
## $ A: int 1
str(df[,"A"]) # vector
## int 1
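As a minimal sketch of why this matters inside functions, the helper below takes the column names as a character vector and always returns a data frame. The name pick_columns and its arguments are hypothetical, introduced only for illustration.

```r
# Hypothetical helper (not from the original answer): select columns by name
# using single-bracket, no-comma indexing so the result is always a data frame.
pick_columns <- function(data, cols) {
  stopifnot(is.data.frame(data), is.character(cols))
  data[cols]
}

df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])

wanted <- c("A", "B", "E")          # column names held in a variable
result <- pick_columns(df, wanted)  # data frame with columns A, B, E
single <- pick_columns(df, "A")     # still a data frame, not a vector
```

Because the names are ordinary strings, they can be built or passed around like any other value, which is harder with approaches that treat column names as object names.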
Thanks to David Dorchies for pointing out that df[,"A"] returns a vector instead of a data.frame, and to Antoine Fabri for suggesting a better alternative (above) to my original solution (below).
# subset (original solution--not recommended)
df[,c("A","B","E")] # returns a data.frame
df[,"A"] # returns a vector
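If the comma form is needed anyway (for example, when rows are filtered at the same time), base R's drop = FALSE argument keeps the result as a data frame. This is a small sketch, not part of the original answer; the variable names are illustrative.

```r
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])

# Comma form with a single column: simplified to a plain vector by default
one_col_vec <- df[, "A"]

# Same selection with drop = FALSE: the data frame structure is kept
one_col_df <- df[, "A", drop = FALSE]

is.data.frame(one_col_vec)  # FALSE
is.data.frame(one_col_df)   # TRUE
```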
There are two obvious choices: Joshua Ulrich's df[,c("A","B","E")] or df[,c(1,2,5)], as in
> df <- data.frame(A=c(1,2),B=c(3,4),C=c(5,6),D=c(7,7),E=c(8,8),F=c(9,9))
> df
  A B C D E F
1 1 3 5 7 8 9
2 2 4 6 7 8 9
> df[,c(1,2,5)]
  A B E
1 1 3 8
2 2 4 8
> df[,c("A","B","E")]
  A B E
1 1 3 8
2 2 4 8
Where df1 is your original data frame:
df2 <- subset(df1, select = c(1, 2, 5))
For some reason only df[, (names(df) %in% c("A","B","E"))] worked for me. All of the above syntaxes yielded "undefined columns selected".
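One possible explanation, offered as an assumption since the answer does not show its data: "undefined columns selected" is the error R raises when a requested name is not in names(df), whereas the %in% form quietly ignores names that do not exist. The sketch below checks for missing names before subsetting; "Z" is a made-up column name used to trigger the case.

```r
df <- data.frame(A = 1:2, B = 3:4, E = 5:6)

# "Z" is a hypothetical name that does not exist in df
wanted <- c("A", "B", "E", "Z")

# Names that were requested but are absent from the data frame
missing_cols <- setdiff(wanted, names(df))

# Keep only the requested names that are actually present
present <- intersect(wanted, names(df))
subset_df <- df[present]

# df[wanted] would stop with "undefined columns selected" because of "Z"
```

Printing missing_cols first makes a typo or a case mismatch in a column name visible instead of silently dropping that column.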
You can also use the sqldf package, which performs selects on R data frames:
df1 <- sqldf("select A, B, E from df")
This gives as the output a data frame df1 with columns A, B, and E.
You can use with:
with(df, data.frame(A, B, E))
df <- dplyr::select(df, A, B, E)
Also, you can assign a different name to the newly created data:
data <- dplyr::select(df, A, B, E)
[ and subset are not substitutable: [ returns a vector if only one column is selected with the comma form (as in df[,"a"]), whereas subset() returns a data frame.
df = data.frame(a="a",b="b")
identical(
df[,c("a")],
subset(df,select="a")
)
identical(
df[,c("a","b")],
subset(df,select=c("a","b"))
)