How do I split a data frame into a list of data frames based on column names in R?

Question

Suppose I have the following data frame:

df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))

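I want to create a list of data frames, split by the first part of the column names: the columns starting with "BR" would be one element of the list, the columns starting with "USA" another, and so on.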

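I can get the column names and split them with strsplit, but I'm not sure how to iterate over the result, or what the best way is to separate the data frame.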

strsplit(names(df), "\\.")

gives me a list with one element per column name, where each element is that name split on ".".
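To make the shape of that list concrete, here is a small sketch (random values, only the column names matter). It uses `fixed = TRUE` instead of the `"\\."` regex, which is an equivalent way to split on a literal period, and pulls out the first piece of each name with `vapply()`:

```r
# Same column names as in the question; the values are random and irrelevant here
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# One list element per column name, each a character vector of the pieces
parts <- strsplit(names(df), ".", fixed = TRUE)
# parts[[1]] is c("BR", "a"), parts[[4]] is c("USA", "a"), ...

# First piece of every split name; vapply() guarantees a character vector back
prefix <- vapply(parts, `[`, character(1), 1)
# c("BR", "BR", "BR", "USA", "USA", "FRA", "FRA")
```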

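How can I iterate over this list to get the index numbers of the columns that start with the same substring, and group those columns as elements of another list?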
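One way to get the index numbers directly, sketched below under the assumption that the prefix is everything before the first ".", is to `split()` the column positions by prefix and then subset with each group of positions. Note that `split()` orders the groups alphabetically by prefix, not by first appearance:

```r
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Prefix = everything before the first "." (assumes names look like PREFIX.rest)
prefix <- sub("\\..*", "", names(df))

# Column index numbers grouped by prefix: a named list, groups in sorted order
idx <- split(seq_along(prefix), prefix)
# idx$BR is 1:3, idx$FRA is 6:7, idx$USA is 4:5

# Turn each group of indices into a data frame holding just those columns
df_list <- lapply(idx, function(i) df[, i, drop = FALSE])
```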

Answer 1

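This only works if the column names always have the form yours do (so they can be split on ".") and you want to group on the identifier before the first ".".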

df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))

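## Grab the component of the names we want (the part before the first ".")
nm <- do.call(rbind, strsplit(colnames(df), "\\."))[, 1]
## Create list with custom function using lapply;
## drop = FALSE keeps single-column groups as data frames
datlist <- lapply(unique(nm), function(x) df[, nm == x, drop = FALSE])
names(datlist) <- unique(nm)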
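A more compact base R variant of the same idea (a swapped-in technique, not the code from this answer) is `split.default()`, which treats the data frame as a list of columns and returns one data frame per prefix. Groups come back in alphabetical order; passing a factor whose levels are in order of first appearance keeps the original column order:

```r
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Prefix = everything before the first "."
prefix <- sub("\\..*", "", names(df))

# Split the columns (not the rows) by prefix; groups sorted alphabetically
datlist <- split.default(df, prefix)

# Same split, but with groups in order of first appearance
datlist_ordered <- split.default(df, factor(prefix, levels = unique(prefix)))
```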

Answer 2

Dason beat me to it, but here is a different flavor of the same conceptual approach:

library(plyr)

# Use regex to get the prefixes
# Pulls any letters, digits or underscores ("\\w*") from the beginning of the
# string ("^") up to the first period ("\\.") into a group, then matches all the
# remaining characters (".*"). Then replaces with the first group ("\\1" = "(\\w*)").
# In other words, it matches the whole string but replaces it with only the prefix.

prefixes <- unique(gsub(pattern = "^(\\w*)\\..*",
                        replacement = "\\1",
                        x = names(df)))

# Subset to the variables that match the prefix
# Iterates over the prefixes and subsets based on the variable names that
# start with that prefix followed by a period
llply(prefixes, .fun = function(x){
    subset(df, select = names(df)[grep(pattern = paste0("^", x, "\\."),
                                       x = names(df))])
})
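If you would rather not depend on plyr, a base R sketch of the same loop uses `lapply()` plus `startsWith()` (available in base R since 3.3.0). `startsWith()` compares fixed strings, so the prefix needs no regex escaping; appending "." to it is an assumption that every name has the form `PREFIX.rest`:

```r
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Prefixes in order of first appearance
prefixes <- unique(sub("^(\\w*)\\..*", "\\1", names(df)))

# For each prefix, keep the columns whose names start with "PREFIX."
df_list <- lapply(prefixes, function(x) df[startsWith(names(df), paste0(x, "."))])
names(df_list) <- prefixes
```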

I think these regexes should still give you the right result even if there are more "." characters later in the variable names:

unique(gsub(pattern = "^(\\w*)\\..*",
            replacement = "\\1",
            x = c(names(df), "FRA.c.blahblah")))
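The reason extra periods are harmless is that `\\w` does not match ".", so the captured group stops at the first period. A quick check with a hypothetical helper, `get_prefix()`, also shows two edge cases: a name with no period does not match at all and comes back unchanged, and underscores count as word characters:

```r
# Hypothetical helper: the part of each name before the first "."
get_prefix <- function(nms) sub("^(\\w*)\\..*", "\\1", nms)

get_prefix(c("FRA.c.blahblah", "USA.FRANKLINS", "BR.a"))
# c("FRA", "USA", "BR")

# No "." in the name: the pattern does not match, so the name is returned as-is
get_prefix("noperiod")

# "\\w" includes "_", so underscores stay part of the prefix
get_prefix("my_var.x")
```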

Or if one prefix appears later in another variable's name:

# Add a USA variable with "FRA" in it
df2 <- data.frame(df, USA.FRANKLINS = rnorm(10))

prefixes2 <- unique(gsub(pattern = "^(\\w*)\\..*",
                         replacement = "\\1",
                         x = names(df2)))

llply(prefixes2, .fun = function(x){
    subset(df2, select = names(df2)[grep(pattern = paste0("^", x, "\\."),
                                         x = names(df2))])
})
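Anchoring with "^" protects against a prefix that appears later in a name, but a pattern anchored only at the start would still catch a different prefix that begins with the same letters. The sketch below uses hypothetical columns where "BR" and "BRA" are distinct groups; requiring the period right after the prefix keeps them apart:

```r
# Hypothetical data: "BR" and "BRA" are different prefixes
df3 <- data.frame(BR.a = 1:3, BRA.a = 4:6, BRA.b = 7:9)

# Anchored at the start only: "^BR" also matches the BRA columns
loose <- names(df3)[grepl("^BR", names(df3))]
# c("BR.a", "BRA.a", "BRA.b")

# Prefix followed by a literal period: only the BR column
strict <- names(df3)[grepl("^BR\\.", names(df3))]
# "BR.a"
```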


2 answers

Answer 1
Score 3, 2012-02-14 17:12:00

Answer 2
Score 3, accepted, 2012-02-14 17:13:39
