R - create and name data frames from existing column values
I have a data frame that is structured like so, via dput:
structure(list(railroad = c("bnsf railway company", "bnsf railway company",
"bnsf railway company", "bnsf railway company", "bnsf railway company",
"bnsf railway company", "bnsf railway company", "bnsf railway company",
"union pacific railroad", "union pacific railroad", "union pacific railroad",
"union pacific railroad", "union pacific railroad", "union pacific railroad",
"union pacific railroad", "union pacific railroad"), measure =
c("cars.owned.by",
"cars.owned.by", "cars.type", "cars.type", "cars.type", "train.speed",
"train.speed", "terminal.dwell", "cars.owned.by", "cars.owned.by",
"cars.type", "cars.type", "cars.type", "train.speed", "train.speed",
"terminal.dwell"), category = c("system", "private", "box", "intermodal",
"total", "intermodal", "all.trains", "entire.railroad", "system",
"private", "box", "intermodal", "total", "intermodal", "all.trains",
"entire.railroad"), irm = c(201510L, 201510L, 201510L, 201510L,
201510L, 201510L, 201510L, 201510L, 201510L, 201510L, 201510L,
201510L, 201510L, 201510L, 201510L, 201510L), mean = c(66623,
149937.333, 11395, 16499, 236866, 33.3, 24.5, 25.267, 57618.333,
195764.667, 22229.333, 14135.333, 293164.333, 31.933, 26.6, 27.6
)), row.names = c(1L, 3L, 6L, 9L, 14L, 15L, 20L, 32L, 127L, 129L,
132L, 135L, 140L, 141L, 146L, 160L), class = "data.frame")
What I would like to do is the following:
Create separate data frames for each combination of measure and category, named by pasting measure and category together, separated by ".". So the first data frame would be called cars.owned.by.system, and so on.
Rename the fifth column, mean, of each data frame to the name of the data frame itself. So, for the first data frame it would be colnames(df)[5] <- "cars.owned.by.system".
The desired output is 8 separate data frames, named as described above.
I tried the following:
cars.owned.by.system <- df[df$category == "system",]
colnames(cars.owned.by.system)[5] <- "cars.owned.by.system"
And it does the job, but I don't want to have to do this repetitively. I imagine there is a version of the canonical split-apply-combine approach that would work. Any advice or help would be much appreciated. Thanks.
Assuming df is your data frame, I think this does it.
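keys <- paste(df$measure, df$category, sep = ".")  # e.g. "cars.owned.by.system"
for (key in unique(keys)) {
  newdf <- df[keys == key, ]   # rows for this measure/category pair
  colnames(newdf)[5] <- key    # rename the mean column to the data frame's name
  assign(key, newdf)           # create a data frame named after the key
}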
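A loop-free sketch of the same idea, assuming a toy four-row subset of the question's data: split() builds the named pieces, Map() renames the fifth column, and list2env() copies each piece into the global environment under its own name. The object names keys and pieces are illustrative only and not from the original post.

```r
# Toy subset of the question's data (values copied from the dput output)
df <- data.frame(
  railroad = c("bnsf railway company", "bnsf railway company",
               "union pacific railroad", "union pacific railroad"),
  measure  = c("cars.owned.by", "train.speed", "cars.owned.by", "train.speed"),
  category = c("system", "intermodal", "system", "intermodal"),
  irm      = c(201510L, 201510L, 201510L, 201510L),
  mean     = c(66623, 33.3, 57618.333, 31.933),
  stringsAsFactors = FALSE
)

# One key per row: measure and category joined by "."
keys <- paste(df$measure, df$category, sep = ".")

# Split into a named list, then rename the fifth column of each piece
pieces <- split(df, keys)
pieces <- Map(function(piece, nm) {
  names(piece)[5] <- nm
  piece
}, pieces, names(pieces))

# Copy every list element into the global environment under its own name
invisible(list2env(pieces, envir = globalenv()))

names(cars.owned.by.system)
```

Keeping the pieces list around as well is cheap, and it makes later iteration easier than looking up each object by name.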
What about a classical for loop:
# first create the pasted name to iterate over in the loop
df$name <- paste(df$measure, df$category, sep = '.')
# an empty list to hold all the data frames
list_df <- list()
# the loop
for (i in unique(df$name)) {
  data <- df[which(df$name == i), ]  # select the rows for this name
  colnames(data)[5] <- i             # rename the mean column
  data <- data[, -6]                 # remove the helper name column
  list_df[[i]] <- data               # store in the list
}
# here you can see all the data frames in a list
list_df
> list_df
$cars.owned.by.system
                  railroad       measure category    irm cars.owned.by.system
1     bnsf railway company cars.owned.by   system 201510             66623.00
127 union pacific railroad cars.owned.by   system 201510             57618.33

$cars.owned.by.private
                  railroad       measure category    irm cars.owned.by.private
3     bnsf railway company cars.owned.by  private 201510              149937.3
129 union pacific railroad cars.owned.by  private 201510              195764.7
... and so on
# you can select each data frame, for example by its name
list_df$cars.type.box
                  railroad   measure category    irm cars.type.box
6     bnsf railway company cars.type      box 201510      11395.00
132 union pacific railroad cars.type      box 201510      22229.33
# and you're sure it's a data frame
class(list_df$cars.type.box)
[1] "data.frame"
Consider split to subset the data frame by the two factors, and then Map (a wrapper around mapply) to iterate elementwise through the subsetted data frames and the list's names.
Also, consider setNames(), the functional counterpart of colnames(x) <- value, to return the newly named object in one call.
# CREATES NAMED LIST (drop = TRUE SKIPS COMBINATIONS THAT NEVER OCCUR)
df_list <- split(df, list(df$measure, df$category), drop = TRUE)
# RETURNS SAME LIST WITH RENAMED FIFTH COLUMN
df_list <- Map(function(sub, nm) setNames(sub, c("railroad", "measure", "category", "irm", nm)),
               df_list, names(df_list))
# OUTPUT DFs
df_list$cars.owned.by.system
df_list$cars.owned.by.private
df_list$cars.type.box
...
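One detail worth checking when splitting on two factors: split() builds a list element for every pairing of the factor levels, including pairings that never occur in the data, unless drop = TRUE is passed. A small sketch on a toy four-row subset of the question's data (the names all_combos and used_combos are made up for illustration):

```r
# Toy subset of the question's data (values copied from the dput output)
df <- data.frame(
  railroad = c("bnsf railway company", "bnsf railway company",
               "union pacific railroad", "union pacific railroad"),
  measure  = c("cars.owned.by", "train.speed", "cars.owned.by", "train.speed"),
  category = c("system", "intermodal", "system", "intermodal"),
  irm      = c(201510L, 201510L, 201510L, 201510L),
  mean     = c(66623, 33.3, 57618.333, 31.933),
  stringsAsFactors = FALSE
)

# Default: every measure x category pairing gets an element, even with zero rows
all_combos <- split(df, list(df$measure, df$category))

# drop = TRUE keeps only the pairings that actually appear in the data
used_combos <- split(df, list(df$measure, df$category), drop = TRUE)

length(all_combos)             # 2 measures x 2 categories = 4 elements
length(used_combos)            # only 2 pairings occur in the toy data
vapply(all_combos, nrow, integer(1))  # the unused pairings have 0 rows
```

On the full data in the question, the same reasoning suggests 4 measures times 7 categories without drop = TRUE, against the 8 pairings that are actually wanted.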
This will give you a named list of data frames, which is almost certainly preferable to having them all separately in your global environment:
library(purrr)  # imap(); also re-exports the %>% pipe
lst <- split(df, paste(df$measure, df$category, sep = ".")) %>%
  purrr::imap(~ `names<-`(.x, c(names(.x)[1:4], .y)))
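To illustrate why a named list tends to be easier to work with, here is a rough base R sketch of the apply and combine steps on a toy four-row subset of the question's data. The summary chosen (the average of the mean column per piece) and the names avg and combined are assumptions for illustration, not something the question asks for.

```r
# Toy subset of the question's data (values copied from the dput output)
df <- data.frame(
  railroad = c("bnsf railway company", "bnsf railway company",
               "union pacific railroad", "union pacific railroad"),
  measure  = c("cars.owned.by", "train.speed", "cars.owned.by", "train.speed"),
  category = c("system", "intermodal", "system", "intermodal"),
  irm      = c(201510L, 201510L, 201510L, 201510L),
  mean     = c(66623, 33.3, 57618.333, 31.933),
  stringsAsFactors = FALSE
)

# Split: one data frame per measure.category key
lst <- split(df, paste(df$measure, df$category, sep = "."))

# Apply: one number per piece, returned as a named numeric vector
avg <- vapply(lst, function(piece) mean(piece$mean), numeric(1))

# Combine: stack the pieces back into a single data frame
combined <- do.call(rbind, lst)

avg
nrow(combined)
```

With separate objects in the global environment, each of these steps would need the object names spelled out or looked up with get().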