
R - create and name data frames from existing column values

I have a data frame that is structured like so, via dput:

structure(list(railroad = c("bnsf railway company", "bnsf railway company", 
"bnsf railway company", "bnsf railway company", "bnsf railway company", 
"bnsf railway company", "bnsf railway company", "bnsf railway company", 
"union pacific railroad", "union pacific railroad", "union pacific railroad", 
"union pacific railroad", "union pacific railroad", "union pacific railroad", 
"union pacific railroad", "union pacific railroad"), measure = 
c("cars.owned.by", 
"cars.owned.by", "cars.type", "cars.type", "cars.type", "train.speed", 
"train.speed", "terminal.dwell", "cars.owned.by", "cars.owned.by", 
"cars.type", "cars.type", "cars.type", "train.speed", "train.speed", 
"terminal.dwell"), category = c("system", "private", "box", "intermodal", 
"total", "intermodal", "all.trains", "entire.railroad", "system", 
"private", "box", "intermodal", "total", "intermodal", "all.trains", 
"entire.railroad"), irm = c(201510L, 201510L, 201510L, 201510L, 
201510L, 201510L, 201510L, 201510L, 201510L, 201510L, 201510L, 
201510L, 201510L, 201510L, 201510L, 201510L), mean = c(66623, 
149937.333, 11395, 16499, 236866, 33.3, 24.5, 25.267, 57618.333, 
195764.667, 22229.333, 14135.333, 293164.333, 31.933, 26.6, 27.6
)), row.names = c(1L, 3L, 6L, 9L, 14L, 15L, 20L, 32L, 127L, 129L, 
132L, 135L, 140L, 141L, 146L, 160L), class = "data.frame")

What I would like to do is the following:

  1. Create a separate data frame for each combination of measure and category, named by pasting measure and category together separated by ".". So the first data frame would be called cars.owned.by.system and so on.

  2. Rename the fifth column, mean, of each data frame to the name of the data frame itself. So, for the first data frame it would be colnames(df)[5] <- "cars.owned.by.system".

The desired output is 8 separate data frames, named as described above.

I tried the following:

cars.owned.by.system <- df[df$category == "system",]
colnames(cars.owned.by.system)[5] <- "cars.owned.by.system"

And it does the job, but I don't want to have to do this repetitively. I imagine there is a version of the canonical split-apply-combine approach that would work. Any advice or help would be much appreciated. Thanks.

Assuming df is your data frame, I think this does it.

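# one key per measure/category combination, e.g. "cars.owned.by.system"
key <- paste(df$measure, df$category, sep = ".")

for (nm in unique(key)) {
  sub <- df[key == nm, ]    # rows belonging to this combination
  colnames(sub)[5] <- nm    # rename `mean` to the data frame's own name
  assign(nm, sub)           # create an object called nm in the current environment
}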
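
If separate objects really are required, one possible alternative to calling assign() inside a loop is to build a named list first and then copy its elements into an environment with list2env(). The following is only a sketch on a made-up toy data frame; the names toy, pieces and env, and all the values, are illustrative assumptions and do not come from the question.

```r
# toy data with the same column layout as the question (values are made up)
toy <- data.frame(
  railroad = c("a", "a", "b", "b"),
  measure  = c("cars.type", "train.speed", "cars.type", "train.speed"),
  category = c("box", "all.trains", "box", "all.trains"),
  irm      = 201510L,
  mean     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# split on the pasted measure.category key
pieces <- split(toy, paste(toy$measure, toy$category, sep = "."))

# rename the fifth column of each piece to the piece's own name
for (nm in names(pieces)) names(pieces[[nm]])[5] <- nm

# copy every list element into a fresh environment as its own object
# (pass envir = globalenv() to place them in the workspace instead)
env <- new.env()
list2env(pieces, envir = env)

ls(env)
```

Keeping the list around as well means the pieces can still be processed together later, which loose objects make awkward.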

What about a classical for loop:

# first create the pasted name to iterate over (measure and category)
df$name <- paste(df$measure, df$category, sep = '.')

# an empty list to hold all your df
list_df <- list()

# the loop, one pass per distinct name
for (i in unique(df$name)) {
  data <- df[which(df$name == i), ]  # select the rows for this name
  colnames(data)[5] <- i             # rename the mean column
  data <- data[, -6]                 # remove the helper name column
  list_df[[i]] <- data               # store in list
}

# here you can see all the df in a list
list_df

> list_df
$cars.owned.by.system
                  railroad       measure category    irm cars.owned.by.system
1     bnsf railway company cars.owned.by   system 201510             66623.00
127 union pacific railroad cars.owned.by   system 201510             57618.33

... and so on

# you can select each df, for example by its name
list_df$cars.type.box
                  railroad   measure category    irm cars.type.box
6     bnsf railway company cars.type      box 201510      11395.00
132 union pacific railroad cars.type      box 201510      22229.33

# and you're sure it's a df
class(list_df$cars.type.box)
[1] "data.frame"

Consider split to subset the data frame by the two factors and then Map (a wrapper to mapply) to iterate elementwise through the subsetted data frames and the list's names.

Also, consider setNames(), a functional alternative to assigning with colnames(), which returns the newly named object in one call.

# CREATES NAMED LIST (drop = TRUE SKIPS MEASURE/CATEGORY PAIRS WITH NO ROWS)
df_list <- split(df, list(df$measure, df$category), drop = TRUE)

# RETURNS SAME LIST WITH RENAMED FIFTH COLUMN
df_list <- Map(function(sub, nm) setNames(sub, c("railroad", "measure", "category", "irm", nm)), 
               df_list, names(df_list))

# OUTPUT DFs 
df_list$cars.owned.by.system

df_list$cars.type.box

df_list$train.speed.all.trains
...
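
To make the roles of setNames() and Map() in the answer above more concrete, here is a small sketch on a throwaway data frame. The objects x, y, z, lst and out are hypothetical names used only for illustration.

```r
x <- data.frame(a = 1:2, b = 3:4)

# two steps: copy the object, then assign the column names in place
y <- x
colnames(y) <- c("a", "renamed")

# one step: setNames() returns a renamed copy and leaves x untouched
z <- setNames(x, c("a", "renamed"))

names(x)
names(z)

# Map() walks the list and its names in parallel, element by element,
# so each data frame can be renamed using its own list name
lst <- list(first = x, second = x)
out <- Map(function(d, nm) setNames(d, c("a", nm)), lst, names(lst))

names(out$first)
names(out$second)
```

Because setNames() returns its result rather than modifying an existing object, it fits directly inside the anonymous function passed to Map().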

This will give you a named list of data frames, which is almost certainly preferable to having them all separately in your global environment:

library(magrittr)  # provides the %>% pipe

lst <- split(df, paste(df$measure, df$category, sep = ".")) %>% 
  purrr::imap(~`names<-`(.x, c(names(.x)[1:4], .y)))
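
One reason a named list tends to be easier to work with is that every piece can be inspected, picked out, or recombined with a single call. A minimal base R sketch, using a made-up toy data frame (toy, lst, counts and back are illustrative names, and the values are not from the question):

```r
# toy data (values are made up for illustration)
toy <- data.frame(
  measure  = c("cars.type", "train.speed", "cars.type", "train.speed"),
  category = c("box", "all.trains", "box", "all.trains"),
  mean     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

lst <- split(toy, paste(toy$measure, toy$category, sep = "."))

# apply the same function to every piece at once
counts <- vapply(lst, nrow, integer(1))
counts

# pick a single piece by name
lst[["cars.type.box"]]

# stack the pieces back into one data frame when needed
back <- do.call(rbind, lst)
nrow(back)
```

Doing the same with loose objects would require collecting their names first, for example with mget().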
