Using “apply” functions across multiple data frames

I'm having an issue using apply functions (which I assume are the right way to do the following) across multiple data frames.

Some example data (3 different data frames, but the problem I'm working on has upwards of 50):

biz <- data.frame(
    country = c("england","canada","australia","usa"),
    businesses = sample(1000:2500,4))

pop <- data.frame(
    country = c("england","canada","australia","usa"),
    population = sample(10000:20000,4))

restaurants <- data.frame(
    country = c("england","canada","australia","usa"),
    restaurants = sample(500:1000,4))

Here's what I ultimately want to do:

1) Sort each data frame from largest to smallest, according to the variable it contains

dataframe <- dataframe[order(dataframe$VARIABLE, decreasing = TRUE), ]

2) Then create a vector variable that gives me the rank for each row

dataframe$rank <- 1:nrow(dataframe)

3) Then create another data frame that has one column of countries, with the rank for each variable of interest in the other columns. Something that would look like this (the rankings here aren't real):

country.rankings <- structure(list(country = structure(c(5L, 1L, 6L, 2L, 3L, 4L), .Label = c("brazil", 
"canada", "england", "france", "ghana", "usa"), class = "factor"), 
    restaurants = 1:6, businesses = c(4L, 5L, 6L, 3L, 2L, 1L), 
    population = c(4L, 6L, 3L, 2L, 5L, 1L)), .Names = c("country", 
"restaurants", "businesses", "population"), class = "data.frame", row.names = c(NA, 
-6L))

So I'm guessing there's a way to put each of these data frames together into a list, something like:

lib <- c(biz, pop, restaurants)

And then do an lapply across that to 1) sort, 2) create the rank variable and 3) create the matrix or data frame of rankings for each variable (# of businesses, population size, # of restaurants) for each country. The problem I'm running into is that the lapply function I wrote to sort each data frame fails when I try to order by the variable:

sort <- lapply(lib, 
    function(x){
        x <- x[order(x[,2]),]
        })

returns the error message:

Error in `[.default`(x, , 2) : incorrect number of dimensions

because I'm trying to apply column headings to a list. But how else would I tackle this problem when the variable names are different for every data frame (keeping in mind that the country names are consistent)?

(I would also love to know how to do this using plyr.)

Ideally I would recommend data.table for this. However, here is a quick solution using data.frame. Try this:

Step1: Create a list of all data.frames

varList <- list(biz,pop,restaurants) 
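
A minimal sketch of why list() matters here, under the assumption that the error in the question comes from c() flattening the data frames into their individual columns. The fixed seed and the names flat, nested and sorted are illustrative and not part of the original post.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
biz <- data.frame(
    country = c("england","canada","australia","usa"),
    businesses = sample(1000:2500,4))
pop <- data.frame(
    country = c("england","canada","australia","usa"),
    population = sample(10000:20000,4))

flat   <- c(biz, pop)      # concatenates the columns: a list of 4 plain vectors
nested <- list(biz, pop)   # keeps each data frame intact: a list of 2 data frames

length(flat)               # 4
length(nested)             # 2

# With list(), x really is a data frame inside lapply, so x[, 2] works
sorted <- lapply(nested, function(x) x[order(x[, 2], decreasing = TRUE), ])
```

With c(), each element handed to lapply is a single column vector, so indexing it with two dimensions fails; with list(), each element is a whole data frame.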

Step2: Combine all of them in one data.frame

temp <- varList[[1]]
for(i in 2:length(varList))  temp <- merge(temp,varList[[i]],by = "country")
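
The same merge can probably be written without an explicit loop by folding merge() over the list with Reduce(). This is a sketch of an alternative, not the method the answer uses; the fixed seed is an assumption added for reproducibility.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
biz <- data.frame(
    country = c("england","canada","australia","usa"),
    businesses = sample(1000:2500,4))
pop <- data.frame(
    country = c("england","canada","australia","usa"),
    population = sample(10000:20000,4))
restaurants <- data.frame(
    country = c("england","canada","australia","usa"),
    restaurants = sample(500:1000,4))

varList <- list(biz, pop, restaurants)

# Equivalent to merge(merge(biz, pop, by = "country"), restaurants, by = "country")
temp <- Reduce(function(a, b) merge(a, b, by = "country"), varList)
```

This scales to 50 or more data frames without changing the code, as long as every data frame has a country column.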

Step3: Get ranks:

cbind(temp,apply(temp[,-1],2,rank))

You can remove the undesired columns if you want:

cbind(temp[,1:2],apply(temp[,-1],2,rank))[,-2]
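
Note that rank() gives rank 1 to the smallest value, while the question asks for the largest value to come first. A sketch of one way to flip that, by ranking the negated values and overwriting the columns in place so the names stay clean. The numbers in temp below are made-up illustrative values, not data from the post, and rankings is a hypothetical name.

```r
# Assumption: a merged data frame like the one built in Step2, with made-up values
temp <- data.frame(
    country     = c("australia", "canada", "england", "usa"),
    businesses  = c(1500, 2100, 1200, 2400),
    population  = c(12000, 18000, 15000, 11000),
    restaurants = c(900, 600, 750, 800))

rankings <- temp
# rank(-v) ranks the negated values, so the largest original value gets rank 1
rankings[-1] <- lapply(temp[-1], function(v) rank(-v))

rankings$businesses   # 3 2 4 1
```

Ties, if any, receive averaged ranks by default; rank() has a ties.method argument to change that.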

Hope this helps!

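# Names of the data frames to combine
totaldatasets <- c('biz','pop','restaurants')
totaldatasetslist <- vector(mode = "list",length = length(totaldatasets))
for ( i in seq_along(totaldatasets))
{
  # Look each data frame up by name
  totaldatasetslist[[i]]  <- get(totaldatasets[i])
}

totaldatasetslist2 <- lapply(
  totaldatasetslist,
  function(x)
  {
    # Use x (the current data frame), not totaldatasetslist[[i]]:
    # i is left over from the loop above and always points at the last data frame.
    # rank() gives 1 to the smallest value; use rank(-x[,2]) for largest = 1.
    temp <- data.frame(
      country = x[,1],
      countryrank  = rank(x[,2])
    )

    # Name the rank column after the variable it ranks
    colnames(temp) <- c('country', colnames(x)[2])

    return(temp)
  }
    )


# Merge the per-variable rank tables on their shared country column
Reduce(
  merge,
  totaldatasetslist2
)

Output from the original post (ranks depend on the random sample, so yours will differ):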

    country businesses population restaurants
1 australia          3          3           3
2    canada          2          2           2
3   england          1          1           1
4       usa          4          4           4
