
R: How to create a data frame with one observation for each combination of factors

I hope all is well. I have a very specific question about R for which I have not been able to find a solution online so far. If the question has already been addressed somewhere else, I am sorry for bothering you, but I would appreciate it if you could provide me with the link.

I have 3 separate data sets: the first one is a list of companies, the second one is a list of years, and the third one is a list of countries.

My objective is to combine these 3 data sets into a new data frame. The final data frame should contain one row for each combination of these 3 variables. This is the reason why I believe I cannot use the merge() function. As a next step, I want to match data against this newly created data frame.

Thank you very much for your support, and again, sorry if the question has already been addressed elsewhere!

Try merge(). When the data frames share no column names, it returns every combination of their rows:

A <- data.frame(Companies = LETTERS[1:3])
B <- data.frame(Years = 2000:2002)
C <- data.frame(Countries = c("GER", "UK", "US"))

X <- merge(merge(A, B), C)
X

   Companies Years Countries
1          A  2000       GER
2          B  2000       GER
3          C  2000       GER
4          A  2001       GER
5          B  2001       GER
6          C  2001       GER
7          A  2002       GER
8          B  2002       GER
9          C  2002       GER
10         A  2000        UK
...
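As a side note, base R's expand.grid() builds the same kind of grid directly from vectors. The following is only a minimal sketch, assuming the three inputs are available as plain vectors; the names companies, years and countries are made up for illustration and are not from the question.

```r
# Hypothetical input vectors mirroring the example data above
companies <- LETTERS[1:3]
years <- 2000:2002
countries <- c("GER", "UK", "US")

# expand.grid() returns one row per combination;
# the first argument varies fastest, as in the merge() output above
grid <- expand.grid(
  Companies = companies,
  Years = years,
  Countries = countries,
  stringsAsFactors = FALSE
)

nrow(grid)     # 3 * 3 * 3 combinations
head(grid, 3)
```

If the inputs are already data frames, the merge() approach avoids having to pull the columns out first.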

If you have more than 3 variables/factors, you could write your own merge function like this:

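mergeN <- function(dfs = NULL) {
  # Require a plain, non-empty list (a data frame is itself a list, so exclude it)
  if (!is.list(dfs) || is.data.frame(dfs) || length(dfs) == 0) {
    stop("'dfs' must be a non-empty list of data frames!")
  }
  if (length(dfs) > 1) {
    # Merge the first two data frames, drop the second, then recurse
    dfs[[1]] <- merge(dfs[[1]], dfs[[2]])
    dfs[[2]] <- NULL
    Recall(dfs)
  } else {
    dfs[[1]]
  }
}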

D <- data.frame(Products = letters[24:26])
E <- data.frame(Divisions = c(100,200,300))

mergeN(list(A, B, C, D, E))

This will give you a data frame of all 3^5 = 243 combinations (5 variables with 3 values each).
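The row count can be checked without the recursive helper. This is a rough sketch that repeats the example inputs so it runs on its own and uses a plain loop in place of recursion; the names dfs, combos and expected are chosen here for illustration.

```r
# Same example inputs as above, repeated so the snippet is self-contained
dfs <- list(
  data.frame(Companies = LETTERS[1:3]),
  data.frame(Years = 2000:2002),
  data.frame(Countries = c("GER", "UK", "US")),
  data.frame(Products = letters[24:26]),
  data.frame(Divisions = c(100, 200, 300))
)

# Merge the data frames one at a time; with no shared column names,
# each merge() call is a cross join
combos <- dfs[[1]]
for (df in dfs[-1]) {
  combos <- merge(combos, df)
}

# The expected number of rows is the product of the individual row counts
expected <- prod(vapply(dfs, nrow, integer(1)))

stopifnot(nrow(combos) == expected)    # 3^5 rows
stopifnot(anyDuplicated(combos) == 0)  # each combination appears exactly once
```

Because the size grows multiplicatively, it may be worth computing the expected row count before building the grid when the inputs are large.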

Update in response to the comments:

A <- data.frame(Companies = LETTERS[1:3])
B <- data.frame(Years = 2000:2002)
C <- data.frame(Countries = c("GER", "UK", "US"))

X <- merge(merge(A, B), C)

Y <- data.frame(Companies = LETTERS[1:3], Years = rep(2000,3), Countries = c("GER", "UK", "US"), Revenues = c(20433,23255,32164))

Z <- merge(X, Y, all = TRUE)
Z

   Companies Years Countries Revenues
1          A  2000       GER    20433
2          A  2000        UK       NA
3          A  2000        US       NA
4          A  2001       GER       NA
5          A  2001        UK       NA
6          A  2001        US       NA
7          A  2002       GER       NA
8          A  2002        UK       NA
9          A  2002        US       NA
10         B  2000       GER       NA
11         B  2000        UK    23255
12         B  2000        US       NA
13         B  2001       GER       NA
14         B  2001        UK       NA
15         B  2001        US       NA
16         B  2002       GER       NA
17         B  2002        UK       NA
18         B  2002        US       NA
19         C  2000       GER       NA
20         C  2000        UK       NA
21         C  2000        US    32164
22         C  2001       GER       NA
23         C  2001        UK       NA
24         C  2001        US       NA
25         C  2002       GER       NA
26         C  2002        UK       NA
27         C  2002        US       NA

(If you want the NAs to be zero: Z[is.na(Z)] <- 0, where Z is the merged data frame.)
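That replacement applies to every column of the data frame. A narrower variant, sketched here on the assumption that only the Revenues column should be filled, rebuilds the merged data frame from the example and restricts the replacement to that column.

```r
# Rebuild the merged data frame from the example above
X <- merge(
  merge(data.frame(Companies = LETTERS[1:3]), data.frame(Years = 2000:2002)),
  data.frame(Countries = c("GER", "UK", "US"))
)
Y <- data.frame(
  Companies = LETTERS[1:3],
  Years = rep(2000, 3),
  Countries = c("GER", "UK", "US"),
  Revenues = c(20433, 23255, 32164)
)
Z <- merge(X, Y, all = TRUE)

# Replace missing values in the Revenues column only
Z$Revenues[is.na(Z$Revenues)] <- 0

# Number of combinations that had no revenue data
sum(Z$Revenues == 0)
```

Note that a zero then becomes indistinguishable from a genuinely reported revenue of zero, so keeping the NAs may be preferable for later analysis.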

Borrowing the input data frames from @Martin, here's an approach that involves placing all your data frames in a list and then using Reduce():

d1 <- data.frame(Companies = LETTERS[1:3])
d2 <- data.frame(Years = 2000:2002)
d3 <- data.frame(Countries = c("GER", "UK", "US"))
d4 <- data.frame(Companies = LETTERS[1:3], Years = rep(2000,3), Countries = c("GER", "UK", "US"), Revenues = c(20433,23255,32164))

d <- list(d1, d2, d3, d4)
merged_dat <- Reduce(function(...) merge(..., all = TRUE), d)
head(merged_dat)
#>   Companies Years Countries Revenues
#> 1         A  2000       GER    20433
#> 2         A  2000        UK       NA
#> 3         A  2000        US       NA
#> 4         A  2001       GER       NA
#> 5         A  2001        UK       NA
#> 6         A  2001        US       NA

I prefer this because it generalises to as many data frames as you might have.
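To make it clearer what Reduce() does here, the sketch below compares it with the nested merge() calls written out by hand. It repeats the inputs so it runs on its own; the names step_by_step and folded are used only for this illustration.

```r
d1 <- data.frame(Companies = LETTERS[1:3])
d2 <- data.frame(Years = 2000:2002)
d3 <- data.frame(Countries = c("GER", "UK", "US"))
d4 <- data.frame(
  Companies = LETTERS[1:3],
  Years = rep(2000, 3),
  Countries = c("GER", "UK", "US"),
  Revenues = c(20433, 23255, 32164)
)

# Nested calls written out explicitly, merging left to right
step_by_step <- merge(
  merge(merge(d1, d2, all = TRUE), d3, all = TRUE),
  d4,
  all = TRUE
)

# Reduce() performs the same left-to-right fold over the list
folded <- Reduce(function(x, y) merge(x, y, all = TRUE), list(d1, d2, d3, d4))

identical(step_by_step, folded)
```

Adding a fifth data frame then only means appending it to the list, instead of wrapping another merge() call around the expression.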
