
How to concatenate multiple dataframes with different column names?

I am trying to concatenate multiple tables. Let's say each of them has 20 columns, but the column names are different. How would I concatenate them?

Table 1:

library(magrittr)  # provides the %>% pipe

a <- matrix(1:6, ncol = 2, byrow = TRUE) %>%
  as.data.frame() %>%
  setNames(c("A1", "B1"))

Table 2:

b <- matrix(7:10, ncol = 2, byrow = TRUE) %>%
  as.data.frame() %>%
  setNames(c("A2", "B2"))

Expected output:

A  B Number
1  2      1
3  4      1
5  6      1
7  8      2
9 10      2

I need to do this all the time for tables that can have hundreds of columns. This is my approach, using a reference table with the standardised names under the "name" column.

For big projects I find it helpful to keep the reference table in an Excel file, which is imported with readxl::read_xlsx().

#' fun
#'
#' Rename data frame columns based on reference table
#'
#' @param df_names data frame column names
#' @param df assigned name of data frame (a column name in `reference`)
#' @param reference data frame of column name mappings
#'
#' @return character vector of mapped names
#'
#' @export
fun <- function(df_names, df, reference) {
  sapply(df_names, function(x, y, d) {
    # If the old name is listed in this data frame's column of the reference
    # table, return the standardised name from "name"; otherwise keep it
    ifelse(x %in% d[[y]], d[d[[y]] == x, ]$name, x)
  }, y = df, d = reference)
}

reference <- data.frame(name = c("A", "B"), a = c("A1", "B1"), b = c("A2", "B2"),
                        stringsAsFactors = FALSE)  # keep names as character in R < 4.0

names(a) <- fun(names(a), "a", reference)
names(b) <- fun(names(b), "b", reference)

a$Number <- 1
b$Number <- 2

rbind(a, b)
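To apply the same idea to many tables at once, one possible sketch collects the tables in a named list whose names match the columns of the reference table, renames each one, and numbers it by its position in the list. The helper standardise_names() and the list tables are hypothetical names for illustration, and match() is used here in place of the sapply()/ifelse() lookup above; treat it as a starting point rather than a tested drop-in.

```r
# Reference table: one column per source table, standardised names in "name"
reference <- data.frame(
  name = c("A", "B"),
  a    = c("A1", "B1"),
  b    = c("A2", "B2"),
  stringsAsFactors = FALSE
)

# Hypothetical input: a named list of tables, names matching reference columns
tables <- list(
  a = setNames(as.data.frame(matrix(1:6,  ncol = 2, byrow = TRUE)), c("A1", "B1")),
  b = setNames(as.data.frame(matrix(7:10, ncol = 2, byrow = TRUE)), c("A2", "B2"))
)

# Hypothetical helper: rename the columns of `df` using column `key` of
# `reference`; names not found in the reference table are left unchanged
standardise_names <- function(df, key, reference) {
  idx <- match(names(df), reference[[key]])
  names(df) <- ifelse(is.na(idx), names(df), reference$name[idx])
  df
}

# Rename every table, passing each list element together with its name
renamed <- Map(standardise_names, tables, names(tables),
               MoreArgs = list(reference = reference))

# Add a "Number" column holding each table's position in the list
renamed <- Map(function(df, i) { df$Number <- i; df }, renamed, seq_along(renamed))

# Stack the tables; unname() keeps rbind() from prefixing row names
result <- do.call(rbind, unname(renamed))
print(result)
```

Because the mapping lives entirely in the reference table, adding another table only means adding a column there and an element to the list.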

Maybe you can try something like the following:

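library(dplyr)
library(tidyr)
df1$Number <- 1
df2$Number <- 2
# bind_rows() fills the columns a table lacks with NA; unite(na.rm = TRUE)
# then merges each pair of columns, dropping the NAs (result is character)
dfout <- bind_rows(df1, df2) %>%
  unite("A", c("A1", "A2"), na.rm = TRUE) %>%
  unite("B", c("B1", "B2"), na.rm = TRUE) %>%
  mutate(across(c(A, B), as.integer))  # convert A and B back to integer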

which gives

> dfout
  A  B Number
1 1  2      1
2 3  4      1
3 5  6      1
4 7  8      2
5 9 10      2

Data

> dput(df1)
structure(list(A1 = c(1L, 3L, 5L), B1 = c(2L, 4L, 6L), Number = c(1, 
1, 1)), row.names = c(NA, -3L), class = "data.frame")

> dput(df2)
structure(list(A2 = c(7L, 9L), B2 = c(8L, 10L), Number = c(2, 
2)), row.names = c(NA, -2L), class = "data.frame")
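Because unite() pastes values together as text, an alternative that keeps A and B numeric without a conversion step is dplyr::coalesce(), which returns the first non-missing value in each row. This is a sketch assuming the same df1 and df2 as in the dput() output above, rebuilt here with data.frame() so the block runs on its own.

```r
library(dplyr)

# Same data as in the dput() output above
df1 <- data.frame(A1 = c(1L, 3L, 5L), B1 = c(2L, 4L, 6L), Number = 1)
df2 <- data.frame(A2 = c(7L, 9L), B2 = c(8L, 10L), Number = 2)

# bind_rows() fills missing columns with NA; coalesce() picks the
# non-missing value of each pair, so A and B stay integer
dfout <- bind_rows(df1, df2) %>%
  transmute(A = coalesce(A1, A2), B = coalesce(B1, B2), Number)
print(dfout)
```

With many column pairs, the coalesce() calls would need to be generated from a name mapping such as the reference table above.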

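Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.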

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM