Combining several csv files with different numbers of columns into one

Question

I have already loaded 20 csv files using:

tbl = list.files(pattern="*.csv")
for (i in 1:length(tbl)) assign(tbl[i], read.csv(tbl[i]))

or

list_of_data = lapply(tbl, read.csv)

This is what it looks like:

> head(tbl)
[1] "F1.csv"          "F10_noS3.csv"    "F11.csv"         "F12.csv"         "F12_noS7_S8.csv"
[6] "F13.csv"

I have to combine all of those files into one. Let's call it a master file, but let's start by making one table with all of the names. Every one of those csv files has a column called "Accession". I would like to make a table of all the "names" from all of those csv files. Of course, many of the accessions may be repeated across different csv files. I would like to keep all of the data corresponding to each accession.

Some problems:

Some of those "names" are the same and I don't want to duplicate them.
Some of those "names" are ALMOST the same. The difference is that the name is followed by a dot and a number.
The number of columns can differ between those csv files.

This screenshot shows what the data look like: http://imageshack.com/a/img811/7103/29hg.jpg

Here is an example of the accessions:

AT3G26450.1 <--
AT5G44520.2
AT4G24770.1
AT2G37220.2
AT3G02520.1
AT5G05270.1
AT1G32060.1
AT3G52380.1
AT2G43910.2
AT2G19760.1
AT3G26450.2 <--

<-- = Same sample, different names. These should be treated as one, so just ignore the dot and the number after it.
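
The rule described above (ignore the dot and the trailing number) could be sketched in base R roughly as follows. The helper name clean_accession is hypothetical, and the sketch assumes that every suffix has the form of a single dot followed by digits at the end of the name, as in the sample shown.

```r
# Hypothetical helper: strip a trailing ".<digits>" suffix from accession IDs.
# Assumes the suffix is always one dot followed by digits at the end.
clean_accession <- function(x) {
  sub("\\.[0-9]+$", "", as.character(x))
}

accessions <- c("AT3G26450.1", "AT5G44520.2", "AT4G24770.1", "AT3G26450.2")
cleaned <- clean_accession(accessions)

# One entry per sample, regardless of the numeric suffix
unique_accessions <- unique(cleaned)
print(unique_accessions)
```

Keeping the cleaned name in a separate column, rather than overwriting "Accession", would leave the original identifiers available for later checks.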

Is this possible?

I couldn't post a dput(head) because the data set is too big.

I tried to use this code:

all_data = do.call(rbind, list_of_data)
Error in rbind(deparse.level, ...) : 
The number of columns is not correct.


all_data$CleanedAccession = str_extract(all_data$Accession, "^[[:alnum:]]+")
all_data = subset(all_data, !duplicated(CleanedAccession))

I have been trying to do this for almost 2 weeks without success, so please help me.

Answer 1

Your question seems to contain multiple subquestions. I encourage you to separate them.

The first thing you apparently need is to combine data frames with different columns. You can use rbind.fill from the plyr package:

library(plyr)
all_data = do.call(rbind.fill, list_of_data)
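
If installing plyr is not an option, the same idea can be approximated in base R. The following is only a minimal sketch, not plyr's implementation: the helper name bind_fill and the toy columns S1 and S2 are made up for illustration. It pads every data frame with NA columns until all of them share the same set of columns, then calls rbind.

```r
# Hypothetical base-R helper (not how plyr implements rbind.fill):
# add the columns each data frame lacks as NA, then row-bind.
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[all_cols]  # same column order in every data frame
  })
  do.call(rbind, padded)
}

# Toy data standing in for two of the csv files (column names are assumptions)
df1 <- data.frame(Accession = c("AT3G26450.1", "AT5G44520.2"),
                  S1 = c(1.5, 2.0), stringsAsFactors = FALSE)
df2 <- data.frame(Accession = "AT3G26450.2", S2 = 7L,
                  stringsAsFactors = FALSE)

all_data <- bind_fill(list(df1, df2))
print(all_data)
```

With the real data, the call would presumably be bind_fill(list_of_data), using the list created in the question.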

Answer 2

Here's an example using some tidyverse functions and a custom function that can combine multiple csv files with missing columns into one data frame:

library(tidyverse)

# specify the target directory
dir_path <- '~/test_dir/' 

# specify the naming format of the files.
# in this case, csv files that begin with 'test' and a single digit, but it could be as simple as 'csv'
re_file <- '^test[0-9]\\.csv'

# create sample data with some missing columns 
df_mtcars <- mtcars %>% rownames_to_column('car_name')
write.csv(df_mtcars %>% select(-am), paste0(dir_path, 'test1.csv'), row.names = FALSE)
write.csv(df_mtcars %>% select(-wt, -gear), paste0(dir_path, 'test2.csv'), row.names = FALSE)
write.csv(df_mtcars %>% select(-cyl), paste0(dir_path, 'test3.csv'), row.names = FALSE)

# custom function that takes the target directory and file name pattern as arguments
read_dir <- function(dir_path, file_name){
  x <- read_csv(paste0(dir_path, file_name)) %>% 
    mutate(file_name = file_name) %>% # add the file name as a column              
    select(file_name, everything())   # reorder the columns so file name is first
  return(x)
}

# read the files from the target directory that match the naming format and combine into one data frame
df_panel <-
  list.files(dir_path, pattern = re_file) %>% 
  map_df(~ read_dir(dir_path, .))

# files with missing columns are filled with NAs.
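
Before combining, it may help to see which columns each file is missing, since those are the ones that end up filled with NA. A minimal base-R sketch follows; the helper name missing_columns and the two toy data frames are assumptions for illustration only.

```r
# Hypothetical helper: for each data frame, list the columns it lacks
# relative to the union of all column names across the list.
missing_columns <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  lapply(dfs, function(df) setdiff(all_cols, names(df)))
}

# Toy stand-ins for two files with different columns
dfs <- list(
  test1 = data.frame(car_name = "Mazda RX4", mpg = 21, wt = 2.62),
  test2 = data.frame(car_name = "Mazda RX4", mpg = 21, am = 1)
)

report <- missing_columns(dfs)
print(report)
```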
