Combine some csv files into one - different number of columns

I have already loaded 20 csv files with:

tbl = list.files(pattern="*.csv")
for (i in 1:length(tbl)) assign(tbl[i], read.csv(tbl[i]))

or

list_of_data = lapply(tbl, read.csv)

This is what the file list looks like:

> head(tbl)
[1] "F1.csv"          "F10_noS3.csv"    "F11.csv"         "F12.csv"         "F12_noS7_S8.csv"
[6] "F13.csv"

I have to combine all of those files into one master file, but let's start by making one table with all of the names. Every csv file has a column called "Accession". I would like to make a table of all the "names" from all of those csv files. Of course, many of the accessions are repeated across different csv files. I would like to keep all of the data corresponding to each accession.

Some problems:

  • Some of those "names" are the same and I don't want to duplicate them.
  • Some of those "names" are ALMOST the same. The only difference is that the name is followed by a dot and a number.
  • The number of columns can differ between those csv files.

This screenshot shows what the data looks like: http://imageshack.com/a/img811/7103/29hg.jpg

Let me show you how the accessions look:

AT3G26450.1 <--
AT5G44520.2
AT4G24770.1
AT2G37220.2
AT3G02520.1
AT5G05270.1
AT1G32060.1
AT3G52380.1
AT2G43910.2
AT2G19760.1
AT3G26450.2 <--

<-- = Same sample, different names. These should be treated as one, so just ignore the dot and the number after it.
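The "ignore the dot and the number after it" rule can be sketched in base R as below. This is only an illustration: the vector name `accession` and the four sample IDs (copied from the list above) are assumptions for the example, and the pattern assumes the suffix is always a single dot followed by digits at the end of the ID.

```r
# Sample IDs copied from the question; ".1" / ".2" are the suffixes to ignore.
accession <- c("AT3G26450.1", "AT5G44520.2", "AT4G24770.1", "AT3G26450.2")

# Remove a trailing dot followed by digits; IDs without a suffix stay unchanged.
clean_accession <- sub("\\.[0-9]+$", "", accession)

# Keep one entry per cleaned ID, in order of first appearance.
unique_accession <- unique(clean_accession)
print(unique_accession)
```

Anchoring the pattern at the end of the string (`$`) means a dot elsewhere in an ID would be left alone, which may or may not match the real data.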

Is it possible to do?

I couldn't provide a dput(head) because even that is too large for this data set.

I tried to use this code:

all_data = do.call(rbind, list_of_data)
Error in rbind(deparse.level, ...) : 
The number of columns is not correct.


all_data$CleanedAccession = str_extract(all_data$Accession, "^[[:alnum:]]+")
all_data = subset(all_data, !duplicated(CleanedAccession))

I have been trying to do this for almost 2 weeks without success. So please help me.

Your question seems to contain multiple subquestions. I encourage you to separate them.

The first thing you apparently need is to combine data frames with different columns. You can use rbind.fill from the plyr package:

library(plyr)
all_data = do.call(rbind.fill, list_of_data)
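To make clear what this kind of row-binding does, here is a rough base R sketch of the same idea: take the union of all column names, add the missing columns as NA, then stack the rows. The helper `bind_fill` and the toy data frames `a` and `b` are hypothetical names invented for this illustration; this is not how plyr implements rbind.fill, only an approximation of its result.

```r
# Hypothetical helper (not part of plyr): row-bind data frames whose columns differ.
bind_fill <- function(dfs) {
  # Union of all column names, in order of first appearance.
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    # Add every column this data frame lacks, filled with NA.
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    # Put the columns in a common order so rbind() lines them up.
    df[all_cols]
  })
  do.call(rbind, padded)
}

# Toy inputs standing in for two of the csv files (made-up values).
a <- data.frame(Accession = c("AT3G26450.1", "AT5G44520.2"), S1 = c(1, 2))
b <- data.frame(Accession = "AT3G26450.2", S1 = 3, S2 = 4)

combined <- bind_fill(list(a, b))
print(combined)
```

With the objects from the question, the call would presumably be `bind_fill(list_of_data)`.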

Here's an example using some tidyverse functions and a custom function that can combine multiple csv files with missing columns into one data frame:

library(tidyverse)

# specify the target directory
dir_path <- '~/test_dir/' 

# specify the naming pattern of the files:
# here, csv files whose names start with 'test' and a single digit, but the pattern could be as simple as 'csv'
re_file <- '^test[0-9]\\.csv'

# create sample data with some missing columns 
df_mtcars <- mtcars %>% rownames_to_column('car_name')
write.csv(df_mtcars %>% select(-am), paste0(dir_path, 'test1.csv'), row.names = FALSE)
write.csv(df_mtcars %>% select(-wt, -gear), paste0(dir_path, 'test2.csv'), row.names = FALSE)
write.csv(df_mtcars %>% select(-cyl), paste0(dir_path, 'test3.csv'), row.names = FALSE)

# custom function that takes the target directory and file name pattern as arguments
read_dir <- function(dir_path, file_name){
  x <- read_csv(paste0(dir_path, file_name)) %>% 
    mutate(file_name = file_name) %>% # add the file name as a column              
    select(file_name, everything())   # reorder the columns so file name is first
  return(x)
}

# read the files from the target directory that match the naming format and combine into one data frame
df_panel <-
  list.files(dir_path, pattern = re_file) %>% 
  map_df(~ read_dir(dir_path, .))

# files with missing columns are filled with NAs. 
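Neither snippet above deals with the near-duplicate accessions from the question. One possible way to collapse them after the files have been stacked, while keeping the data instead of dropping rows with `!duplicated(...)`, is sketched below in base R. Everything here is an assumption made for illustration: the toy data frame `combined`, the sample columns `S1` and `S2`, the helper `first_non_na`, and in particular the rule "keep the first non-missing value per column", which may not be the right way to reconcile real measurements.

```r
# Toy stand-in for the stacked data (made-up values).
combined <- data.frame(
  Accession = c("AT3G26450.1", "AT5G44520.2", "AT3G26450.2"),
  S1 = c(1, 2, NA),
  S2 = c(NA, NA, 4)
)

# Strip the trailing ".<digits>" so both versions share one key.
combined$CleanedAccession <- sub("\\.[0-9]+$", "", combined$Accession)

# Hypothetical helper: first non-missing value of a vector, or NA if none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA else x[1]
}

# One row per cleaned accession; each column keeps its first non-missing value.
master <- aggregate(
  combined[c("S1", "S2")],
  by = list(CleanedAccession = combined$CleanedAccession),
  FUN = first_non_na
)
print(master)
```

If two versions of an accession carry different non-missing values in the same column, this rule silently keeps only the first one, so a different summary function may be needed.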
