How to select the same columns from many files

I have many text files. I want to load them all and then build a new matrix from certain columns of every file.

For example, some of the matrices look like this:

# Names that start with a digit are not syntactic in R, so they need backticks.
`1a` <- replicate(10, rnorm(20))
`1b` <- replicate(10, rnorm(19))
`2a` <- replicate(10, rnorm(18))
`2b` <- replicate(10, rnorm(15))

To find them, I put them all in one folder, set my working directory there, and then get the list of files like this:

# Escape the dot and anchor at the end so that only names ending in ".txt" match.
filelist <- list.files(pattern = "\\.txt$")

Then I want to take the first column of 1a along with its V6 and V7 columns, followed by V6 and V7 from 1b, V6 and V7 from 2a, and V6 and V7 from 2b, and put them in a new matrix.

The files do not have the same length (they have different numbers of rows). I would like to do two things:

1- Save each file with only the selected columns, adding an R to its name. For example, if the original file is 1a, select V6 and V7 and save a new file with only those 2 columns, named 1aR.

2- Make a new matrix and put all the selected columns in it (where the columns are not of equal length, the gaps can be filled with NA or 0).

Here is one option to read the files, select the columns of interest from each dataset, and create a new dataset.

We use list.files to get the files in the working directory whose names follow a particular pattern.

filelist <- list.files(pattern='\\d+[^0-9]+\\.txt', full.names=TRUE)

Then, read all the files into a list using either read.csv/read.table, or fread from data.table.

lst <- lapply(filelist, read.csv, header=TRUE, stringsAsFactors=FALSE)
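To try the listing and reading steps without the original files, here is a minimal sketch. The temporary folder, the two file names, and the `demo_` object names are assumptions made up for illustration, and the files are written as comma-separated text so that read.csv can parse them.

```r
# Hypothetical setup: write two small comma-separated files into a temporary
# folder so the listing and reading steps can be run end to end.
tmp <- file.path(tempdir(), "select_cols_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
          file.path(tmp, "1a.txt"), row.names = FALSE)
write.csv(data.frame(V1 = 21:23, V6 = 1:3, V7 = 11:13),
          file.path(tmp, "1b.txt"), row.names = FALSE)

# List files whose names are digits, then non-digits, then ".txt".
demo_files <- list.files(tmp, pattern = "\\d+[^0-9]+\\.txt", full.names = TRUE)

# Read every file into a list of data.frames, named after the source files.
demo_lst <- lapply(demo_files, read.csv, header = TRUE, stringsAsFactors = FALSE)
names(demo_lst) <- basename(demo_files)

# Number of rows read from each file.
sapply(demo_lst, nrow)
```

Naming the list elements after the files makes it easier to trace each data.frame back to its source later on.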

Extract the V6 and V7 columns (selected by name) from each element of 'lst'.

lst1 <- lapply(lst, "[", c("V6", "V7"))
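Because the columns are picked by name, their position within each file does not matter. The sketch below illustrates this on the small example data, with an added check that stops early when a file lacks one of the requested columns. The helper `select_cols` and the object names are hypothetical, not part of the original answer.

```r
# Example data: the second data.frame stores the same columns in another order.
lst_demo <- list(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
                 data.frame(V6 = 1:3, V7 = 11:13, V1 = 21:23))
wanted <- c("V6", "V7")

# Hypothetical helper: select columns by name, failing loudly if any is absent.
select_cols <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  df[cols]
}

sel <- lapply(lst_demo, select_cols, cols = wanted)

# Both elements now hold V6 and V7, in that order.
lapply(sel, names)
```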

If the data.frame elements in the list have unequal numbers of rows, one option is cbind.fill from library(rowr).

library(rowr)
res <- cbind.fill(lst[[1]][1], do.call(cbind.fill, 
           c(lst1, list(fill=NA))), fill=NA)
res 
#   V1 V6 V7 V6.1 V7.1
#1  21  1 11    1   11
#2  22  2 12    2   12
#3  23  3 13    3   13
#4  24  4 14   NA   NA
#5  25  5 15   NA   NA
#6  26  6 16   NA   NA
#7  27  7 17   NA   NA
#8  28  8 18   NA   NA
#9  29  9 19   NA   NA
#10 30 10 20   NA   NA
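For cases where installing rowr is not an option, the same padding can be sketched in base R: extend every piece to the largest row count with a fill value, then cbind the pieces. The helper `pad_rows` and the object names are hypothetical, and this is only one possible approach, not the method used by cbind.fill.

```r
# Hypothetical helper: append rows of `fill` until the data.frame has n rows.
pad_rows <- function(df, n, fill = NA) {
  extra <- n - nrow(df)
  if (extra <= 0) return(df)
  filler <- as.data.frame(matrix(fill, nrow = extra, ncol = ncol(df)))
  names(filler) <- names(df)
  rbind(df, filler)
}

lst_demo <- list(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
                 data.frame(V6 = 1:3, V7 = 11:13, V1 = 21:23))
sel <- lapply(lst_demo, "[", c("V6", "V7"))

# First column of the first file, followed by V6/V7 from every file.
pieces <- c(list(lst_demo[[1]]["V1"]), sel)
n_max <- max(sapply(pieces, nrow))
padded <- lapply(pieces, pad_rows, n = n_max)

res_base <- do.call(cbind, padded)
# cbind keeps duplicated names, so make them unique (V6, V7, V6.1, V7.1).
names(res_base) <- make.unique(names(res_base))
res_base
```

Passing `fill = 0` to `pad_rows` (for example `lapply(pieces, pad_rows, n = n_max, fill = 0)`) pads with zeros instead of NA, which covers the other choice mentioned in the question.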

Then, we write the result to a .txt file.

write.table(res, 'CombinedV6_V7.txt', row.names=FALSE, quote=FALSE)
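The question also asks for a copy of each file that holds only the selected columns, with an R added to its name. A minimal sketch follows, assuming the input paths are like those returned by list.files and that the output keeps a .txt extension (so 1a.txt becomes 1aR.txt). The paths, the output folder, and the object names are hypothetical.

```r
# Hypothetical inputs: paths as list.files(full.names = TRUE) would return
# them, and the matching list of data.frames with only the selected columns.
in_files <- c("./1a.txt", "./1b.txt")
sel <- list(data.frame(V6 = 1:10, V7 = 11:20),
            data.frame(V6 = 1:3, V7 = 11:13))

# Write into a temporary folder here; in practice this could be any folder.
out_dir <- file.path(tempdir(), "select_cols_out")
dir.create(out_dir, showWarnings = FALSE)

# Build the output names: drop the extension, append "R", restore ".txt".
stems <- tools::file_path_sans_ext(basename(in_files))
out_files <- file.path(out_dir, paste0(stems, "R.txt"))

# Write each selected data.frame to its own file.
for (i in seq_along(sel)) {
  write.table(sel[[i]], out_files[i], row.names = FALSE, quote = FALSE)
}

basename(out_files)
```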

Update

Using the data from the link:

lst <- lapply(filelist, read.csv, sep='\t',
              header=TRUE, stringsAsFactors=FALSE)
lst1 <- lapply(lst, "[", c("Time", "X220"))
res <- do.call(cbind.fill, c(lst1, list(fill=NA)))
head(res)
#   Time   X220  Time   X220  Time  X220   Time  X220
#1 0.700    111 1.400   2370 0.850   520  1.600 21216
#2 2.083 131747 1.650 179289 1.633 54607  1.900  3816
#3 2.517  23428 2.100  21690 2.117 13677  2.117  3573
#4 2.667  12528 2.267  10383 2.267 13448  2.300 11349
#5 3.883   1055 3.017    816 3.567  1346  9.717   292
#6 4.500    881 3.383    637 5.350   772 21.600  3774

data

 lst <- list(data.frame(V1=21:30, V6=1:10, V7= 11:20), 
             data.frame(V6=1:3, V7=11:13, V1= 21:23))

NOTE: The above data is just for reproducing the problem.
