How to select the same columns from many files
I have many text files. I want to load them all and then build a new matrix from certain columns of each file.
For example, some of the matrices look like this:
# Names that start with a digit are not syntactic in R, so they need backticks
`1a` <- replicate(10, rnorm(20))
`1b` <- replicate(10, rnorm(19))
`2a` <- replicate(10, rnorm(18))
`2b` <- replicate(10, rnorm(15))
To identify them, I put them all in one folder, set my working directory there, and list them like this:
filelist <- list.files(pattern = "\\.txt$")
Then I want to take the first column of 1a along with its V6 and V7, followed by V6 and V7 from 1b, from 2a, and from 2b, and put them in a new matrix.
The files do not all have the same length (their row counts differ). I would like to do two things:
1- Save each file with only the selected columns, adding an R to the name. For example, if the original file is 1a, select V6 and V7 and save a new file with only those 2 columns, named 1aR.
2- Make a new matrix containing all the selected columns (where the lengths differ, the missing cells can be filled with NA or 0).
Here is an option to read the files, select the relevant columns from each dataset, and create a new dataset.
We get the files that follow a particular file name pattern in the working directory using list.files.
filelist <- list.files(pattern='\\d+[^0-9]+\\.txt', full.names=TRUE)
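To see what this pattern actually selects, it can be tried on a few file names first. The names below are hypothetical and only serve as an illustration. Note that the pattern is not anchored, so a previously written output such as 1aR.txt would also match on a second run; the stricter pattern shown is one possible way around that, assuming the input names are digits followed by lowercase letters.

```r
# Hypothetical file names, used only to illustrate what the pattern matches
candidates <- c("1a.txt", "2b.txt", "12abc.txt", "notes.txt", "1a.csv", "1aR.txt")

# Pattern from the answer: digits, then non-digits, then ".txt"
pat <- "\\d+[^0-9]+\\.txt"
matched <- candidates[grepl(pat, candidates)]
matched

# Assumed stricter variant: anchored, and only lowercase letters after the digits,
# so files written with an added "R" are not picked up again
strict_pat <- "^\\d+[[:lower:]]+\\.txt$"
strict <- candidates[grepl(strict_pat, candidates)]
strict
```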
Then, read all the files into a list using either read.csv/read.table or fread from data.table.
lst <- lapply(filelist, read.csv, header=TRUE, stringsAsFactors=FALSE)
Extract the V6 and V7 columns from each data.frame in 'lst'.
lst1 <- lapply(lst, "[", c("V6", "V7"))
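The first request in the question (saving each subset under the original name plus an R) could be handled at this point. The following is a minimal sketch, not part of the original answer: it creates two small sample files in a temporary folder so that it runs on its own, and it assumes the output name is formed by inserting R before the .txt extension.

```r
# Self-contained setup: two small sample files in a temporary folder
tmp <- tempfile("demo")
dir.create(tmp)
write.csv(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
          file.path(tmp, "1a.txt"), row.names = FALSE)
write.csv(data.frame(V1 = 21:23, V6 = 1:3, V7 = 11:13),
          file.path(tmp, "1b.txt"), row.names = FALSE)

# Same steps as in the answer, pointed at the temporary folder
filelist <- list.files(tmp, pattern = "\\d+[^0-9]+\\.txt", full.names = TRUE)
lst  <- lapply(filelist, read.csv, header = TRUE, stringsAsFactors = FALSE)
lst1 <- lapply(lst, "[", c("V6", "V7"))

# Assumed naming scheme: "1a.txt" becomes "1aR.txt"
outfiles <- sub("\\.txt$", "R.txt", filelist)

# Write each two-column subset next to its source file
invisible(Map(function(d, f) write.csv(d, f, row.names = FALSE), lst1, outfiles))
basename(outfiles)
```

Map pairs each subset with its output path, so the files stay in the same order as filelist.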
If the data.frame elements in the list have unequal numbers of rows, one option is cbind.fill from library(rowr).
library(rowr)
res <- cbind.fill(lst[[1]][1], do.call(cbind.fill,
c(lst1, list(fill=NA))), fill=NA)
res
#   V1 V6 V7 V6.1 V7.1
#1  21  1 11    1   11
#2  22  2 12    2   12
#3  23  3 13    3   13
#4  24  4 14   NA   NA
#5  25  5 15   NA   NA
#6  26  6 16   NA   NA
#7  27  7 17   NA   NA
#8  28  8 18   NA   NA
#9  29  9 19   NA   NA
#10 30 10 20   NA   NA
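If the rowr package is not available, the same padding can be done in base R. This is a rough sketch rather than the method used in the answer; pad_rows is a hypothetical helper name, and the sample data mirrors the reproducible example from this answer.

```r
# Sample data mirroring the answer's reproducible example
lst <- list(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
            data.frame(V6 = 1:3, V7 = 11:13, V1 = 21:23))
lst1 <- lapply(lst, "[", c("V6", "V7"))

# pad_rows() is a hypothetical helper: extend a data frame to n rows with NA
pad_rows <- function(d, n) {
  d <- d[seq_len(n), , drop = FALSE]  # row indices past nrow(d) give NA rows
  rownames(d) <- NULL
  d
}

# Pad every subset to the length of the longest one, then bind column-wise
n_max  <- max(vapply(lst1, nrow, integer(1)))
padded <- lapply(lst1, pad_rows, n = n_max)
res <- cbind(lst[[1]][1], do.call(cbind, padded))

# cbind keeps duplicated names, so make them unique
names(res) <- make.unique(names(res))
res
```

If zeros are preferred over NA, as the question allows, `res[is.na(res)] <- 0` can be applied afterwards.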
Then, we write the result to a .txt file.
write.table(res, 'CombinedV6_V7.txt', row.names=FALSE, quote=FALSE)
Using the data from the link:
lst <- lapply(filelist, read.csv, sep='\t',
header=TRUE, stringsAsFactors=FALSE)
lst1 <- lapply(lst, "[", c("Time", "X220"))
res <- do.call(cbind.fill, c(lst1, list(fill=NA)))
head(res)
#   Time   X220  Time   X220  Time  X220   Time  X220
#1 0.700    111 1.400   2370 0.850   520  1.600 21216
#2 2.083 131747 1.650 179289 1.633 54607  1.900  3816
#3 2.517  23428 2.100  21690 2.117 13677  2.117  3573
#4 2.667  12528 2.267  10383 2.267 13448  2.300 11349
#5 3.883   1055 3.017    816 3.567  1346  9.717   292
#6 4.500    881 3.383    637 5.350   772 21.600  3774
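Because every file contributes columns with the same names, the combined result does not show which file a column came from. One possible fix, sketched here with hypothetical file paths and a two-row stand-in for the real data, is to prefix each column name with the base name of its file.

```r
# Hypothetical paths and small stand-in data (two files, two rows each)
filelist <- c("./1a.txt", "./1b.txt")
lst1 <- list(data.frame(Time = c(0.700, 2.083), X220 = c(111, 131747)),
             data.frame(Time = c(1.400, 1.650), X220 = c(2370, 179289)))

# File identifiers without folder or extension: "1a", "1b"
ids <- tools::file_path_sans_ext(basename(filelist))

# Equal row counts here, so plain cbind is enough for the illustration
res <- do.call(cbind, lst1)

# Prefix each column with the identifier of the file it came from
names(res) <- paste(rep(ids, lengths(lst1)),
                    unlist(lapply(lst1, names)), sep = "_")
names(res)
```

The same renaming step should also work on a result produced by cbind.fill, since it only relies on the column order.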
Sample data for the first example:
lst <- list(data.frame(V1=21:30, V6=1:10, V7=11:20),
            data.frame(V6=1:3, V7=11:13, V1=21:23))
NOTE: The above data is just for reproducing the problem.