I have 6 txt files split into 2 groups (A and T files). I want to import all these files into R, intersect every A file with every T file, and obtain a matrix of the overlap of each A file with each T file (the number of shared items), like in this example. I was thinking of making two lists of vectors and finding a way to calculate this matrix starting from them.
A_1.txt
tomato
zucchini
potato
banana
coconut
salt
A_2.txt
tomato
zucchini
potato
A_3.txt
zucchini
potato
T_1.txt
tomato
zucchini
potato
banana
coconut
salt
T_2.txt
tomato
zucchini
potato
banana
T_3.txt
potato
banana
coconut
What I want to obtain is this matrix:
    T_1 T_2 T_3
A_1   6   4   3
A_2   3   3   1
A_3   2   2   1
Could somebody give me a tip on how to do this in R?
I read in the files this way:
A_files <- list.files("/home/A/", full.names = TRUE)
T_files <- list.files("/home/T/", full.names = TRUE)
myAlist <- lapply(A_files, read.delim, header=FALSE)
myTlist <- lapply(T_files, read.delim, header=FALSE)
This is what I would do with my preferred set of tools:
library(data.table)
library(magrittr)
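# pick up A_1.txt ... T_3.txt from the working directory
filenames <- dir(pattern = "^[AT]_\\d\\.txt$")
# read all files and stack them into one data.table, tagged by file name
vec <-
  lapply(filenames, fread, header = FALSE) %>%
  set_names(filenames %>% stringr::str_remove("\\.txt$")) %>%
  rbindlist(idcol = "file")
vecA <- vec[file %like% "^A"]
vecT <- vec[file %like% "^T"]
# join on the item column, then count matches per (A file, T file) pair
vecA[vecT, on = .(V1), allow.cartesian = TRUE] %>%
  dcast(file ~ i.file, length)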
   file T_1 T_2 T_3
1:  A_1   6   4   3
2:  A_2   3   3   1
3:  A_3   2   2   1
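It is assumed that all files A_1.txt, A_2.txt, ..., T_2.txt, T_3.txt are stored in the same folder, so the filenames are picked up by dir() with a regular expression pattern. The files are read and combined into one data.table, vec, which is then split into vecA and vecT. (This is just for clarity and to make the code less convoluted). The result of the join is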
vecA[vecT, on = .(V1), allow.cartesian = TRUE]
    file       V1 i.file
 1:  A_1   tomato    T_1
 2:  A_2   tomato    T_1
 3:  A_1 zucchini    T_1
 4:  A_2 zucchini    T_1
 5:  A_3 zucchini    T_1
 6:  A_1   potato    T_1
 7:  A_2   potato    T_1
 8:  A_3   potato    T_1
 9:  A_1   banana    T_1
10:  A_1  coconut    T_1
11:  A_1     salt    T_1
12:  A_1   tomato    T_2
13:  A_2   tomato    T_2
14:  A_1 zucchini    T_2
15:  A_2 zucchini    T_2
16:  A_3 zucchini    T_2
17:  A_1   potato    T_2
18:  A_2   potato    T_2
19:  A_3   potato    T_2
20:  A_1   banana    T_2
21:  A_1   potato    T_3
22:  A_2   potato    T_3
23:  A_3   potato    T_3
24:  A_1   banana    T_3
25:  A_1  coconut    T_3
    file       V1 i.file
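For readers who want to see the join-and-count idea without data.table, here is a rough base R sketch of the same logic. It is not the code from this answer: the names a_list, t_list, long_a, long_t, joined and overlap are made up for illustration, and the sample data is typed in rather than read from files. It also assumes that no item is repeated within a file, since repeats would inflate the counts.

```r
# Sample data from the question, typed in by hand (hypothetical names).
a_list <- list(A_1 = c("tomato", "zucchini", "potato", "banana", "coconut", "salt"),
               A_2 = c("tomato", "zucchini", "potato"),
               A_3 = c("zucchini", "potato"))
t_list <- list(T_1 = c("tomato", "zucchini", "potato", "banana", "coconut", "salt"),
               T_2 = c("tomato", "zucchini", "potato", "banana"),
               T_3 = c("potato", "banana", "coconut"))

# stack() turns a named list into a long data.frame (columns: values, ind);
# the columns are renamed so the two tables share only the "item" column.
long_a <- setNames(stack(a_list), c("item", "a_file"))
long_t <- setNames(stack(t_list), c("item", "t_file"))

# merge() pairs every A row with every T row that has the same item.
joined <- merge(long_a, long_t, by = "item")

# Counting rows per (a_file, t_file) pair gives the overlap matrix.
overlap <- table(joined$a_file, joined$t_file)
overlap
```

The merge() step plays the role of the data.table join, and table() plays the role of dcast() with length as the aggregation function.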
This is a way to create the 6 input files from the sample dataset provided in the question:
library(data.table)
library(magrittr)
fread("A_1.txt
tomato
zucchini
potato
banana
coconut
salt
A_2.txt
tomato
zucchini
potato
A_3.txt
zucchini
potato
T_1.txt
tomato
zucchini
potato
banana
coconut
salt
T_2.txt
tomato
zucchini
potato
banana
T_3.txt
potato
banana
coconut", header = FALSE) %>%
  .[, fwrite(.(V1[-1]), V1[1]), by = cumsum(V1 %like% "^[AT]_\\d\\.txt$")]
Here is an approach using base R commands. Before version 4.0.0, R defaulted to creating factors from character vectors when reading data. It is important that you not allow that. Including the argument as.is=TRUE in your read.csv or read.delim commands will preserve the character data. First make the data easily available:
myAlist <- list(A_1 = c("tomato", "zucchini", "potato", "banana", "coconut",
"salt"), A_2 = c("tomato", "zucchini", "potato"), A_3 = c("zucchini",
"potato"))
myTlist <- list(T_1 = c("tomato", "zucchini", "potato", "banana", "coconut",
"salt"), T_2 = c("tomato", "zucchini", "potato", "banana"), T_3 = c("potato",
"banana", "coconut"))
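If the items live in files rather than being typed in, lists of this shape could be built along the following lines. This is only a sketch: read_items is a hypothetical helper, not a base R function, and it assumes each file holds one item per line with no header. The demonstration writes two small files to a temporary folder so that it runs anywhere.

```r
# Hypothetical helper: read every .txt file in a folder into a named list
# of character vectors, one vector per file.
read_items <- function(folder) {
  files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
  items <- lapply(files, readLines)
  # name each element after its file, without the .txt extension
  names(items) <- sub("\\.txt$", "", basename(files))
  items
}

# Demonstration with a temporary folder standing in for the real one.
demo_dir <- file.path(tempdir(), "A_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("tomato", "zucchini", "potato"), file.path(demo_dir, "A_2.txt"))
writeLines(c("zucchini", "potato"), file.path(demo_dir, "A_3.txt"))

demo_list <- read_items(demo_dir)
str(demo_list)
```

With the folders from the question, the calls would presumably be read_items("/home/A/") and read_items("/home/T/"). Because readLines returns character vectors, the factor issue does not arise here.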
Now we create a function to find the intersection of two groups and compute the number of shared items:
Shared <- function(a, t) {
  length(intersect(myAlist[[a]], myTlist[[t]]))
}
We are taking each group in A and comparing it to each group in T, e.g. A_1 with T_1, T_2, T_3, etc.:
(A <- rep(1:3, each=3))
# [1] 1 1 1 2 2 2 3 3 3
(T <- rep(1:3, 3))
# [1] 1 2 3 1 2 3 1 2 3
Finally we compute the number of shared items:
nshare <- mapply(Shared, A, T)
myTbl <- matrix(nshare, 3, byrow=TRUE, dimnames=list(A=names(myAlist), T=names(myTlist)))
myTbl
# T
# A T_1 T_2 T_3
# A_1 6 4 3
# A_2 3 3 1
# A_3 2 2 1
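As a possible variation, the index vectors and the hard-coded 3 can be avoided by nesting two sapply calls, which also keeps working if the number of files changes. This is a sketch under the same assumptions as above (named lists of character vectors); the name overlap and the argument names a_items and t_items are made up for illustration.

```r
myAlist <- list(A_1 = c("tomato", "zucchini", "potato", "banana", "coconut", "salt"),
                A_2 = c("tomato", "zucchini", "potato"),
                A_3 = c("zucchini", "potato"))
myTlist <- list(T_1 = c("tomato", "zucchini", "potato", "banana", "coconut", "salt"),
                T_2 = c("tomato", "zucchini", "potato", "banana"),
                T_3 = c("potato", "banana", "coconut"))

# The outer sapply loops over the T groups (one column each); the inner
# sapply loops over the A groups (one row each) and counts shared items.
overlap <- sapply(myTlist, function(t_items) {
  sapply(myAlist, function(a_items) length(intersect(a_items, t_items)))
})
overlap
```

The row and column names come from the names of the two lists, so no dimnames argument is needed.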