I have a .txt file that looks something like this:
rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC
...
For each row, I would like to count the number of times "NC" occurs, so that my output looks something like this:
rs1 2
rs2 1
rs3 3
...
Can someone tell me how to do this in R or in Linux? Many thanks!
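We can use rowSums on the logical matrix created by the expression df[-1] == "NC", which is TRUE wherever a cell outside the ID column equals "NC":
df <- read.table(text = "rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC")
df$count <- rowSums(df[-1] == "NC")
# V1 V2 V3 V4 count
# 1 rs1 NC AB NC 2
# 2 rs2 AB NC AA 1
# 3 rs3 NC NC NC 3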
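Since the question starts from a .txt file, a sketch of the full round trip may help. It assumes the file is whitespace-separated with no header row; the file is simulated here with tempfile(), and the names in_file and out_file are hypothetical placeholders for your own paths.

```r
# Simulate the input .txt file (hypothetical: replace in_file with your real path)
in_file <- tempfile(fileext = ".txt")
writeLines(c("rs1 NC AB NC", "rs2 AB NC AA", "rs3 NC NC NC"), in_file)

# Read all columns as character; the file has no header row
df <- read.table(in_file, header = FALSE, stringsAsFactors = FALSE)

# df[-1] == "NC" is a logical matrix (TRUE where a cell is "NC");
# rowSums() counts the TRUE values in each row
nc_matrix <- df[-1] == "NC"
print(nc_matrix)

# Keep only the ID column and the count, as in the requested output
result <- data.frame(id = df$V1, count = rowSums(nc_matrix))

# Write "rs1 2"-style lines without quotes, header or row names
# (hypothetical: replace out_file with your real output path)
out_file <- tempfile(fileext = ".txt")
write.table(result, out_file, quote = FALSE, row.names = FALSE, col.names = FALSE)
cat(readLines(out_file), sep = "\n")
```

write.table() with quote, row.names and col.names all switched off produces exactly the two-column layout shown in the question.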
dat <- read.table(text="rs1 NC AB NC rs2 AB NC AA rs3 NC NC NC")
dat <- rbind(dat, dat, dat, dat)
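You can apply table() row-wise to get the frequencies per row. The text string contains no line breaks, so it is read as a single row of 12 values; because that row was copied four times with rbind(), rows 1 to 4 have identical frequencies: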
freq <- apply(dat, 1, table)
    1 2 3 4   # row number
AA  1 1 1 1
AB  2 2 2 2
NC  6 6 6 6
rs1 1 1 1 1
rs2 1 1 1 1
rs3 1 1 1 1
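If the data are instead read as the three separate rows from the question, each row produces a table with different categories, so apply() can no longer simplify the result to a matrix. A minimal sketch, assuming the first column holds the SNP ID, that builds one table per row with lapply() (so the result is always a list) and pulls out the "NC" count, falling back to 0 when a row has none. The names dat3, freq3 and nc are illustrative only.

```r
dat3 <- read.table(text = "rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC", stringsAsFactors = FALSE)

# One frequency table per row, excluding the ID column;
# lapply() always returns a list, whatever the table lengths
freq3 <- lapply(seq_len(nrow(dat3)), function(i) table(unlist(dat3[i, -1])))

# Extract the "NC" count from each table, or 0 if "NC" is absent
nc <- sapply(freq3, function(t) if ("NC" %in% names(t)) as.integer(t["NC"]) else 0L)
names(nc) <- dat3$V1
print(nc)
```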
To get the frequencies aggregated over all rows, use:
rowSums(freq)
 AA  AB  NC rs1 rs2 rs3
  4   8  24   4   4   4
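The same aggregated counts can also be obtained without the intermediate per-row tables by tabulating every cell at once. This sketch rebuilds the copied dat from above so that it runs on its own; all_freq is an illustrative name.

```r
dat <- read.table(text = "rs1 NC AB NC rs2 AB NC AA rs3 NC NC NC")
dat <- rbind(dat, dat, dat, dat)

# as.matrix() turns the data frame into a single character matrix;
# table() then counts each distinct value across all cells
all_freq <- table(as.matrix(dat))
print(all_freq)

# For comparison: summing the per-row tables, as in the answer above
freq <- apply(dat, 1, table)
rowSums(freq)
```

Both approaches count every cell exactly once, so they give the same totals; table(as.matrix(dat)) simply skips the per-row step.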
With newer versions of dplyr (>= 1.0.0), you can use rowwise() and c_across() to sum across columns.
dat <- read.table(text="
SNP G1 G2 G3
rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC", header=TRUE)
library(dplyr)
dat %>%
  rowwise() %>%
  mutate(Total = sum(c_across(G1:G3) == "NC"))
# SNP G1 G2 G3 Total
# <chr> <chr> <chr> <chr> <int>
# 1 rs1 NC AB NC 2
# 2 rs2 AB NC AA 1
# 3 rs3 NC NC NC 3
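rowwise() evaluates the expression once per row, which can be slow on large SNP files. A vectorised alternative, assuming dplyr >= 1.0.0 for across(), applies the comparison to whole columns at once and lets rowSums() do the counting. The name dat_counted is illustrative only.

```r
library(dplyr)

dat <- read.table(text = "
SNP G1 G2 G3
rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC", header = TRUE)

# across() returns a data frame of logical columns (TRUE where "NC");
# rowSums() adds them up row by row, with no rowwise grouping needed
dat_counted <- dat %>%
  mutate(Total = rowSums(across(G1:G3, ~ .x == "NC")))
print(dat_counted)
```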