
How to count the frequency of a string for each row in R

I have a .txt file that looks something like this:

rs1 NC AB NC     
rs2 AB NC AA  
rs3 NC NC NC  
...  

For each row, I would like to count how many times "NC" occurs, so that my output looks something like this:

rs1 2  
rs2 1  
rs3 3  
...

Can someone tell me how to do this in R or in Linux? Many thanks!

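df <- read.table(text = "rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC")   # or read.table("yourfile.txt"); file name is a placeholder
df$count <- rowSums(df[-1] == "NC")
df
#    V1 V2 V3 V4 count
# 1 rs1 NC AB NC     2
# 2 rs2 AB NC AA     1
# 3 rs3 NC NC NC     3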

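We can use rowSums on the logical matrix created by df[-1] == "NC": df[-1] drops the first column (the SNP ID), the comparison is TRUE wherever a cell equals "NC", and rowSums counts the TRUE values in each row.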
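If you want the result in exactly the two-column "ID count" form shown in the question, here is a minimal sketch. It assumes a whitespace-separated file with no header and the SNP ID in the first column; the file names are placeholders, and the demo writes the sample rows to a temporary file so it runs as-is.

```r
# Demo input: write the sample rows to a temporary file.
# With real data, point infile at your own file (e.g. a hypothetical "genotypes.txt").
infile <- tempfile(fileext = ".txt")
writeLines(c("rs1 NC AB NC",
             "rs2 AB NC AA",
             "rs3 NC NC NC"), infile)

# No header: columns become V1 (ID), V2, V3, ...
# stringsAsFactors = FALSE is the default from R 4.0.0; set explicitly for older R.
geno <- read.table(infile, header = FALSE, stringsAsFactors = FALSE)

# Compare every genotype column with "NC" and count the TRUEs in each row
counts <- data.frame(id = geno$V1, NC = rowSums(geno[-1] == "NC"))

# Write "ID count" lines without quotes, row names or a header
outfile <- tempfile(fileext = ".txt")
write.table(counts, outfile, quote = FALSE, row.names = FALSE, col.names = FALSE)
readLines(outfile)
# [1] "rs1 2" "rs2 1" "rs3 3"
```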

dat <- read.table(text="rs1 NC AB NC rs2 AB NC AA rs3 NC NC NC")
dat <- rbind(dat, dat, dat, dat)

You can apply table() to each row to get the frequencies per row. Here the sample was read as a single row of 12 values and then copied four times with rbind(), so rows 1 to 4 have identical frequencies.

freq <- apply(dat, 1, table)
freq
#     1 2 3 4    <- row numbers
# AA  1 1 1 1
# AB  2 2 2 2
# NC  6 6 6 6
# rs1 1 1 1 1
# rs2 1 1 1 1
# rs3 1 1 1 1

To aggregate the frequencies over all rows, use:

rowSums(freq)
#  AA  AB  NC rs1 rs2 rs3 
#   4   8  24   4   4   4 
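In the example above, every row holds all three SNPs. When each SNP is on its own row, as in the question, the per-row tables differ in which codes they contain, and apply() then returns a list instead of a matrix. A minimal sketch that keeps the result a matrix by giving table() a common set of factor levels, assuming AA, AB and NC are the only genotype codes:

```r
dat <- read.table(text = "rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC", stringsAsFactors = FALSE)

# Assumed set of genotype codes; fixed levels give every row a table of the
# same length (zeros included), so apply() can simplify to a matrix
codes <- c("AA", "AB", "NC")
freq <- apply(dat[-1], 1, function(r) table(factor(r, levels = codes)))
colnames(freq) <- dat$V1   # label columns with the SNP IDs
freq
#    rs1 rs2 rs3
# AA   0   1   0
# AB   1   1   0
# NC   2   1   3

rowSums(freq)   # totals over all rows
# AA AB NC 
#  1  2  6 
```

The "NC" row of freq is then the per-SNP count the question asks for.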

With a newer version of dplyr (>= 1.0.0), you can use rowwise() and c_across() to sum across columns.

dat <- read.table(text="
SNP G1 G2 G3
rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC", header=TRUE)

library(dplyr)
dat %>% 
  rowwise() %>% 
  mutate(Total = sum(c_across(G1:G3)=="NC"))
#   SNP   G1    G2    G3    Total
#   <chr> <chr> <chr> <chr> <int>
# 1 rs1   NC    AB    NC        2
# 2 rs2   AB    NC    AA        1
# 3 rs3   NC    NC    NC        3
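rowwise() evaluates the expression once per row, which can be slow on files with many rows. A vectorised sketch, assuming dplyr >= 1.0.0 for across(), builds the logical columns in one step and counts them with rowSums(); note that Total is then a double rather than an integer.

```r
library(dplyr)

dat <- read.table(text = "
SNP G1 G2 G3
rs1 NC AB NC
rs2 AB NC AA
rs3 NC NC NC", header = TRUE)

# across() turns G1:G3 into logical columns (TRUE where the cell is "NC");
# rowSums() counts the TRUEs per row, with no rowwise() grouping needed
res <- dat %>%
  mutate(Total = rowSums(across(G1:G3, ~ .x == "NC")))
res
#   SNP G1 G2 G3 Total
# 1 rs1 NC AB NC     2
# 2 rs2 AB NC AA     1
# 3 rs3 NC NC NC     3
```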

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM