
Column counting in R. Just started using it for GWAS and I am lost

Can anyone help me work out how to count the number of instances of a character in a cell, row by row? I have a file with 10 million SNPs that I want to sort.

Direction
?????+-+-
?+-+-????
?-+-+??-+

Above is an example of one of many columns that I have. What I want to do is count the number of "?" characters in each row individually and add a new column with that count as a numerical value.

I'm a total beginner thrown in the deep end with this so any help would be appreciated.

Thanks.

Two answers for you

# stringsAsFactors (note the "s") keeps direction as character rather than factor
a <- data.frame(direction = c("?????+-+-", "?+-+-????", "?-+-+??-+"),
                stringsAsFactors = FALSE)
a$return <- lengths(regmatches(a$direction, gregexpr("\\?", a$direction)))

or as per comments

a$return <- nchar(gsub("[^?]", "", a$direction))

Both return

'data.frame':   3 obs. of  2 variables:
 $ direction: chr  "?????+-+-" "?+-+-????" "?-+-+??-+"
 $ return   : int  5 5 3
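To see why the two approaches agree, here is a minimal sketch that runs them side by side, together with a third variant that compares string lengths. The extra element "+-+-" is a hypothetical addition, included only to show that a string with no "?" yields 0 rather than an error.

```r
# Hypothetical example vector; the last element contains no "?" at all.
direction <- c("?????+-+-", "?+-+-????", "?-+-+??-+", "+-+-")

# Approach 1: locate every literal "?" and count the matches per element.
# fixed = TRUE treats "?" as a plain character, so no escaping is needed.
matches <- gregexpr("?", direction, fixed = TRUE)
count_regmatches <- lengths(regmatches(direction, matches))

# Approach 2: delete everything that is not "?" and measure what is left.
count_gsub <- nchar(gsub("[^?]", "", direction))

# Approach 3: compare the string length before and after removing "?".
count_diff <- nchar(direction) - nchar(gsub("?", "", direction, fixed = TRUE))

print(count_regmatches)

# All three approaches should give the same integer vector.
stopifnot(identical(count_regmatches, count_gsub),
          identical(count_regmatches, count_diff))
```

For this vector the counts are 5, 5, 3 and 0.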

There are tons of ways to do this; which one is best depends on what you're looking for.
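Since the question mentions a file that needs sorting, the following is a rough base R sketch of the whole round trip: read the table, add the count as a numeric column, and order the rows by it. The temporary file, the column names SNP and Direction, and the new column name n_missing are all assumptions for illustration, not details from the question.

```r
# Hypothetical input: a small tab-delimited file standing in for the real SNP table.
path <- tempfile(fileext = ".tsv")
writeLines(c("SNP\tDirection",
             "rs1\t?????+-+-",
             "rs2\t?+-+-????",
             "rs3\t?-+-+??-+"), path)

# Read every column as character so the Direction strings are left untouched.
snps <- read.delim(path, colClasses = "character", stringsAsFactors = FALSE)

# Add the number of "?" characters per row as an integer column.
snps$n_missing <- nchar(gsub("[^?]", "", snps$Direction))

# Sort so that rows with the fewest "?" come first.
snps_sorted <- snps[order(snps$n_missing), ]
print(snps_sorted)

unlink(path)
```

Use order(snps$n_missing, decreasing = TRUE) instead if the rows with the most "?" should come first.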

Update

While it may not be base R, the packages in the tidyverse are useful for data wrangling and can be used to string together a few calls easily.

install.packages("dplyr")
library(dplyr)
df <- data.frame(Direction = c("???????????-?", "???????????+?", "???????????+?", "???????????-?"), stringsAsFactors = FALSE)
df %>% 
  mutate(qmark = nchar(gsub("[^?]", "", Direction)),
         pos = nchar(gsub("[^+]", "", Direction)),
         neg = nchar(gsub("[^-]", "", Direction)),
         qminus = qmark-(pos+neg),
         total = nchar(Direction))  


      Direction qmark pos neg qminus total
1 ???????????-?    12   0   1     11    13
2 ???????????+?    12   1   0     11    13
3 ???????????+?    12   1   0     11    13
4 ???????????-?    12   0   1     11    13

If your dataset is 10 million lines long, however, you might want to use stringi, based on some benchmark testing.

install.packages("stringi")
library(stringi)
df %>% 
  mutate(qmark = stri_count(Direction, fixed = "?"),
         pos = stri_count(Direction, fixed = "+"),
         neg = stri_count(Direction, fixed = "-"), 
         qminus = qmark-(pos+neg))
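If you want to check the speed difference on your own machine rather than rely on someone else's benchmark, a rough timing sketch could look like the one below. The simulated data and its size (100,000 strings of length 9) are arbitrary assumptions, and the stringi part only runs if the package is installed. Timings vary by machine, so no expected numbers are shown.

```r
# Simulated data: random strings of length 9 over "?", "+" and "-".
set.seed(1)
n <- 1e5
chars <- matrix(sample(c("?", "+", "-"), n * 9, replace = TRUE), ncol = 9)
direction <- apply(chars, 1, paste, collapse = "")

# Time the base R approach.
base_time <- system.time(
  base_counts <- nchar(gsub("[^?]", "", direction))
)

# Time stringi only if it is available, and confirm both give the same counts.
if (requireNamespace("stringi", quietly = TRUE)) {
  stri_time <- system.time(
    stri_counts <- stringi::stri_count(direction, fixed = "?")
  )
  stopifnot(identical(base_counts, stri_counts))
  print(c(base = base_time[["elapsed"]], stringi = stri_time[["elapsed"]]))
} else {
  print(c(base = base_time[["elapsed"]]))
}
```

Increase n toward the real row count to get a more representative comparison.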


 