Counting within columns in R. Just started using it for GWAS and I'm lost

Question

Can anyone help me work out how to count the number of instances of a character in a cell, per row? I have a file with 10 million SNPs that I want to sort.

Direction
?????+-+-
?+-+-????
?-+-+??-+

Above is an example of one of the many columns that I have. What I want to do is count the number of "?" characters in each row individually and add a new column with that count as a numerical value.

I'm a total beginner thrown in at the deep end with this, so any help would be appreciated.

Thanks.

Answer 1

Two answers for you:

a <- data.frame(direction = c("?????+-+-", "?+-+-????", "?-+-+??-+"),
                stringsAsFactors = FALSE)
a$return <- lengths(regmatches(a$direction, gregexpr("\\?", a$direction)))

or, as suggested in the comments:

a$return <- nchar(gsub("[^?]", "", a$direction))

Both return the same counts (shown here with str(a)):

'data.frame':   3 obs. of  2 variables:
 $ direction: chr  "?????+-+-" "?+-+-????" "?-+-+??-+"
 $ return   : int  5 5 3
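
As a quick sanity check, the two approaches above can be compared on the example strings plus one string with no "?" at all. This is only a sketch: the helper name count_char is hypothetical (it is not part of base R), and it uses fixed = TRUE so that "?" is matched literally without regex escaping.

```r
# Hypothetical helper: count occurrences of one literal character per string.
count_char <- function(x, ch) {
  # fixed = TRUE treats `ch` as a literal, so "?" or "+" need no escaping
  nchar(x) - nchar(gsub(ch, "", x, fixed = TRUE))
}

# The three example rows, plus a row that contains no "?" at all
direction <- c("?????+-+-", "?+-+-????", "?-+-+??-+", "+-+-")

# Approach 1 from the answer: extract every match, then count the matches
q_regmatches <- lengths(regmatches(direction, gregexpr("\\?", direction)))

# Approach 2, via the helper: length before minus length after removal
q_helper <- count_char(direction, "?")

# Both approaches should agree, including on the row with zero matches
stopifnot(identical(q_regmatches, q_helper))
```

The row without any "?" matters because a bare lengths(gregexpr(...)) would report 1 there (gregexpr signals "no match" with -1); going through regmatches avoids that.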

There are many ways to do this, depending on what you're looking for.

Update

While it may not be base R, the packages in the tidyverse are useful for data wrangling and can be used to string together a few calls easily.

install.packages("dplyr")  # only needed once
library(dplyr)
df <- data.frame(Direction = c("???????????-?", "???????????+?", "???????????+?", "???????????-?"),
                 stringsAsFactors = FALSE)
df %>%
  # keep only the character of interest, then count what is left
  mutate(qmark = nchar(gsub("[^?]", "", Direction)),
         pos = nchar(gsub("[^+]", "", Direction)),
         neg = nchar(gsub("[^-]", "", Direction)),
         qminus = qmark - (pos + neg),
         total = nchar(Direction))


      Direction qmark pos neg qminus total
1 ???????????-?    12   0   1     11    13
2 ???????????+?    12   1   0     11    13
3 ???????????+?    12   1   0     11    13
4 ???????????-?    12   0   1     11    13
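
For anyone who would rather not install dplyr, the same columns can be built with plain column assignment. This is a sketch of an equivalent, not the answer's own method, and it uses only two of the example rows to stay short.

```r
# Base R equivalent of the mutate() call above (no packages required)
df <- data.frame(Direction = c("???????????-?", "???????????+?"),
                 stringsAsFactors = FALSE)

# Strip everything except the character of interest, then count what remains
df$qmark  <- nchar(gsub("[^?]", "", df$Direction))
df$pos    <- nchar(gsub("[^+]", "", df$Direction))
df$neg    <- nchar(gsub("[^-]", "", df$Direction))

# Derived columns can refer to the ones created just above
df$qminus <- df$qmark - (df$pos + df$neg)
df$total  <- nchar(df$Direction)

print(df)
```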

If your dataset is 10 million rows long, however, you might want to use stringi, based on some benchmark testing.

install.packages("stringi")
library(stringi)
df %>% 
  mutate(qmark = stri_count(Direction, fixed = "?"),
         pos = stri_count(Direction, fixed = "+"),
         neg = stri_count(Direction, fixed = "-"), 
         qminus = qmark-(pos+neg))
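
To see how the options compare on your own machine, a rough timing sketch along these lines may help. The data here is synthetic (random 9-character strings of "?", "+" and "-"), the size n is an arbitrary assumption to be scaled up towards 10 million as memory allows, and the stringi part only runs if that package is installed. Timings will vary, so none are quoted.

```r
set.seed(1)
n <- 1e5  # assumed test size; increase towards 1e7 for a realistic run
chars <- c("?", "+", "-")

# Build n random 9-character strings from the three symbols
m <- matrix(sample(chars, n * 9, replace = TRUE), nrow = n)
x <- apply(m, 1, paste, collapse = "")

# Regex-based count: delete everything that is not "?" and measure the rest
t_regex <- system.time(a <- nchar(gsub("[^?]", "", x)))

# Literal (fixed) count: length before minus length after deleting "?"
t_fixed <- system.time(b <- nchar(x) - nchar(gsub("?", "", x, fixed = TRUE)))
stopifnot(identical(a, b))

# stringi variant, run only when the package is available
if (requireNamespace("stringi", quietly = TRUE)) {
  t_stri <- system.time(s <- stringi::stri_count_fixed(x, "?"))
  stopifnot(identical(a, s))
  print(t_stri)
}

print(t_regex)
print(t_fixed)
```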

Answer 1 | 1 | 2017-07-21 18:03:04
