
Java - Check if a string only contains certain characters (i.e. DNA/RNA)

I'm struggling with regex.

I want to make something like this:

if (sequence.matches(A|T|G|C)){
String type = "DNA"
}
elseif (sequence.matches(A|U|G|C)){
String type = "RNA"
}

so that the type is set to DNA only if the sequence consists solely of A, T, G, or C, and to RNA if it consists solely of A, U, G, or C.

Regardless of the programming language, the regular expression you want should test that the string contains only the characters of interest from start to finish:

^[ACGT]+$

^ means "start of string". [ACGT] matches any one of those 4 letters. + indicates that there must be one or more of those characters. $ means "end of string".

So this means that your string must have nothing in it but A, C, G, or T, and there must be at least one of those.
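As a rough Java sketch of the anchored pattern above (the class name DnaPatternDemo and the method isDna are made up for illustration, they are not from the question):

```java
import java.util.regex.Pattern;

class DnaPatternDemo {
    // The anchored pattern from the answer: one or more of A, C, G, T and nothing else.
    static final Pattern DNA = Pattern.compile("^[ACGT]+$");

    // Hypothetical helper: true only if the whole sequence is made of A, C, G, T.
    static boolean isDna(String sequence) {
        // matches() already requires the entire input to match;
        // the anchors just make the intent explicit.
        return DNA.matcher(sequence).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDna("GATTACA")); // true
        System.out.println(isDna("GAUUACA")); // false, U is not in the class
        System.out.println(isDna(""));        // false, + needs at least one letter
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which is what String.matches does.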

Regex may not be your most efficient option:

static boolean consistsOf(String s, String of) {
  for ( int i = 0; i < s.length(); i++ ) {
    if ( of.indexOf(s.charAt(i)) == -1 ) {
      return false;
    }
  }
  return true;
}
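
One possible way to use this for the DNA/RNA check is sketched below. The classify helper, the "unknown" fallback, and the class name are assumptions for illustration. Note that the loop returns true for an empty string, unlike a regex with +, so the sketch checks for that case separately.

```java
class ConsistsOfDemo {
    // Same loop as above: every character of s must appear somewhere in 'of'.
    static boolean consistsOf(String s, String of) {
        for (int i = 0; i < s.length(); i++) {
            if (of.indexOf(s.charAt(i)) == -1) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: "unknown" is an assumed fallback, not from the question.
    static String classify(String sequence) {
        if (sequence.isEmpty()) {
            return "unknown"; // consistsOf("", ...) would otherwise be vacuously true
        }
        if (consistsOf(sequence, "ATGC")) {
            return "DNA";
        }
        if (consistsOf(sequence, "AUGC")) {
            return "RNA";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("ATGC")); // DNA
        System.out.println(classify("AUGC")); // RNA
        System.out.println(classify("ATUG")); // unknown, mixes T and U
    }
}
```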

You can use the regex below:

if (sequence.matches("[ATGC]+")) { // + for one or more occurrences, * for zero or more occurrences

And the same goes for the other check:

else if (sequence.matches("[AUGC]+")) { // + for one or more occurrences, * for zero or more occurrences

Also, you need to put the pattern in double quotes: if(str.matches("strInDoubleQuotes")).
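
Put together, the two checks might look something like the sketch below. The method name typeOf and the "unknown" fallback are assumptions for illustration. The variable is declared before the if, because a String type declared inside each branch, as in the question, would go out of scope at the closing brace.

```java
class SequenceTypeDemo {
    // Hypothetical helper wrapping the two matches() checks.
    static String typeOf(String sequence) {
        String type = "unknown"; // assumed fallback, not in the original question
        if (sequence.matches("[ATGC]+")) {
            type = "DNA";
        } else if (sequence.matches("[AUGC]+")) {
            type = "RNA";
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(typeOf("ATCCGT")); // DNA
        System.out.println(typeOf("AUCCGU")); // RNA
        System.out.println(typeOf("AGC"));    // DNA, the first check wins when there is no T or U
    }
}
```

A sequence with neither T nor U satisfies both patterns, so the order of the checks decides what it gets labelled as.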

A suitable expression would be "[ATGC]+", which matches one or more of A, T, G, or C. The expression [ATGC] is known as a character class, which each character of the input must match. The form X+ is one of the quantifiers; it says that the expression X occurs one or more times.

"ATCCGT".matches("[ATGC]+")

Set theory would dictate this simplification:

// Use one of these, not both:
String type = (sequence.contains("U")) ? "RNA" : "DNA";
String type = (sequence.contains("T")) ? "DNA" : "RNA";

No? Frankly, I'm not even sure you need 2 expressions.
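
A caveat worth sketching: contains on its own does not check that the rest of the string is a valid sequence, so "HELLO U" would be labelled RNA. One way to keep the shortcut is to validate first, as in the rough example below (the typeOf name and the "invalid" label are assumptions for illustration):

```java
class ContainsShortcutDemo {
    // Hypothetical helper: validate first, then use the contains() shortcut.
    static String typeOf(String sequence) {
        // Accept only strings that are purely DNA letters or purely RNA letters.
        if (!sequence.matches("[ATGC]+|[AUGC]+")) {
            return "invalid"; // assumed label, not from the original answers
        }
        // After validation, the presence of U is enough to tell the two apart.
        return sequence.contains("U") ? "RNA" : "DNA";
    }

    public static void main(String[] args) {
        System.out.println(typeOf("ATGC"));    // DNA
        System.out.println(typeOf("AUGC"));    // RNA
        System.out.println(typeOf("HELLO U")); // invalid
    }
}
```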
