
Converting a string into 3 character substrings

I have an assignment where we read from a text file of Covid-19 sequences. I have read in the first line as a string and now have to use a substring method to break this line down into groups of 3 characters, each of which forms a codon. I am having trouble visualizing how to break this down. This is the first line of the file, and every three letters make a codon:

AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG

What I have now is testLine = scan.nextLine(); followed by this loop:

for (int i = 0; i < testLine.length(); i += 3)
{
    String codon = testLine.substring(0,3);
    codonList.add(codon);
}
System.out.println(codonList);

I know I am close: the code above prints the first codon, AGA, 20 times. Here is the output:

[AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA]

Edit: I was able to get it working with everyone's help. The issue I am having now is replicating this for the whole file. I added a hasNext check and it doesn't seem to work the same way.

    while(scan.hasNext())
    testLine = scan.nextLine();
    for (int i = 0; i < testLine.length(); i += 3)
    {   
        String codon = testLine.substring(i, i + 3);
        codonList.add(codon);
    }
    System.out.println(codonList);  
}

Here is my output with hasNext added:
[ATT, AAT, TTT, AGT, AGT, GCT, ATC]
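
A note on the edit, judging only from the snippet as posted (it may be incomplete, given the stray closing brace): the while has no braces, so the only statement it repeats is testLine = scan.nextLine();. The for loop then runs once, after the while has finished, on the last line that was read, which would explain the short output. Below is a minimal sketch of one way to process every line. The class name CodonFileReader and the method readCodons are made up for illustration, and a Scanner over a String stands in for the real file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CodonFileReader {
    // Splits every remaining line of the scanner into 3-character codons.
    static List<String> readCodons(Scanner scan) {
        List<String> codonList = new ArrayList<>();
        // hasNextLine pairs with nextLine; the braces make the whole body repeat per line.
        while (scan.hasNextLine()) {
            String testLine = scan.nextLine();
            for (int i = 0; i < testLine.length(); i += 3) {
                // Math.min guards a line whose length is not a multiple of 3.
                int end = Math.min(i + 3, testLine.length());
                codonList.add(testLine.substring(i, end));
            }
        }
        return codonList;
    }

    public static void main(String[] args) {
        // Hypothetical two-line input standing in for the sequence file.
        Scanner scan = new Scanner("AGATCT\nGTTCTC\n");
        System.out.println(readCodons(scan)); // [AGA, TCT, GTT, CTC]
    }
}
```

For a real file, the same method could be handed new Scanner(new File(...)) with whatever path the assignment uses. Note that this sketch splits each line separately, so a codon is never joined across a line break.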

Just use the loop index as the start position in the call to substring. Math.min keeps the end index in range when the length is not a multiple of 3.

String codon = testLine.substring(i, Math.min(i + 3, testLine.length()));
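
Placed inside the loop from the question, that line could look like the sketch below. The class name CodonSplitter and the method toCodons are hypothetical names chosen for this example; only substring, Math.min and the loop shape come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

class CodonSplitter {
    // Breaks a line into consecutive groups of up to 3 characters.
    static List<String> toCodons(String line) {
        List<String> codons = new ArrayList<>();
        for (int i = 0; i < line.length(); i += 3) {
            // The start index moves with i; the end index is clamped to the string length.
            int end = Math.min(i + 3, line.length());
            codons.add(line.substring(i, end));
        }
        return codons;
    }

    public static void main(String[] args) {
        // Eight characters: two full codons and a 2-character remainder.
        System.out.println(toCodons("AGATCTGT")); // [AGA, TCT, GT]
    }
}
```

Without the Math.min clamp, substring(i, i + 3) throws StringIndexOutOfBoundsException on the last step whenever the length is not a multiple of 3.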


String#split can also be used.

System.out.println(Arrays.toString(testLine.split("(?<=\\G.{3})")));

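Explanation of the regex (the original post linked to regex101): the lookbehind (?<=\G.{3}) matches the empty position that follows exactly three characters counted from the end of the previous match (\G), or from the start of the input for the first match, so split cuts the string after every third character.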
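
A small sketch of the split approach for trying it out. The class name RegexCodonSplit and the method splitEvery3 are illustrative names, not part of any library.

```java
import java.util.Arrays;

class RegexCodonSplit {
    // Splits after every third character using a zero-width lookbehind anchored at \G.
    static String[] splitEvery3(String line) {
        return line.split("(?<=\\G.{3})");
    }

    public static void main(String[] args) {
        // A trailing group shorter than 3 characters is kept as the last element.
        System.out.println(Arrays.toString(splitEvery3("AGATCTGT"))); // [AGA, TCT, GT]
    }
}
```

Because split drops trailing empty strings by default, an input whose length is an exact multiple of 3 does not end with an empty element.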

It seems you were very close. You need to use i instead of 0 as the start index in the substring call inside the loop.

Here is my solution in C#. I know you asked about Java, but I had a C# IDE open...

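using System;
using System.Collections.Generic;

List<string> codonList = new List<string>();
string testLine = "AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG";

// Step through the string three characters at a time.
for (int i = 0; i < testLine.Length; i += 3)
{
    // Substring takes (start, length); Math.Min avoids running past the end
    // when the length is not a multiple of 3.
    string codon = testLine.Substring(i, Math.Min(3, testLine.Length - i));
    codonList.Add(codon);
}

// Print the codons separated by ", " with no trailing separator.
int cnt = 0;
foreach (string s in codonList)
{
    cnt++;
    if (cnt != codonList.Count)
    {
        Console.Write(s + ", ");
    }
    else
    {
        Console.WriteLine(s);
    }
}
Console.ReadLine(); // keeps the console window open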

This will work:

    // Requires: import java.util.ArrayList; import java.util.List;
    String testLine = "AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG";
    List<String> codonList = new ArrayList<String>();

    for (int i = 0; i < testLine.length(); i += 3) {
        // Drop everything before index i, then take up to 3 characters from the front.
        String newTestLine = testLine.substring(i);
        String codon = newTestLine.substring(0, Math.min(3, newTestLine.length()));
        codonList.add(codon);
    }
    System.out.println(codonList);

Here's a one-liner:

String[] parts = testLine.split("(?<=\\G...)");

This works by splitting at points in the input that are 3 characters after the end of the last match (denoted by \G, which is initialized to the start of the input).

If you really need a List:

List<String> parts = Arrays.asList(testLine.split("(?<=\\G...)"));
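
One caveat worth knowing: Arrays.asList returns a fixed-size list backed by the array, so calling add or remove on it throws UnsupportedOperationException. If the list needs to grow later, it can be copied into an ArrayList, as in this sketch (CodonListDemo and toMutableList are hypothetical names used only here):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CodonListDemo {
    // Splits into 3-character parts and copies them into a resizable list.
    static List<String> toMutableList(String line) {
        return new ArrayList<>(Arrays.asList(line.split("(?<=\\G...)")));
    }

    public static void main(String[] args) {
        List<String> parts = toMutableList("AGATCT");
        parts.add("GTT"); // fine on an ArrayList; would throw on the bare Arrays.asList view
        System.out.println(parts); // [AGA, TCT, GTT]
    }
}
```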
