Java checking if an element from a list appears in all occurrences

I have a method that takes in an ArrayList of strings, with each element in the list equal to a variation of the following form:

>AX018718 Equine influenza virus H3N8 // 4 (HA)
CAAAAGCAGGGTGACAAAAACATGATGGATTCCAACACTGTGTCAAGCTTTCAGGTAGACTGTTTTCTTT
GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA

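The method breaks each element down into the Acc (AX018718 in this case) and the seq, which is the two lines following the Acc.

The seq is then checked against another ArrayList of strings called pal to see whether any of its substrings match, for example [AAAATTTT, AAACGTTT, AAATATATTT].

I am able to output all of the matches for the different elements of the first list as: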

AATATATT in organism: AX225014 Was found in position: 15 and at 15
AATATT in organism: AX225014 Was found in position: 1432 and at 1432
AATATT in organism: AX225016 Was found in position: 1404 and at 1404
AATT in organism: AX225016 Was found in position: 169 and at 2205

Is it possible to check, across all of the output, whether every Acc matches the same pal?

In the case above, the wanted output would be:

AATATT was found in all of the Acc.

My working code:

public static ArrayList<String> PB2Scan(ArrayList<String> Pal) throws FileNotFoundException, IOException
{
    ArrayList<String> PalindromesSpotted  = new ArrayList<String>();

    File file = new File("IAV_PB2_32640.txt");
    Scanner sc = new Scanner(file);
    sc.useDelimiter(">");
    //initializes the ArrayList
    ArrayList<String> Gene1 = new ArrayList<String>();
    //initializes the writer
    FileWriter fileWriter = new FileWriter("PB2out");
    PrintWriter printwriter = new PrintWriter(fileWriter);
    //Loads the Array List
    while(sc.hasNext()) Gene1.add(sc.next());
    for(int i = 0; i < Gene1.size(); i++) 
    {
    //Acc breaks down the title so the element:
        //>AX225014 Equine influenza virus H3N8 // 1 (PB2)
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        //comes out as AX225014
    String Acc = Accession(Gene1.get(i));
    //seq takes the same element as above and returns only
    //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
    //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
    String seq = trimHeader(Gene1.get(i));
        for(int x = 0; x<Pal.size(); x++) 
        {
        if(seq.contains(Pal.get(x))){
        String match = (Pal.get(x) + " in organism: " + Acc + " Was found in position: "+ seq.indexOf(Pal.get(x)) + " and at " +seq.lastIndexOf(Pal.get(x)));
        printwriter.println(match);
        PalindromesSpotted.add(match);
        }
        }
    }
    Collections.sort(PalindromesSpotted);
return PalindromesSpotted;
}

You should probably create a Map<String, List<String>> containing the Pals as keys and the Accs that contain them as values.

Map<String, List<String>> result = new HashMap<>();
for (String gene : Gene1) {
    List<String> list = new ArrayList<>();
    result.put(gene, list);
    // Strip the header once per gene, then test each pal against the sequence.
    String seq = trimHeader(gene);
    for (String pal : Pal) {
        if (seq.contains(pal)) {
            list.add(pal);
        }
    }
}

Now you have a map that you can query for the pals each gene contains:

List<String> containedPals = result.get(gene);

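For a function like this, that is a very reasonable result. Whatever you do with it afterwards (i.e. writing to a file) is best done in another function, one that calls this function.

So this is probably what you want to do: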

List<String> genes = loadGenes(geneFile);
List<String> pals = loadPal(palFile);
Map<String, List<String>> genesToContainedPal = methodAbove(genes, pals);
switch (resultTyp) {
    // ...
}
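The check the question actually asks for (was a given Pal found in every Acc?) falls out of a map keyed by Pal. The following is only a minimal sketch under stated assumptions: the class `PalCoverage`, its method names, and the `{accession, sequence}` record layout are hypothetical and stand in for whatever Accession() and trimHeader() return in the real code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PalCoverage {
    // Assumption: each record is a two-element array {accession, sequence}.
    static Map<String, List<String>> accsByPal(List<String[]> records, List<String> pals) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String pal : pals) {
            List<String> accs = new ArrayList<>();
            for (String[] rec : records) {
                if (rec[1].contains(pal)) {
                    accs.add(rec[0]);
                }
            }
            result.put(pal, accs);
        }
        return result;
    }

    // A pal counts as "found in all" when it matched as many records as were scanned.
    static List<String> palsInAll(List<String[]> records, List<String> pals) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : accsByPal(records, pals).entrySet()) {
            if (!records.isEmpty() && e.getValue().size() == records.size()) {
                found.add(e.getKey());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sequences, chosen only so that one pal occurs in both records.
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"AX225014", "GGAATATATTCCAATATTGG"});
        records.add(new String[] {"AX225016", "CCAATATTGGAATTCC"});
        List<String> pals = new ArrayList<>();
        pals.add("AATATATT");
        pals.add("AATATT");
        pals.add("AATT");
        for (String pal : palsInAll(records, pals)) {
            System.out.println(pal + " was found in all of the Acc.");
        }
    }
}
```

With the made-up data above, only AATATT occurs in both sequences, so that is the only line printed. Comparing list sizes assumes each record contributes at most one entry per pal, which holds here because each record is tested once.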

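First off, your code is not guaranteed to write anything to the results file, because you never close your writers or at the very least flush the PrintWriter. As a matter of fact, you don't close your reader either. You really should close your readers and writers to free up resources. Food for thought.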
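One common way to make sure readers and writers are closed is try-with-resources. The sketch below is illustrative only: `ResourceDemo` and `copyRecords` are hypothetical names, and the temp files in `main` are stand-ins for the real input and output files.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ResourceDemo {
    // Reads '>'-delimited records and echoes each one to the output file.
    static List<String> copyRecords(String inPath, String outPath) throws IOException {
        List<String> records = new ArrayList<>();
        // Both resources are closed automatically, even if an exception is thrown;
        // closing the PrintWriter also flushes any buffered output.
        try (Scanner sc = new Scanner(new File(inPath));
             PrintWriter out = new PrintWriter(outPath)) {
            sc.useDelimiter(">");
            while (sc.hasNext()) {
                String rec = sc.next();
                records.add(rec);
                out.println(rec.trim());
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("genes", ".txt");
        Path out = Files.createTempFile("PB2out", ".txt");
        Files.write(in, ">A1 demo\nACGT\n>A2 demo\nTTTT\n".getBytes());
        System.out.println(copyRecords(in.toString(), out.toString()).size() + " records copied");
    }
}
```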

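You can have the PB2Scan() method return a simple result list (as it does now), or a result list of only the Accs that contain the same Pal, or both: the simple result list followed at the end by the list of Accs that contain the same Pal, which will also be logged.

Some extra code and an additional integer parameter for the PB2Scan() method will do this. For the additional parameter you might want to add something like this: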

public static ArrayList<String> PB2Scan(ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException
{ .... }

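The integer resultType parameter takes one of three integer values, from 0 to 2:

  • 0 - the simple result list, as the current code produces;
  • 1 - the Accs that match each Pal;
  • 2 - the simple result list, with the Accs that match each Pal at the end of the result list.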

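You should also really pass the file to read as a parameter of the PB2Scan() method, since the file could easily have a different name next time round. This makes the method more versatile than it would be with a hard-coded file name.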

public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException { .... }

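The method can always write to the same output file, since that file best suits the method the output came from.

Using the above concept, rather than writing to the output file (PB2Out.txt) while the PalindromesSpotted ArrayList is being built, I think it is better to write the file after the ArrayList or ArrayLists are complete. Another method (writeListToFile()) is best suited to carry out that task. To find out whether any one Pal matches other Accs, it is again a good idea to have a separate method (getPalMatches()) do that job.

Since the index positions of a Pal that occurs more than once in a given Seq were not being reported properly either, I have provided yet another method (findSubstringIndexes()) to take care of that quickly.

It should be noted that the code below assumes that the Seq obtained from the trimHeader() method is one single String with no line break characters in it.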
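The question does not show Accession() or trimHeader(), so the following is only a guess at how helpers satisfying that assumption might look. `FastaHelpers`, `accession` and `sequence` are hypothetical names; the sketch assumes the leading ">" has already been consumed by the Scanner delimiter and that the header occupies exactly the first line of each record.

```java
class FastaHelpers {
    // Hypothetical stand-in for Accession(): the first whitespace-delimited
    // token of the header line, e.g. "AX225014".
    static String accession(String record) {
        String header = record.split("\\R", 2)[0].trim();
        return header.split("\\s+")[0];
    }

    // Hypothetical stand-in for trimHeader(): drops the header line and removes
    // all whitespace, so the sequence becomes one String without line breaks.
    static String sequence(String record) {
        String[] parts = record.split("\\R", 2);
        return parts.length < 2 ? "" : parts[1].replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String record = "AX225014 Equine influenza virus H3N8 // 1 (PB2)\nATGAAG\nGGCATG\n";
        System.out.println(accession(record)); // AX225014
        System.out.println(sequence(record));  // ATGAAGGGCATG
    }
}
```

Joining the sequence lines matters because a Pal that straddles a line break in the file would otherwise never be found by contains() or indexOf().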

The reworked PB2Scan() method and the other methods mentioned above are listed below:

The PB2Scan() method:

public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException {
    // Make sure the supplied result type is either 
    // 0, 1, or 2. If not then default to 0.
    if (resultType < 0 || resultType > 2) {
        resultType = 0;
    }
    ArrayList<String> PalindromesSpotted = new ArrayList<>();

    File file = new File(filePath);
    Scanner sc = new Scanner(file);
    sc.useDelimiter(">");
    //initializes the ArrayList
    ArrayList<String> Gene1 = new ArrayList<>();
    //Loads the Array List
    while (sc.hasNext()) {
        Gene1.add(sc.next());
    }
    sc.close(); // Close the read in text file.

    for (int i = 0; i < Gene1.size(); i++) {
        //Acc breaks down the title so the element:
        //>AX225014 Equine influenza virus H3N8 // 1 (PB2)
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        //comes out as AX225014
        String Acc = Accession(Gene1.get(i));

        //seq takes the same element as above and returns only
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        String seq = trimHeader(Gene1.get(i));
        for (int x = 0; x < Pal.size(); x++) {
            if (seq.contains(Pal.get(x))) {
                String match = Pal.get(x) + " in organism: " + Acc + 
                                " Was found in position(s): " + 
                                findSubstringIndexes(seq, Pal.get(x));
                PalindromesSpotted.add(match);
            }
        }
    }

    // If there is nothing to work with get outta here.
    if (PalindromesSpotted.isEmpty()) {
        return PalindromesSpotted;
    }

    // Sort the ArrayList
    Collections.sort(PalindromesSpotted);
    // Another ArrayList for matching Pal's to Acc's
    ArrayList<String> accMatchingPal = new ArrayList<>();
    switch (resultType) {
        case 0: // if resultType is 0 is supplied
            writeListToFile("PB2Out.txt", PalindromesSpotted);
            return PalindromesSpotted;

        case 1: // if resultType is 1 is supplied
            accMatchingPal = getPalMatches(PalindromesSpotted);
            writeListToFile("PB2Out.txt", accMatchingPal);
            return accMatchingPal;

        default: // if resultType is 2 is supplied
            accMatchingPal = getPalMatches(PalindromesSpotted);
            ArrayList<String> fullList = new ArrayList<>();
            fullList.addAll(PalindromesSpotted);
            // Create a Underline made of = signs in the list.
            fullList.add(String.join("", Collections.nCopies(70, "=")));
            fullList.addAll(accMatchingPal);
            writeListToFile("PB2Out.txt", fullList);
            return fullList;
    }
}   

The findSubstringIndexes() method:

private static String findSubstringIndexes(String inputString, String stringToFind){
    String indexes = "";
    int index = inputString.indexOf(stringToFind);
    while (index >= 0){
        indexes+= (indexes.equals("")) ? String.valueOf(index) : ", " + String.valueOf(index);
        index = inputString.indexOf(stringToFind, index + stringToFind.length())   ;
    }
    return indexes;
}
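Note that findSubstringIndexes() resumes searching after the end of each match, so overlapping occurrences are skipped. If overlapping hits should be reported too, which may matter for repetitive motifs, a variant that advances by one character is one option. This is a sketch, not part of the original answer; `SubstringIndexes` and `findAllIndexes` are hypothetical names.

```java
import java.util.StringJoiner;

class SubstringIndexes {
    // Returns every start index of target in input, including overlapping
    // occurrences, as a comma-separated String ("" if there are none).
    static String findAllIndexes(String input, String target) {
        StringJoiner joined = new StringJoiner(", ");
        if (target.isEmpty()) {
            return ""; // avoid reporting every position for an empty target
        }
        int index = input.indexOf(target);
        while (index >= 0) {
            joined.add(String.valueOf(index));
            // Step forward by one character instead of target.length().
            index = input.indexOf(target, index + 1);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(findAllIndexes("ATATAT", "ATAT")); // 0, 2
    }
}
```

For the same input, the non-overlapping version above reports only index 0, because its search resumes at index 4.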

The getPalMatches() method:

private static ArrayList<String> getPalMatches(ArrayList<String> Palindromes) {
    ArrayList<String> accMatching = new ArrayList<>();
    for (int i = 0; i < Palindromes.size(); i++) {
        String matches = "";
        String[] split1 = Palindromes.get(i).split("\\s+");
        String pal1 = split1[0];
        // Make sure the current Pal hasn't already been listed.
        boolean alreadyListed = false;
        for (int there = 0; there < accMatching.size(); there++) {
            String[] th = accMatching.get(there).split("\\s+");
            if (th[0].equals(pal1)) {
                alreadyListed = true;
                break;
            }
        }
        if (alreadyListed) { continue; }
        for (int j = 0; j < Palindromes.size(); j++) {
            String[] split2 = Palindromes.get(j).split("\\s+");
            String pal2 = split2[0];
            if (pal1.equals(pal2)) {
                // Using Ternary Operator to build the matches string
                matches+= (matches.equals("")) ? pal1 + " was found in the following Accessions: "
                        + split2[3] : ", " + split2[3];
            }
        }
        if (!matches.equals("")) {
            accMatching.add(matches);
        }
    }
    return accMatching;
}
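The same grouping can also be expressed with a map, which avoids the nested scan over the result list. This is an alternative sketch rather than the answer's method; `PalGrouping` and `groupByPal` are hypothetical names, and it relies on the same assumption as getPalMatches(): the Pal is the first whitespace-separated token of each result line and the Acc is the fourth.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PalGrouping {
    static List<String> groupByPal(List<String> resultLines) {
        // LinkedHashMap keeps the pals in the order they first appear.
        Map<String, List<String>> accsByPal = new LinkedHashMap<>();
        for (String line : resultLines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 4) {
                continue; // line is not in the expected "<pal> in organism: <acc> ..." form
            }
            accsByPal.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(tokens[3]);
        }
        List<String> summary = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : accsByPal.entrySet()) {
            summary.add(e.getKey() + " was found in the following Accessions: "
                    + String.join(", ", e.getValue()));
        }
        return summary;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("AATATT in organism: AX225014 Was found in position(s): 1432");
        lines.add("AATATT in organism: AX225016 Was found in position(s): 1404");
        lines.add("AATT in organism: AX225016 Was found in position(s): 169, 2205");
        for (String s : groupByPal(lines)) {
            System.out.println(s);
        }
    }
}
```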

The writeListToFile() method:

private static void writeListToFile(String filePath, ArrayList<String> list, boolean... appendToFile) {
    // Optional flag: pass true to append to the file instead of overwriting it.
    boolean appendFile = appendToFile.length > 0 && appendToFile[0];

    // try-with-resources closes (and therefore flushes) the writer automatically.
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(filePath, appendFile))) {
        for (String line : list) {
            bw.append(line).append(System.lineSeparator());
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
