Splitting a string into a list of integers using Python

Question

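This method takes a file name and the directory containing the file. The file holds a matrix of data, and the method needs to copy the first 20 columns of each row that come after the row number and the row's letter. The first 3 lines of each file are skipped because they contain unimportant information, and the data at the bottom of the file is not needed either.

For example, a file looks like: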

unimportant information--------
 unimportant information--------
 -blank line
1 F -1 2 -3 4 5 6 7 (more columns of ints)
2 L 3 -1 3 4 0 -2 1 (more columns of ints)
3 A 3 -1 3 6 0 -2 5 (more columns of ints)
-blank line
unimportant information--------
unimportant information--------

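The output of the method needs to print out a "matrix" in a given form.

So far, the output gives a list with each row as a string, but I'm trying to figure out the best way to approach the problem. I don't know how to ignore the unimportant information at the end of the file. I don't know how to retrieve only the first 20 columns after the letter in each row, and I don't know how to ignore the row number and the row letter.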

import os

def pssmMatrix(self, ipFileName, directory):
    my_lst = []

    # look through every file in the given directory
    for f in os.listdir(directory):
        # split the file name into its base name and extension
        file, file_ext = os.path.splitext(f)

        if file == ipFileName:
            with open(os.path.join(directory, f)) as file_object:
                # skip the 3 header lines
                for _ in range(3):
                    next(file_object)
                for line in file_object:
                    my_lst.append(' '.join(line.strip().split()))
    return my_lst

Expected result:

['-1 2 -3 4 5 6 7'], ['3 -1 3 4 0 -2 1'], ['3 -1 3 6 0 -2 5']

Actual result:

['1 F -1 2 -3 4 5 6 7'], ['2 L 3 -1 3 4 0 -2 1'], ['3 A 3 -1 3 6 0 -2 5'],  [' '], [' unimportant info'], ['unimportant info']

Answer 1

Try this solution:

    import re

    # Matches the run of integers that follows "<row number> <letter> "
    reg = re.compile(r'(?<=[0-9]\s[A-Z]\s)[0-9\-\s]+')

    text = """
    unimportant information--------

    unimportant information--------
    -blank line

    1 F -1 2 -3 4 5 6 7 (more columns of ints)

    2 L 3 -1 3 4 0 -2 1 (more columns of ints)

    3 A 3 -1 3 6 0 -2 5 (more columns of ints)"""

    ignore_start = 5  # skip the leading lines at indices 0-4
    expected_array = []
    for index, line in enumerate(text.splitlines()):
        if index >= ignore_start:
            match = reg.search(line)
            if match:
                result = match.group(0).strip()
                # Use result: normalize the whitespace between the numbers
                expected_array.append(' '.join(result.split()))

    print(expected_array)
    # Result:
    # ['-1 2 -3 4 5 6 7', '3 -1 3 4 0 -2 1', '3 -1 3 6 0 -2 5']
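
Since the title asks for integers rather than strings, a row string of the form `'-1 2 -3 4 5 6 7'` (the expected form in the question) can be converted in one more step. This is a minimal sketch, not part of the original answer; the helper name `to_int_row` and the `max_cols` parameter are made up for illustration, with 20 taken from the question's "first 20 columns".

```python
def to_int_row(row_text, max_cols=20):
    """Convert a whitespace-separated string of integers to a list of ints.

    Only the first max_cols values are kept (hypothetical parameter).
    """
    return [int(tok) for tok in row_text.split()[:max_cols]]


rows = ['-1 2 -3 4 5 6 7', '3 -1 3 4 0 -2 1']
matrix = [to_int_row(r) for r in rows]
print(matrix)
```

`int()` accepts a leading minus sign, so negative entries need no special handling as long as the sign is not separated from its digits.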

Answer 2

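To drop the first two columns, you can change:

my_lst.append(' '.join(line.strip().split()))

to

my_lst.append(' '.join(line.strip().split()[2:]))

This discards the first two columns after the line has been split and before the pieces are joined back together.

To drop the last 3 irrelevant lines, perhaps the simplest solution is to change:

return my_lst

to

return my_lst[:-3]

This returns everything except the last 3 lines.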
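
If the number of trailing lines is not always 3, one alternative is to stop at the first blank line after the data rows. The sketch below assumes the layout from the question's sample (3 header lines, then data rows, then a blank line); the function name `parse_rows` and its parameters are hypothetical, and the slice `[2:22]` is one way to keep only the first 20 columns after the row number and letter.

```python
from itertools import islice


def parse_rows(lines, header_lines=3, skip_cols=2, max_cols=20):
    """Return the data rows as strings, without row number and letter.

    Assumes the data block ends at the first blank line after the header.
    """
    rows = []
    # islice works for lists and for open file objects alike
    for line in islice(lines, header_lines, None):
        tokens = line.split()
        if not tokens:
            break  # blank line: end of the data block
        rows.append(' '.join(tokens[skip_cols:skip_cols + max_cols]))
    return rows


sample = [
    "unimportant information--------",
    "unimportant information--------",
    "",
    "1 F -1 2 -3 4 5 6 7",
    "2 L 3 -1 3 4 0 -2 1",
    "3 A 3 -1 3 6 0 -2 5",
    "",
    "unimportant information--------",
    "unimportant information--------",
]
print(parse_rows(sample))
```

With the function from the question, `file_object` could be passed as `lines` in place of the two loops inside the `with` block.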

Answer 3

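OK, so it looks like you have a file from which you want only certain lines, and the lines you want always start with a number followed by a letter. What we can do is apply a regular expression that matches only the lines with that pattern and captures only the numbers after it.

The expression looks like (?<=[0-9]\s[A-Z]\s)[0-9\-\s]+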

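import re

# Matches the run of integers that follows "<row number> <letter> "
reg = re.compile(r'(?<=[0-9]\s[A-Z]\s)[0-9\-\s]+')

# file_object is the open file from the question
for line in file_object:
    match = reg.search(line)
    if match:
        result = match.group(0)
        # Use result: normalize the whitespace between the numbers
        my_lst.append(' '.join(result.split()))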

Hope that helps.
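
A variation on the same idea, offered only as a sketch and not as the answerer's code: anchor the pattern at the start of the line and collect the integers with `re.findall`, which yields a list of ints directly. The names `ROW`, `INT`, and `parse_line` are made up for illustration, and the single uppercase row letter is an assumption taken from the sample file.

```python
import re

# A data row: row number, one uppercase letter, then the remaining columns
ROW = re.compile(r'^\s*\d+\s+[A-Z]\s+(.*)$')
# An optionally negative integer
INT = re.compile(r'-?\d+')


def parse_line(line, max_cols=20):
    """Return the first max_cols integers of a data row, or None otherwise."""
    m = ROW.match(line)
    if m is None:
        return None  # header, footer, or blank line
    return [int(x) for x in INT.findall(m.group(1))[:max_cols]]


print(parse_line("1 F -1 2 -3 4 5 6 7"))
print(parse_line("unimportant information--------"))
```

Because non-matching lines return `None`, the header and footer lines do not have to be counted or skipped separately.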

Solution 1
0 Accepted 2019-03-23 22:56:15

Solution 2
0 2019-03-23 22:57:02

Solution 3
0 2019-03-23 22:59:29
