Python: Extract one column of data from multiple text files

Question

I've been scratching my head over this one. I have several text files, all with the same format:

   99.00%   2874    2874    U   0   unclassified
  1.00% 29  0   R   1   root
  1.00% 29  0   R1  131567    cellular organisms
  1.00% 29  0   D   2759        Eukaryota
  1.00% 29  0   D1  33154         Opisthokonta
  1.00% 29  0   K   4751            Fungi
  1.00% 29  0   K1  451864            Dikarya

I want to extract the sixth column from all of these files and print it to a new file.

Here is my code so far:

import sys
import os
import glob

# Usage: python extract_species.py path/to/folder > output.txt

def extractSpecies(fileContent, allSpecies):
    for line in fileContent.split('\n'):
        allSpecies.append(line.split('\t')[0])

def file_get_contents(filename):
    with open(filename) as f:
        return f.read()

def listdir_fullpath(d):
    return [os.path.join(d, f) for f in os.listdir(d)]

allFiles = listdir_fullpath(sys.argv[1]) # List all files in the folder provided by system arg.

# Read all files and store content in memory
filesContent = [] # a list is created with one item per file.
for filePath in allFiles:
    filesContent.append(file_get_contents(filePath))

# Extract all species and create a unique list
allSpecies = []
for fileContent in filesContent:
    extractSpecies(fileContent, allSpecies)

print(allSpecies)

But this code only gives me the values from the first column of the data:

99.00%   1.00%   1.00%   1.00%   1.00%   1.00%   1.00%

If I remove the [0] index in extractSpecies (the one after allSpecies.append(line.split('\t'))), the allSpecies object contains all of the data from the files:

[' 99.00%', '2874', '2874', 'U', '0', 'unclassified'] ['  1.00%', '29', '0', 'R', '1', 'root'] ['  1.00%', '29', '0', 'R1', '131567', '  cellular organisms'] ['  1.00%', '29', '0', 'D', '2759', '    Eukaryota'] ['  1.00%', '29', '0', 'D1', '33154', '      Opisthokonta'] etc

I thought I could simply change [0] to the index of the column I'm interested in (1 to 5), but when I do, I get this error message:

IndexError: list index out of range

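This really puzzles me. There is clearly something I don't understand: how can I extract the values of the first column but not those of any other column? Any suggestions are welcome.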
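A likely explanation, offered as an assumption because the input files themselves are not shown: if a file ends with a newline (or contains a blank line), content.split('\n') yields an empty string, and ''.split('\t') is [''], a one-element list. Index [0] exists on that list, but [5] does not. The sketch below reproduces this with a hypothetical in-memory string standing in for one file:

```python
# Hypothetical stand-in for one file's contents; note the trailing newline.
content = "  1.00%\t29\t0\tR\t1\troot\n"

lines = content.split('\n')
print(lines)  # ['  1.00%\t29\t0\tR\t1\troot', '']

rows = [line.split('\t') for line in lines]
print(rows[-1])  # [''] : the empty last line still has a field at index 0

# rows[-1][5] would raise IndexError, so keep only rows that have a sixth field.
column6 = [row[5] for row in rows if len(row) > 5]
print(column6)  # ['root']
```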

Answer 1

I think you're on the right track by removing the [0]. You can then iterate over allSpecies and take the column you want by index.

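column6 = []
for row in allSpecies:
    # Each row is one line already split on '\t': index the row itself, not
    # allSpecies, and skip short rows such as [''] from an empty line.
    if len(row) > 5:
        column6.append(row[5])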
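For the goal stated in the question (write the sixth column of every file to one output), here is a minimal sketch of a complete script. It assumes the files are tab-separated like the sample, that every regular file in the folder should be read, and that the leading indentation in the name column should be stripped. The function names extract_column and main are illustrative, not from the original post.

```python
import os
import sys


def extract_column(path, index=5):
    """Return the stripped value at `index` from each tab-separated line of `path`.

    Blank lines and lines with too few fields are skipped instead of
    raising IndexError.
    """
    values = []
    with open(path) as f:
        for line in f:  # iterating a file does not yield an empty final line
            if not line.strip():
                continue
            fields = line.rstrip('\n').split('\t')
            if len(fields) > index:
                values.append(fields[index].strip())
    return values


def main(folder):
    # Visit every regular file in the folder in sorted order and print the column.
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            for value in extract_column(path):
                print(value)


if __name__ == '__main__':
    if len(sys.argv) > 1:
        # Same usage as the question: python extract_species.py path/to/folder > output.txt
        main(sys.argv[1])
    else:
        print('Usage: python extract_species.py path/to/folder > output.txt', file=sys.stderr)
```

Reading line by line instead of calling f.read().split('\n') also avoids holding every file in memory at once; redirecting stdout to output.txt keeps the same usage as the original script.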

Answer 1
0 2018-08-03 16:31:35