Python IndexError: list index out of range

I am parsing a tab-separated text file, and my script fails with "list index out of range". It works for some files but not for others. I need help debugging this script.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import os
import re
from collections import OrderedDict
from numpy import unique

def main():
    if len(sys.argv) < 2:
        print("usage: python3 {} <bacmat_out_table> > output".format(sys.argv[0]))
        sys.exit(1)

    bacmat_out = os.path.abspath(sys.argv[1])
    class_sum = OrderedDict()
    with open(bacmat_out) as fh:
        for line in fh:
            if re.search(r"^\s*$|^Query", line):
                continue
            elif len(line) == 0:
                break
            else:
                fields = line.strip().split("\t")
                compounds = fields[6]
                if re.search(r'\[.*\]', compounds):
                    compounds_class = re.findall(r'\[class:\s?(.+?)\]', compounds)
                    compounds_class = list(unique(compounds_class))
                    if len(compounds_class) > 0:
                        for i in compounds_class:
                            class_sum.setdefault(i, 0)
                            class_sum[i] += 1
                else:
                    compounds = compounds.strip('"')
                    compounds = compounds.strip("'")
                    compounds = compounds.strip()
                    class_sum.setdefault(compounds, 0)
                    class_sum[compounds] += 1
    print("Class\tCount")
    for key in sorted(class_sum.keys()):
        print(key, class_sum[key], sep="\t")

if __name__ == '__main__':
    main()

File for which it works:

Query   Subject Gene    Description Organism    Location    Compounds   Percent identity    Match length    E-value Score per length    
BAC0001|abeM|tr|Q5FAM9|Q5FAM9_ACIBA gi|445995506|ref|WP_000073361.1|    abeM    "H-coupled multidrug efflux pump. Confers resistance to Antibiotics such as quinolones and  aminoglycosides and antibacterial biocides such as dyes, QACs. "    Acinetobacter baumannii Chromosome  "4,6-diamidino-2-phenylindole (DAPI) [class: Diamidine], Triclosan [class: Phenolic compounds], Acriflavine [class: Acridine], Hoechst 33342 [class: Bisbenzimide], Rhodamine 6G [class: Xanthene], Ethidium Bromide [class: Phenanthridine], Tetraphenylphosphonium (TPP) [class: Quaternary Ammonium Compounds (QACs)]"   100.0   448 1.3e-243    1.87857142857143    
BAC0002|abeS|tr|Q2FD83|Q2FD83_ACIBA gi|446043276|ref|WP_000121131.1|    abeS    "Disinfectant resistance protein abeS. It can confer resistance to antibiotics such as erythromycin, novomycin, amikacin, ciprofloxacin, norfloxacin, tetracycline, trimethoporin and dyes, QACs etc. " Acinetobacter calcoaceticus/baumannii complex   Chromosome  "Benzylkonium Chloride (BAC) [class: Quaternary Ammonium Compounds (QACs)], Ethidium Bromide [class: Phenanthridine], Acriflavine [class: Acridine], Chlorhexidine [class: Biguanides], Pyronin Y [class: Xanthene], Rhodamine 6G [class: Xanthene], Methyl Viologen [class: Paraquat], Tetraphenylphosphonium (TPP) [class: Quaternary Ammonium Compounds (QACs)], 4,6-diamidino-2-phenylindole (DAPI) [class: Diamindine], Acridine Orange [class: Acridine], Sodium Dodecyl Sulfate (SDS) [class: Organo-sulfate], Sodium Deoxycholate (SDC) [class: Acid], Crystal Violet [class: Triarylmethane], Cetrimide (CTM) [class: Quaternary Ammonium Compounds (QACs)], Cetylpyridinium Chloride (CPC) [class: Quaternary Ammonium Compounds (QACs)], Dequalinium [class: Quaternary Ammonium Compounds (QACs)]"  100.0   109 9.5e-52 1.85504587155963    
BAC0003|acn|tr|O53166|O53166_MYCTU  gi|489995855|ref|WP_003898889.1|    acn "Aconitate hydratase, Acn"  Mycobacterium   Chromosome  Iron (Fe)   100.0   943 0.0e+00 2.03467656415695    
BAC0004|acr3|tr|B5LX01|B5LX01_CAMJU gi|488947840|ref|WP_002858915.1|    acr3    "Arsenical-resistance membrane transporter; part of the an arsenic (ars) four-gene operon, containing genes encoding a putative membrane permease (ArsP), a transcriptional repressor (ArsR), an arsenate reductase (ArsC) and an arsenical-resistance membrane transporter (Acr3)" Campylobacter   Chromosome  Arsenic (As)    100.0   347 4.2e-178    1.7971181556196 
BAC0005|acrA|sp|P0AE06|ACRA_ECOLI   gi|481023858|ref|WP_001295324.1|    acrA    "AcrAB is a drug efflux protein with a broad substrate specificity. It can confer resistant to ampicillin, chloramphenicol as well.  It requires TolC outer memberane protein to function and form the AcrAB-TolC efflux operon. AcrAB-TolC is a drug efflux protein complex with broad substrate specificity that uses the proton motive force to export substrates."  Proteobacteria  Chromosome  "Acriflavine [class: Acridine], Phenol [class: Phenolic compounds], Triclosan [class: Phenolic compounds], p-xylene [class: Aromatic hydrocarbons], Cyclohexane [class: Cycloalkane], Pentane [class: Alkane]"  100.0   397 4.5e-216    1.88916876574307    
BAC0006|acrB|sp|P31224|ACRB_ECOLI   gi|447055213|ref|WP_001132469.1|    acrB    "AcrAB is a drug efflux protein with a broad substrate specificity. It can confer resistant to ampicillin, chloramphenicol as well.It requires TolC outer memberane protein to function and form the AcrAB-TolC efflux operon. AcrAB-TolC is a drug efflux protein complex with broad substrate specificity that uses the proton motive force to export substrates."    Enterobacteriaceae  Chromosome  "Acriflavine [class: Acridine], Phenol [class: Phenolic compounds], Triclosan [class: Phenolic compounds], p-xylene [class: Aromatic hydrocarbons], Cyclohexane [class: Cycloalkane], Pentane [class: Alkane]"  100.0   1049    0.0e+00 1.89733079122974    
BAC0007|acrC|tr|Q1LMP2|Q1LMP2_RALME gi|499835702|ref|WP_011516436.1|    acrC    Cation/multidrug efflux system outer membrane porin arcC.   Cupriavidus metallidurans   Chromosome  Acriflavine [class: Acridine]   100.0   486 2.8e-268    1.90061728395062    
BAC0563|acrD|tr|Q8ZN77|Q8ZN77_SALTY gi|447185822|ref|WP_001263078.1|    acrD    Acriflavine resistance protein D; participates in the efflux of aminoglycosides. It confers resistance to a variety of these substances. It contributes to copper and zinc resistance in Salmonella.    Salmonella enterica Chromosome  "Copper (Cu), Zinc (Zn)"    100.0   1037    0.0e+00 1.90781099324976

File for which it does not work:

Query   Subject Gene    Description Organism    Location    Compounds   Percent identity    Match length    E-value Score per length    
ERZ1645190.265-NODE-265-length-2544-cov-3.002812_2  gi|1083034424|gb|OGD35356.1|    copB        Copper (Cu) Candidatus Atribacteria bacterium RBG_16_35_8    copper-translocating P-type ATPase, partial 
    80.7    135 2.40e-65    1.56296296296296    
ERZ1645190.6825-NODE-6825-length-778-cov-1.752420_2 gi|1133586191|gb|APW63482.1|    actP        Copper (Cu), Sodium acetate [class: Acetate]    Paludisphaera borealis   Copper-transporting P-type ATPase 
    81.4    161 8.72e-78    1.5527950310559 
ERZ1645190.14825-NODE-14825-length-656-cov-1.279534_1   gi|1084819878|gb|OGQ54449.1|    arrA        Arsenic (As)    Deltaproteobacteria bacterium RIFCSPLOWO2_02_56_12   dehydrogenase 
    90.5    63  1.54e-32    1.98412698412698    
ERZ1645190.15611-NODE-15611-length-649-cov-1.912458_1   gi|1082733223|gb|OGA52347.1|    arrA        Arsenic (As)    Betaproteobacteria bacterium RIFCSPLOWO2_12_FULL_62_13   dehydrogenase 
    85.6    216 2.42e-131   1.81018518518519

Running the script results in the error below:

python bacmet_class_summary.py test_bacmet.table > 1.txt

Traceback (most recent call last):
  File "bacmet_class_summary.py", line 52, in <module>
    main()
  File "bacmet_class_summary.py", line 33, in main
    compounds = fields[6]
IndexError: list index out of range

This is the error I get when I run the script on the second example.

At least one line in your file has fewer than 7 fields when split on '\t', so fields[6] does not exist for that line. Add print(line) before compounds = fields[6] to see which one.
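To find every offending line at once rather than only the first, here is a minimal sketch that counts the tab-separated fields per line and reports the short ones. The helper name find_short_rows and the sample rows are hypothetical, made up for illustration. The sample assumes a record wrapped onto a second line, which is how the failing file appears as posted, though that may not be how the real file looks on disk.

```python
def find_short_rows(lines, min_fields=7):
    """Return (line_number, field_count, text) for rows with too few fields."""
    short = []
    for lineno, line in enumerate(lines, start=1):
        # Skip blank lines and the header row, as the original script does.
        if not line.strip() or line.startswith("Query"):
            continue
        # Same split as the original script: strip, then split on tabs.
        fields = line.strip().split("\t")
        if len(fields) < min_fields:
            short.append((lineno, len(fields), line.rstrip("\n")))
    return short


# Hypothetical sample: the second record is wrapped onto two lines.
sample = [
    "Query\tSubject\tGene\tDescription\tOrganism\tLocation\tCompounds\n",
    "q1\ts1\tabeM\tdesc\torg\tChromosome\tIron (Fe)\n",
    "q2\ts2\tcopB\tdesc\torg\n",
    "\t80.7\t135\t2.40e-65\n",
]

for lineno, count, text in find_short_rows(sample):
    print("line {}: {} fields: {!r}".format(lineno, count, text))
```

For a real file, pass the open file handle instead of the sample list, for example find_short_rows(open(sys.argv[1])). Once you know which lines are short, either repair the input or have the script skip any line where len(fields) < 7 before it reads fields[6].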
