Python: Using str.split and getting list index out of range

Question

I just started using Python and am trying to convert some of my R code into Python. The task is relatively simple: I have many csv files, each with a variable name (in this case cell lines) and values (IC50s). I need to pull out all variables, and their values, that are shared among all files. Some of these files share the same variables but format them differently. For example, in some files a variable is just "Cell_line" and in others it is "MEL:Cell_line". So, to make a direct string comparison, I first need to format them the same way, and I am trying to use str.split() to do so. There is probably a much better way to do this, but for now I am using the following code:

import csv
import os
# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv" 
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        name_str=row[0]
        splt=name_str.split(':')
        n_name=splt[1]
        alldata[n_name]=row

name_str.split(':') should return a list of length 2. Since the portion I want is after the ":", I want the second element, which should be splt[1], because splt[0] is the first element in Python. However, when I run the code I get the error message "IndexError: list index out of range". I'm asking for the second element of a list of length 2, so I have no idea why the index is out of range. Any help or suggestions would be appreciated.

Answer 1

I am pretty sure that there are some rows where name_str does not contain a ":". From your own example, if name_str is Cell_line, the split returns a one-element list and splt[1] fails.
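
A quick check in the interpreter illustrates this. It is only a sketch: the two sample names are taken from the question, and the variable names are made up for the example.

```python
# str.split returns a one-element list when the separator is absent.
with_prefix = "MEL:Cell_line".split(":")
without_prefix = "Cell_line".split(":")

print(with_prefix)     # ['MEL', 'Cell_line']
print(without_prefix)  # ['Cell_line']

try:
    without_prefix[1]  # there is no second element
except IndexError as exc:
    print("IndexError:", exc)
```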

If you are sure that there is at most one ":" in name_str, or if there can be several and you want the part after the last one, use splt[-1] instead of splt[1]. Index -1 takes the last element of the list, and because split(':') always returns at least one element, it cannot raise an IndexError here.
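
As a minimal sketch of that suggestion, the helper below wraps the split in a function. The name normalize_name is hypothetical and not part of the question's code.

```python
def normalize_name(name_str):
    """Return the text after the last ':' or the whole string if there is none."""
    # split(':') always yields at least one element, so [-1] is always valid.
    return name_str.split(":")[-1]


for raw in ["MEL:Cell_line", "Cell_line", "A:B:Cell_line"]:
    print(raw, "->", normalize_name(raw))
```

Inside the loop from the question, this would replace the two lines that build splt and n_name, for example n_name = normalize_name(row[0]).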

Answer 2

The simple answer is that the data sometimes doesn't follow the specification you assumed when you wrote this code (i.e. that there is a colon and two fields).

The easiest way to deal with this is to add an if block, if len(splt)==2:, and do the subsequent lines within that block.

Optionally, add an else: and print the lines that are not to spec, or save them somewhere so you can diagnose them.

Like this:

import csv
import os
# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv" 
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        name_str=row[0]
        splt=name_str.split(':')
        if len(splt)==2:
            n_name=splt[1]
            alldata[n_name]=row
        else:
            # report names that do not have exactly one ':'
            print("invalid name: "+name_str)

Alternatively, you can use try/except, which in this case is a bit more robust: a single exception handler catches an IndexError raised by either row[0] or splt[1], and we don't have to require that the ":" split yields exactly 2 fields.

In addition we could explicitly check that there actually is a : before splitting, and assign the name appropriately.

import csv
import os
# Change working directory
os.chdir("/Users/joshuamannheimer/downloads")
file_name="NCI60_Bleomycin.csv" 
with open(file_name) as csvfile:
    NCI_data=csv.reader(csvfile, delimiter=',')
    alldata={}
    for row in NCI_data:
        try:
            name_str=row[0]
            if ':' in name_str:
                splt=name_str.split(':')
                n_name=splt[1]
            else:
                n_name = name_str
            alldata[n_name]=row
        except IndexError:
            # an empty row has no row[0]
            print("bad row:"+str(row))
