
BLAST parsing: AttributeError: 'float' object has no attribute 'split'

I am trying to write a script to parse an NCBI BLAST report. The column causing this error holds the subject genome accession number.

E.g. LT697097.1

It ends with a version suffix after the decimal point. When I try to split that suffix off to keep just the accession, I get this error.

From the question "Django AttributeError 'float' object has no attribute 'split'", I understand that this error occurs because the value being split is a float rather than a string.
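To make the cause concrete, here is a minimal sketch in plain Python. It assumes, as is usually the case, that the offending cells are missing values: pandas stores a missing cell as NaN, which is an ordinary Python float, so calling .split on it raises exactly this error. The helper try_split is hypothetical and exists only for illustration.

```python
# In pandas, a missing cell is NaN, and NaN is an ordinary Python float.
missing = float("nan")


def try_split(value):
    """Return the part before the first '.', or the error message if split fails."""
    try:
        return value.split(".", 1)[0]
    except AttributeError as exc:
        return str(exc)


print(try_split("LT697097.1"))  # LT697097
print(try_split(missing))       # 'float' object has no attribute 'split'
```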

So I followed the advice in "Pandas reading csv as string type" and imported the column as a string (dtype=object).

I supply the column names myself because the report does not include a header row.
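If the report was written with BLAST's commented tabular format (-outfmt 7), which is only an assumption suggested by the column names below, it also contains lines starting with '#'. Such lines typically contain no tabs, so read_csv puts the whole line into the first column and fills the remaining columns with NaN. The sketch uses a hypothetical three-column excerpt to show the effect, and how pandas' comment='#' parameter skips those lines.

```python
import io

import pandas as pd

# Hypothetical excerpt of BLAST tabular output with '#' comment lines,
# trimmed to three columns for brevity.
raw = (
    "# BLASTN 2.x\n"
    "# Fields: query acc.ver, subject acc.ver, % identity\n"
    "q1\tLT697097.1\t99.5\n"
)
names = ["query acc.ver", "subject acc.ver", "% identity"]

# Without comment='#', each comment line becomes a row whose later columns are NaN.
naive = pd.read_csv(io.StringIO(raw), sep="\t", dtype=object, names=names)
print(naive["subject acc.ver"].isna().sum())  # 2

# With comment='#', lines starting with '#' are ignored, leaving only real hits.
clean = pd.read_csv(io.StringIO(raw), sep="\t", dtype=object, names=names, comment="#")
print(clean["subject acc.ver"].tolist())  # ['LT697097.1']
```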

import pandas as pd    
df = pd.read_csv("out.txt", sep="\t", dtype=object, names = ['query id','subject ids','query acc.ver','subject acc.ver','% identity','alignment length', 'mismatches','gap opens','q.start','q.end','s.start','s.end','evalue','bit score'])

sacc = df['subject acc.ver']
sacc = [i.split('.',1)[0] for i in sacc]

I still get the error AttributeError: 'float' object has no attribute 'split'.
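dtype=object only stops pandas from converting the values that are present. Empty fields are still read as NaN, which remains a float, so the list comprehension fails on them regardless. A quick check is to list the rows whose accession is missing. This is a minimal sketch using a small hypothetical DataFrame in place of the parsed report:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed report; one row lacks a subject accession.
df = pd.DataFrame({
    "query acc.ver": ["q1", "q2"],
    "subject acc.ver": ["LT697097.1", np.nan],
})

# These are exactly the rows where i.split('.', 1) raises AttributeError,
# because the missing value is a float NaN rather than a string.
bad = df[df["subject acc.ver"].isna()]
print(bad)
```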

I then tried astype(str), as suggested in "Convert Columns to String in Pandas".

This does not work either: the output contains only the column names instead of the column's values.
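One possible explanation for getting only column names back (an assumption, since the exact code is not shown) is iterating over the DataFrame itself: iterating over a DataFrame yields its column labels, not its values. The sketch below, on a hypothetical one-row DataFrame, contrasts that with iterating over the selected column. Note also that astype(str) does not really fix missing values: in pandas 1.x and 2.x it turns NaN into the literal string 'nan', which split then processes without complaint.

```python
import pandas as pd

df = pd.DataFrame({"query acc.ver": ["q1"], "subject acc.ver": ["LT697097.1"]})

# Iterating over a DataFrame yields its column labels, not its values...
print([c for c in df.astype(str)])  # ['query acc.ver', 'subject acc.ver']

# ...so select the column (a Series) first, then iterate over its values.
print([v for v in df["subject acc.ver"].astype(str)])  # ['LT697097.1']
```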

Can you please advise me where I'm going wrong in my approach?

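I think you need str.split followed by str[0] to select the first element of each resulting list; unlike a plain Python split, this handles NaN values gracefully. Another possible problem is values without a '.'; these are kept unchanged: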

df['subject acc.ver'] = df['subject acc.ver'].str.split('.', n=1).str[0]

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'subject acc.ver':['LT697097.1',np.nan,None, 'LT6']})

df['subject acc.ver'] = df['subject acc.ver'].str.split('.', n=1).str[0]
print(df)
  subject acc.ver
0        LT697097
1             NaN
2            None
3             LT6
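If you prefer to keep the list comprehension from the question, a minimal alternative sketch is to split only values that are actually strings and pass missing values through unchanged. The Series below is hypothetical sample data.

```python
import numpy as np
import pandas as pd

sacc = pd.Series(["LT697097.1", np.nan, "LT6"], dtype=object)

# Only split real strings; missing values (NaN/None) are passed through as-is.
trimmed = [i.split(".", 1)[0] if isinstance(i, str) else i for i in sacc]
print(trimmed)  # ['LT697097', nan, 'LT6']
```

The isinstance check is deliberately narrow: anything that is not a str, including NaN and None, is left alone rather than converted to the text 'nan'.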
