I am trying to write a script to parse an NCBI BLAST report. The column causing this error is the genome GI number,
e.g. LT697097.1
There is a version suffix (a decimal point and a number) at the end. When I try to split it off to keep just the GI number, I get AttributeError: 'float' object has no attribute 'split'.
The answers to "Django AttributeError 'float' object has no attribute 'split'" explain that this error means split is being called on a float value rather than on a string.
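A float can turn up in a text column because pandas stores a missing cell as NaN, and NaN is a Python float. A minimal reproduction of the message, assuming the offending value is such a missing entry:

```python
import numpy as np

missing = np.nan  # pandas uses NaN for an empty cell; its type is float
print(type(missing))  # <class 'float'>

try:
    missing.split(".", 1)  # floats have no split() method
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)  # 'float' object has no attribute 'split'
```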
So I followed the advice in "Pandas reading csv as string type" and imported the column as strings (dtype=object).
I supply the column names myself, because the report doesn't include a header row.
import pandas as pd
df = pd.read_csv("out.txt", sep="\t", dtype=object, names = ['query id','subject ids','query acc.ver','subject acc.ver','% identity','alignment length', 'mismatches','gap opens','q.start','q.end','s.start','s.end','evalue','bit score'])
sacc = df['subject acc.ver']
sacc = [i.split('.',1)[0] for i in sacc]
I still get the error AttributeError: 'float' object has no attribute 'split'.
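One likely explanation, assuming some rows of the report have an empty subject field: dtype=object keeps the values that are present as strings, but empty fields are still read as NaN. The report text below is hypothetical and uses only four of the real columns; the last line is a guarded version of the list comprehension above that only calls split on actual strings.

```python
import io

import pandas as pd

# Hypothetical report: the second row has an empty "subject acc.ver" field.
report = "q1\tLT697097.1\t98.5\t1e-50\nq2\t\t97.0\t1e-40\n"

df = pd.read_csv(
    io.StringIO(report),
    sep="\t",
    dtype=object,  # present values stay strings...
    names=["query id", "subject acc.ver", "% identity", "evalue"],
)

sacc = df["subject acc.ver"]
print([isinstance(i, float) for i in sacc])  # [False, True]  (...but the empty cell is NaN)

# Only split real strings; missing values are passed through unchanged.
sacc = [i.split(".", 1)[0] if isinstance(i, str) else i for i in sacc]
print(sacc[0])  # LT697097
```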
I then tried astype(str), as suggested in "Convert Columns to String in Pandas".
This does not read the column's values either: the only output value is the column name.
Can you please advise me where I'm going wrong in my approach?
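The exact astype(str) call isn't shown, so this is only a guess: getting just the column name back is what happens when you iterate over a whole DataFrame instead of a single column. A minimal illustration with a hypothetical one-column frame:

```python
import pandas as pd

df = pd.DataFrame({"subject acc.ver": ["LT697097.1", "LT6"]})

# Iterating over a DataFrame yields its column labels, not its values.
print([col for col in df.astype(str)])  # ['subject acc.ver']

# Iterating over a single column (a Series) yields the stored values.
print([val for val in df["subject acc.ver"]])  # ['LT697097.1', 'LT6']
```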
I think you need str.split followed by .str[0] to select the first element of each list; this works nicely with NaN values.
Another possible issue is values without a '.'; these are returned unchanged:
df['subject acc.ver'] = df['subject acc.ver'].str.split('.', n=1).str[0]
Sample:
import numpy as np
import pandas as pd
df = pd.DataFrame({'subject acc.ver':['LT697097.1',np.nan,None, 'LT6']})
df['subject acc.ver'] = df['subject acc.ver'].str.split('.', n=1).str[0]
print(df)
subject acc.ver
0 LT697097
1 NaN
2 None
3 LT6
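Applied to the original read_csv call, a sketch might look like the following. It assumes, without confirmation from the question, that the NaN rows come from '#' comment lines such as those BLAST writes with -outfmt 7; read_csv's comment parameter skips such lines entirely. The report text is hypothetical and trimmed to four columns.

```python
import io

import pandas as pd

# Hypothetical excerpt: one '#' comment line followed by one hit.
report = "# BLASTN 2.x\nq1\tLT697097.1\t98.5\t1e-50\n"

df = pd.read_csv(
    io.StringIO(report),
    sep="\t",
    dtype=object,
    comment="#",  # ignore lines starting with '#' instead of making NaN rows
    names=["query id", "subject acc.ver", "% identity", "evalue"],
)

# Keep the part before the first '.'; any remaining NaN entries stay NaN.
df["subject acc.ver"] = df["subject acc.ver"].str.split(".", n=1).str[0]
print(df["subject acc.ver"].tolist())  # ['LT697097']
```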