
python dataframe converts integer to float

With my code, I combine multiple files into a dataframe and convert the NaN values to zero. In the code, I combine two columns (genome and contig) into a new column (source), but somewhere my dataframe converts the column contig from an integer to a float. My input file looks like this:

AAA 1 345
AAB 2 344

The output currently looks like this:

AAA_1.0 345
AAB_2.0 344

And I want it to look like this:

AAA_1 345
AAB_2 344

Since my code is very long, I cannot post the whole code and all example files on this site, but the part of my code where this probably happens is as follows. I hope this is enough for someone to see what the problem is.

#import contig length
df5bb = pd.read_csv('count_contiglength.out', header=None, delim_whitespace=True, names = ["genome", "contig", "contig_length"])
df5bb['source'] = df5bb.genome.astype(str).str.cat(df5bb.contig.astype(str), sep='_')
df5bb = df5bb.set_index('source')
df5b = pd.merge(df5a, df5bb, how='outer')
df5b['source'] = df5b.genome.astype(str).str.cat(df5b.contig.astype(str), sep='_')

nan_cols = df5b.columns[df5b.isnull().any(axis=0)]
for col in nan_cols:
    df5b[col] = df5b[col].fillna(0).astype(int)

#import contigIDnumbers
df5cc = pd.read_csv('contigID.out', header=None, delim_whitespace=True, names = ["genome", "contig", "contigID"])
df5cc['source'] = df5cc.genome.astype(str).str.cat(df5cc.contig.astype(str), sep='_')
df5cc = df5cc.set_index('source')
df5c = pd.merge(df5b, df5cc, how='right')
df5c['source'] = df5c.genome.astype(str).str.cat(df5c.contig.astype(str), sep='_')

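I think that after the merge you get at least one NaN in the column contig. An integer column that contains NaN cannot stay an integer column, so it is converted to float.

So you need to convert it back to int again, before you build the source column:

df5b['contig'] = df5b['contig'].fillna(0).astype(int)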
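
A minimal sketch of this effect, assuming small made-up frames in place of df5a and df5bb (the real files are not shown in the question, and the column "count" is hypothetical). It shows the dtype after an outer merge and the label built before and after casting back to int:

```python
import pandas as pd

# Made-up stand-ins for df5a and df5bb; the column "count" is hypothetical.
df5a = pd.DataFrame({"genome": ["AAA", "AAB"], "count": [10, 20]})
df5bb = pd.DataFrame({"genome": ["AAA", "AAC"], "contig": [1, 3]})

# The outer merge joins on the shared column "genome".
# AAB has no contig in df5bb, so its contig value becomes NaN.
df5b = pd.merge(df5a, df5bb, how="outer")
dtype_after_merge = df5b["contig"].dtype  # float, because of the NaN

# Building the label at this point uses the float values ("1.0", "3.0").
source_before = df5b.genome.astype(str).str.cat(df5b.contig.astype(str), sep="_")

# Filling the NaN and casting back to int first gives integer labels.
df5b["contig"] = df5b["contig"].fillna(0).astype(int)
source_after = df5b.genome.astype(str).str.cat(df5b.contig.astype(str), sep="_")

print(dtype_after_merge)
print(source_before.tolist())
print(source_after.tolist())
```

Note that the placeholder 0 ends up in the label for rows that had no contig (AAB_0 here), so whether fillna(0) is the right choice depends on what a missing contig should mean in your data.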

See "NA type promotions" in the pandas documentation: an integer column is converted to float when missing values are introduced.
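
As an illustration of that promotion, here is a small sketch on a standalone Series (the data is made up). It also shows one possible alternative that is not part of the answer above: the nullable "Int64" extension dtype, which can hold missing values without switching to float.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Reindexing to a label that does not exist introduces a missing value.
promoted = s.reindex([0, 1, 2, 3])
promoted_dtype = str(promoted.dtype)  # the int64 column has become float64

# With the nullable integer dtype the column keeps integer values.
nullable = s.astype("Int64").reindex([0, 1, 2, 3])
nullable_dtype = str(nullable.dtype)

# String conversion of the first element in both cases.
first_promoted = promoted.astype(str).iloc[0]
first_nullable = nullable.astype(str).iloc[0]

print(promoted_dtype, first_promoted)
print(nullable_dtype, first_nullable)
```

With "Int64" the missing entries stay missing instead of being replaced by 0, so they still have to be handled before the values are used in a label.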
