
python drop non-integer rows, convert to int

Is there a simple way to drop rows containing a non-integer cell value, convert the remaining strings to integers, and then sort ascending? I have a dataset (a single column of what should be nothing but record numbers) that contains strings I want to remove. The code below seems to work, but the sort still orders the values as strings rather than as numbers, even after converting them to float. For example, the record numbers are sorted like this:

0
1
2
200000000
201
3

Code:

import pandas

with open('GridExport.csv') as incsv:
    df1 = pandas.read_csv(incsv, usecols=['Record Number'])
    cln = pandas.DataFrame()
    # keep only the values made entirely of digits
    cln['Record Number'] = [x for x in df1['Record Number'] if x.isdigit()]
    cln.astype(float)
    # DataFrame.sort() was removed in pandas 0.20; sort_values() replaces it
    print(cln.sort_values('Record Number'))

Is there a way to do this without converting to float first? I'd also like to drop any numbers that don't fit into int64.

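You can sort the strings by their numeric value by converting each one to float inside a comparison function. Python 3's sorted() no longer accepts a cmp= argument, so wrap the comparator with functools.cmp_to_key:

from functools import cmp_to_key

def numeric_compare(x, y):
    # negative if x < y, zero if equal, positive if x > y
    return float(x) - float(y)

>>> sorted(['10.0', '2000.0', '30.0'], key=cmp_to_key(numeric_compare))
['10.0', '30.0', '2000.0']

For a plain numeric ordering, key=float does the same job without a comparator.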
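Within pandas the same idea is available without converting the column: DataFrame.sort_values accepts a key function (pandas 1.1 or later) that is applied to the column before sorting. A minimal sketch, assuming the column still holds digit strings; the sample frame below is hypothetical:

```python
import pandas as pd

# Hypothetical frame holding the record numbers as strings, as in the question
cln = pd.DataFrame({"Record Number": ["0", "1", "2", "200000000", "201", "3"]})

# key receives the whole column as a Series and must return a Series of the same length;
# pd.to_numeric parses the strings, so rows are ordered by value, not lexicographically
ordered = cln.sort_values("Record Number", key=pd.to_numeric)

print(ordered["Record Number"].tolist())  # ['0', '1', '2', '3', '201', '200000000']
```

Because only the sort order is computed from the parsed values, the column keeps its original string values; convert it explicitly if you need integers afterwards.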

The problem in your code is that astype returns a new, converted object instead of modifying the data frame in place, so the line

cln.astype(float)

has no effect: its result is discarded. The column is therefore still of type string (dtype object) and is sorted lexicographically, which is why 200000000 comes before 201. Printing cln['Record Number'].dtype after that statement makes this clear. To change the column, assign the result back:

cln['Record Number'] = cln['Record Number'].astype(float)
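For the int64 part of the question, here is a minimal sketch that never goes through float: it reads the column as strings, keeps only rows made entirely of ASCII digits, drops values larger than the int64 maximum, converts straight to int64 and sorts ascending. The helper name clean_record_numbers and the sample CSV text are hypothetical; the sketch assumes pandas 1.1 or later (for Series.str.fullmatch) and, like the isdigit() filter in the question, treats negative numbers as non-integer rows.

```python
import io

import pandas as pd

INT64_MAX = 2**63 - 1  # largest value representable as int64


def clean_record_numbers(df, column="Record Number"):
    """Hypothetical helper: keep digit-only rows that fit in int64, sorted as int64."""
    # Work on strings so no value is silently parsed as float
    s = df[column].astype(str).str.strip()
    # Keep only rows made entirely of ASCII digits (non-negative integers)
    s = s[s.str.fullmatch(r"[0-9]+")]
    # Drop values that would overflow int64; Python ints compare exactly, no float involved
    s = s[s.map(lambda v: int(v) <= INT64_MAX)]
    # Convert directly from string to int64, then sort numerically
    return s.astype("int64").sort_values().reset_index(drop=True)


# Hypothetical sample standing in for GridExport.csv
csv_text = "Record Number\n3\n201\nabc\n200000000\n0\n1\n2\n99999999999999999999\n"
# dtype=str stops read_csv from inferring a numeric (possibly float) column
df1 = pd.read_csv(io.StringIO(csv_text), usecols=["Record Number"], dtype=str)

records = clean_record_numbers(df1)
print(records.tolist())  # [0, 1, 2, 3, 201, 200000000]
```

Comparing Python ints against INT64_MAX before the cast keeps the range check exact; routing the values through float would lose precision for record numbers above 2**53.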

The technical posts on this site follow the CC BY-SA 4.0 license. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.
