Is there a simple way to drop rows containing a non-integer cell value, convert the remaining strings to integers, and then sort ascending? I have a dataset (a single column of what are supposed to be just record numbers) that contains some strings I want to remove. The code below seems to work, but the sort then behaves as if the floats were still strings. For example, the record numbers are sorted like so:
0
1
2
200000000
201
3
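That ordering is what plain string comparison produces: strings are compared character by character, so '200000000' sorts before '201', and both sort before '3'. A minimal illustration with the values from the list above, comparing a plain string sort with a sort that uses int as the key:

```python
values = ['3', '201', '0', '200000000', '2', '1']

# A plain sort compares the strings character by character (lexicographic order)
as_strings = sorted(values)

# Using int() as the key compares the values numerically; the strings themselves are returned
as_numbers = sorted(values, key=int)

print(as_strings)  # ['0', '1', '2', '200000000', '201', '3']
print(as_numbers)  # ['0', '1', '2', '3', '201', '200000000']
```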
Code:
import pandas
with open('GridExport.csv') as incsv:
    df1 = pandas.read_csv(incsv, usecols=['Record Number'])
cln = pandas.DataFrame()
cln['Record Number'] = [x for x in df1['Record Number'] if x.isdigit()]
cln.astype(float)
print(cln.sort(['Record Number']))
Is there a way to do this without converting to float first? I'd like to drop the numbers that don't fit into int64.
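You could convert the string elements to floats for comparison and sort them with a comparison function such as the following:
from functools import cmp_to_key
def numeric_compare(x, y):
    # Compare numerically; a comparison function must return an int (-1, 0 or 1), not a float
    a, b = float(x), float(y)
    return (a > b) - (a < b)
>>> sorted(['10.0', '2000.0', '30.0'], key=cmp_to_key(numeric_compare))
['10.0', '30.0', '2000.0']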
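In Python 3, sorted() no longer accepts a cmp argument, so a comparison function has to be wrapped with functools.cmp_to_key. For a plain numeric ordering, passing float itself as the key is shorter and gives the same order. A short sketch with the same sample values:

```python
values = ['10.0', '2000.0', '30.0']

# key=float converts each string only for comparison; the original strings are returned
result = sorted(values, key=float)
print(result)  # ['10.0', '30.0', '2000.0']
```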
The problem in your code is that the line
cln['Record Number'].astype(float)
does not modify the DataFrame: astype returns a new object and leaves the original unchanged. Consequently, the column is still of type string (object) and is sorted lexicographically. Printing cln['Record Number'].dtype after that statement makes this clear. To change the column, assign the result back:
cln['Record Number'] = cln['Record Number'].astype(float)
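Putting the pieces together for the original goal (drop non-integer values, keep only numbers that fit into int64, convert without going through float, then sort ascending), here is a minimal sketch. It assumes the values can be read as strings (dtype=str) and uses an in-memory CSV in place of GridExport.csv; the helper fits_int64 and the sample data are hypothetical. It uses sort_values because DataFrame.sort has been removed from current pandas.

```python
import io
import re

import pandas as pd

# In-memory stand-in for GridExport.csv (assumption): record numbers mixed with junk
csv_text = """Record Number
3
201
abc
0
200000000
99999999999999999999
2
12.5
1
"""

INT64_MAX = 2**63 - 1  # largest value an int64 can hold


def fits_int64(value):
    """Hypothetical helper: True if value is a string of ASCII digits within int64 range."""
    return (
        isinstance(value, str)  # missing cells are read as NaN (a float), not str
        and re.fullmatch(r'[0-9]+', value) is not None
        and int(value) <= INT64_MAX
    )


# dtype=str keeps every cell as text, so nothing passes through float on read
df1 = pd.read_csv(io.StringIO(csv_text), usecols=['Record Number'], dtype=str)

# Keep only the rows whose value is a non-negative integer that fits in int64
mask = df1['Record Number'].map(fits_int64)
cln = df1.loc[mask, ['Record Number']].copy()

# astype returns a new Series, so assign it back to change the column's dtype
cln['Record Number'] = cln['Record Number'].astype('int64')
print(cln['Record Number'].dtype)  # int64

# sort_values replaces the removed DataFrame.sort
cln = cln.sort_values('Record Number').reset_index(drop=True)
print(cln['Record Number'].tolist())  # [0, 1, 2, 3, 201, 200000000]
```

The [0-9]+ pattern also drops negative values, which seems appropriate for record numbers but is an assumption. Reading with dtype=str is what avoids the float round trip: float64 cannot represent every integer above 2**53 exactly, so very large record numbers could be altered by a conversion to float.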