How to convert a column to numeric in Python for sorting
I am new to Python (still learning). Please take a look at my question and help me solve it.
I have a CSV file with the following contents:
test,cycle,date,status
func,2,09/07/17,pass
func,10,09/08/17,fail
func,3,09/08/17,pass
func,1,09/08/17,no run
func,22,09/08/17,in progress
func,11,09/08/17,on hold
When I sort on the second column (cycle), I get the following output:
['func', '1', '09/08/17', 'no run']
['func', '10', '09/08/17', 'fail']
['func', '11', '09/08/17', 'on hold']
['func', '2', '09/07/17', 'pass']
['func', '22', '09/08/17', 'in progress']
['func', '3', '09/08/17', 'pass']
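The problem is that the column is sorted as strings, so the output is ordered 1, 10, 11, 2, 22, 3. I want to sort it numerically (integer/float) so that the order is 1, 2, 3, 10, 11, 22.
Below is my small script. Could you help me modify it so that the column is converted to a number before sorting?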
with open ('C:\Automation\sample.csv') as csvfile:
    readCSVfile = csv.reader(csvfile,delimiter =',')
    for row in readCSVfile:
        sort = sorted(readCSVfile, key=operator.itemgetter(1), reverse = False)
        for eachline in sort:
            print eachline
You can preprocess the rows as you read them:
#!python2
import csv
import operator
with open ('sample.csv','rb') as csvfile:
    readCSVfile = csv.reader(csvfile)
    header = next(readCSVfile)
    rows = []
    for row in readCSVfile:
        test,cycle,date,status = row
        rows.append([test,int(cycle),date,status])
rows.sort(key=operator.itemgetter(1))
for row in rows:
    print row
Output:
['func', 1, '09/08/17', 'no run']
['func', 2, '09/07/17', 'pass']
['func', 3, '09/08/17', 'pass']
['func', 10, '09/08/17', 'fail']
['func', 11, '09/08/17', 'on hold']
['func', 22, '09/08/17', 'in progress']
You can also use a different sort key and leave the column as strings:
#!python2
import csv
import operator
with open ('sample.csv','rb') as csvfile:
    readCSVfile = csv.reader(csvfile)
    header = next(readCSVfile)
    rows = [row for row in readCSVfile]
rows.sort(key=lambda row: int(row[1]))
for row in rows:
    print row
Output:
['func', '1', '09/08/17', 'no run']
['func', '2', '09/07/17', 'pass']
['func', '3', '09/08/17', 'pass']
['func', '10', '09/08/17', 'fail']
['func', '11', '09/08/17', 'on hold']
['func', '22', '09/08/17', 'in progress']
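Both scripts above are written for Python 2 (print statement, file opened in 'rb' mode). A minimal Python 3 sketch of the same key-based sort might look like the following. It assumes Python 3 and, so that it runs on its own, reads the sample rows from an in-memory io.StringIO buffer instead of sample.csv. The names SAMPLE and sort_by_cycle are hypothetical and not part of the original answer.

```python
import csv
import io

# Sample data from the question, inlined so the sketch is self-contained.
# With a real file, use open('sample.csv', newline='') instead (Python 3).
SAMPLE = """test,cycle,date,status
func,2,09/07/17,pass
func,10,09/08/17,fail
func,3,09/08/17,pass
func,1,09/08/17,no run
func,22,09/08/17,in progress
func,11,09/08/17,on hold
"""


def sort_by_cycle(csvfile):
    """Return (header, rows) with rows ordered by the numeric value of column 1."""
    reader = csv.reader(csvfile)
    header = next(reader)   # skip the header row
    rows = list(reader)     # remaining rows, every field still a string
    rows.sort(key=lambda row: int(row[1]))  # compare as integers, not as text
    return header, rows


header, rows = sort_by_cycle(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

As in the second script, the cycle values stay strings in the result; only the comparison uses int.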
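Then you have to convert it to a number. The Python csv module does not detect data types automatically; every field is returned as a string.
You can do it with something like this: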
numberedCSV = []
for row in readCSVfile:
    row[1] = int(row[1])
    numberedCSV.append(row)
Then sort numberedCSV.
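The question mentions integer or float values, while int() alone raises ValueError on a field such as '2.5'. As a rough sketch of how the conversion step could cover both cases, the hypothetical helper to_number below tries int first and falls back to float. It is not part of the csv module, it assumes Python 3, and the row containing '2.5' is made up purely to exercise the float path.

```python
def to_number(text):
    """Convert a CSV field to int if possible, otherwise to float.

    Hypothetical helper; raises ValueError if the text is not numeric at all.
    """
    try:
        return int(text)
    except ValueError:
        return float(text)


# Rows as csv.reader would yield them: every field is a string.
rows = [
    ['func', '10', '09/08/17', 'fail'],
    ['func', '2', '09/07/17', 'pass'],
    ['func', '2.5', '09/08/17', 'pass'],  # made-up row, not in the question's file
    ['func', '1', '09/08/17', 'no run'],
]

numberedCSV = []
for row in rows:
    row[1] = to_number(row[1])  # replace the string with its numeric value
    numberedCSV.append(row)

numberedCSV.sort(key=lambda row: row[1])
for row in numberedCSV:
    print(row)
```

If the header row has not been skipped first, to_number('cycle') will raise ValueError, so consume the header with next() before the loop.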
By the way, I don't understand the code you posted. Why do you need two loops?
This may be what you are looking for.
# take second element for sort
def takeSecond(elem):
    return int(elem[1])
# random list
stuff = [['func', '1', '09/08/17', 'no run'],
         ['func', '10', '09/08/17', 'fail'],
         ['func', '11', '09/08/17', 'on hold'],
         ['func', '2', '09/07/17', 'pass'],
         ['func', '22', '09/08/17', 'in progress'],
         ['func', '3', '09/08/17', 'pass']]
# sort list with key
sortedList = sorted(stuff, key=takeSecond)
# print list
print('Sorted list:', sortedList)
Cheers.
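To see why the key function matters, the short sketch below compares the default ordering of the cycle strings with the ordering produced by an int key. Strings are compared character by character, which is why '10' sorts before '2'. The variable names are illustrative only.

```python
# Cycle values from the question, as strings (the way csv.reader returns them).
cycles = ['2', '10', '3', '1', '22', '11']

# Default comparison is lexicographic: '1' < '10' < '11' < '2' < '22' < '3'.
as_text = sorted(cycles)

# key=int compares the numeric values but leaves the items as strings.
as_numbers = sorted(cycles, key=int)

print(as_text)
print(as_numbers)
```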
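As the other answers say, you can replace operator.itemgetter with another function that converts the value to int.
However, if you work with this kind of tabular data often, it is better to use pandas. You need to install it, but again: if you do this often, it is worth it.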
import pandas as pd
df = pd.read_csv('sample.csv')
df['cycle'] = df['cycle'].astype(int)
print(df.sort_values(by='cycle'))
# or reverse
print(df.sort_values(by='cycle', ascending=False))