Python Sorting Text File From Highest To Lowest Based On Column Values

I have a very large text file that contains lines of data like:

('#DownWithAssad', '1')
('#DownYoTLParty', '1')
('#Download', '8')
('#Download:', '2')
('#Downloads', '2')
('#DownstairsMixtape', '1')
('#DowntonAbbey', '12')
('#DowntonAbbey?', '1')
('#DowntonPBS', '23')
('#Downtonabbey', '1')
('#DowntownAbbey', '1')

This may seem like a simple problem, but I want to sort the data from highest to lowest so it looks like:

('#DowntonPBS', '23')
('#DowntonAbbey', '12')
('#Download', '8')
('#Download:', '2')
('#Downloads', '2')
('#DownstairsMixtape', '1')
('#DownWithAssad', '1')
('#DownYoTLParty', '1')
('#DowntonAbbey?', '1')
('#Downtonabbey', '1')
('#DowntownAbbey', '1')

I gather that I can strip the parentheses () and split the data with:

import sys

f = open(sys.argv[1])
for line in f:
    line = str(line)[1 : -1]
    for sect in line.split(','):
        print(sect)

However, I'm not sure where to go from here.

You can parse your text file quite easily using ast.literal_eval:

import ast

# datafile is the path to the input text file
with open(datafile) as f:
    file_sorted = sorted((ast.literal_eval(x) for x in f),
                         key=lambda z: (int(z[1]), z[0]),
                         reverse=True)

How it works:

(ast.literal_eval(x) for x in f)  #turn each line in your file into a tuple
key=lambda z:(int(z[1]),z[0])     #function to determine how things are sorted.  Basically
                                  #sort as tuples:  `( int(z[1]),z[0] )`
reverse=True                      #descending order instead of ascending
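
As a rough illustration of the explanation above, here is a minimal sketch that runs the same idea on a few sample lines copied from the question. The function name `sort_by_count` and the inline `lines` list are assumptions made for the example. It also uses a small variation on the key: negating the count instead of passing reverse=True, so that counts sort descending while tags with equal counts stay in ascending order.

```python
import ast

# Sample lines copied from the question; a real run would iterate over the file.
lines = [
    "('#DownWithAssad', '1')",
    "('#Download', '8')",
    "('#Downloads', '2')",
    "('#DowntonAbbey', '12')",
    "('#DowntonPBS', '23')",
]

def sort_by_count(raw_lines):
    """Parse each line into a tuple and sort by count, highest first."""
    # Skip blank lines, which ast.literal_eval cannot parse.
    rows = (ast.literal_eval(line) for line in raw_lines if line.strip())
    # Negating the count sorts counts descending; ties fall back to the tag, ascending.
    return sorted(rows, key=lambda row: (-int(row[1]), row[0]))

for row in sort_by_count(lines):
    print(row)
```

With reverse=True and the key `(int(z[1]), z[0])`, ties on the count would instead be ordered by tag in descending order, so which form fits depends on how ties should look.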

This is along the lines of what you were trying to do. Note that parsing the lines this way is quite fragile: a misformatted line may break it.

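from operator import itemgetter
import sys

result = []
with open(sys.argv[1]) as f:
    for line in f:
        # Drop the newline, then the surrounding parentheses
        line = line.strip()[1:-1]
        sect1, sect2 = line.split(', ')
        # Drop the quotes around each field; the count becomes an int
        sect1 = sect1[1:-1]
        sect2 = int(sect2[1:-1])
        result.append((sect1, sect2))

# Sort on the count (second item), highest first
for line in sorted(result, key=itemgetter(1), reverse=True):
    print(line)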

Better ways to parse it would be to use literal_eval or regular expressions. Do you know whether quote characters or commas that appear inside the strings get any special treatment?
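
For the regular-expression route mentioned above, one possible sketch follows. The pattern `LINE_RE` and the helper `parse_line` are hypothetical names, and the pattern assumes each line is a single-quoted tag followed by a single-quoted integer, with no single quote inside the tag. Under that assumption a comma inside the tag is handled, whereas splitting on ', ' would fail on such a line.

```python
import re

# Hypothetical pattern: ('<tag>', '<digits>') with no single quote inside the tag.
LINE_RE = re.compile(r"^\('([^']*)',\s*'(\d+)'\)$")

def parse_line(line):
    """Return (tag, count) for a well-formed line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))

# A comma inside the tag is fine here; malformed lines are reported as None.
print(parse_line("('#Download', '8')"))
print(parse_line("('#a, b', '3')"))
print(parse_line("not a tuple"))
```

Returning None for lines that do not match lets the caller decide whether to skip or report them, rather than stopping at the first bad line.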
