How to sort specific info in a file

I have a pre-made text file that has people's names and scores in it. Each person has three scores, separated by tabs.

John    12    13    21
Zack    14    19    12
Tim     18    22    8
Jill    13    3     22

Now, my goal is to sort the names alphabetically with only the highest score displayed for each, so that it looks like this:

Jill   22
John   21
Tim    22
Zack   19

Once the file has been sorted, I want to print it in the Python shell. I have wrapped the code in functions because I am going to integrate it into other code that I have written.

from operator import itemgetter

def highscore():
    file1 = open("file.txt","r")
    file1.readlines()
    score1 = file1(key=itemgetter(1))
    score2 = file1(key=itemgetter(2))
    score3 = file1(key=itemgetter(3))


def class1alphabetical():
    with open('file.txt') as file1in:
        lines = [line.split('/t') for line in file1in]
        lines.sort()
    with open('file.txt', 'w') as file1out:
        for el in lines:
            file1out.write('{0}\n'.format(' '.join(el)))
    with open('file.txt','r') as fileqsort:
        for line in file1sort:
            print(line[:-1])
        file1sort.close

classfilealphabetical()

I have used info from other questions such as "Sorting information from a file in python" and "Python : Sort file by arbitrary column, where column contains time values".

However, I am still stuck on what to do now.

Whoa, you seem to be making this a bit too complicated.

This is a rough idea.

# this will get your folks in alpha order
with open("file.txt") as f:
    lines = f.readlines()
lines.sort()

# now, on each line, you want to split (that itemgetter approach is too
# complicated and blows up if there aren't exactly 3 grades)
for line in lines:
    # split() with no argument splits on any run of whitespace (spaces and tabs)
    fields = line.split()
    name, grades = fields[0], fields[1:]

    # cast your grades to integers
    grades = [int(grade) for grade in grades]

    # sort and pick the last one
    grades.sort()
    highest = grades[-1]

    # or... use max as suggested
    highest = max(grades)

    # write to output file....

Another piece of advice: open your files with context managers (the with statement); they can be nested. Closing resources is a major part of a well-behaved program.

with open("/temp/myinput.txt","r") as fi:
    ....
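The nesting mentioned above can be sketched like this. It is only a minimal illustration: the helper `copy_sorted` and the file names `scores_in.txt` and `scores_out.txt` are assumptions, not part of the original answer. Both files are opened in a single `with` statement, so each one is closed automatically when the block exits, even if an error occurs.

```python
import os
import tempfile


def copy_sorted(src_path, dst_path):
    """Copy the lines of src_path to dst_path in sorted order (hypothetical helper)."""
    # Both files are managed by one `with`; each is closed when the block exits.
    with open(src_path, "r") as fi, open(dst_path, "w") as fo:
        for line in sorted(fi):
            fo.write(line)


# Demo in a temporary directory so no real file is touched.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "scores_in.txt")    # assumed name
dst = os.path.join(tmp_dir, "scores_out.txt")   # assumed name

with open(src, "w") as f:
    f.write("John\t12\t13\t21\nZack\t14\t19\t12\nTim\t18\t22\t8\nJill\t13\t3\t22\n")

copy_sorted(src, dst)

with open(dst) as f:
    result = f.read().splitlines()
print(result)
```

Writing to a separate output file, rather than overwriting the input, keeps the original data intact if something goes wrong halfway through.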

Once you have your lines in a sorted list (with each line already split into its fields), try this:

output = ["{} {}".format(i[0], max(i[1:], key=int)) for i in lines]

for i in output:
    print(i)

Jill 22
John 21
Tim 22
Zack 19

output is a list created using a list comprehension.

The curly brackets ('{}') are replaced by the arguments passed to str.format(). The str in this case is "{} {}".

The max function takes a keyword argument 'key', as seen above, which lets you specify a function to apply to each item in the iterable given to max (the iterable in this case being i[1:]). I used int because all the items in the list were strings (containing numbers) and had to be converted to ints.
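A small sketch of why `key=int` matters, using Tim's scores from the question as strings. Without the key, `max` compares the strings character by character, which gives the wrong answer here.

```python
grades = ["18", "22", "8"]  # Tim's scores as read from the file (still strings)

# Without key, strings are compared character by character, so "8" wins.
print(max(grades))           # 8

# With key=int, items are compared by their integer value; the original
# string is what gets returned.
print(max(grades, key=int))  # 22

# str.format fills each {} with the corresponding argument, in order.
line = "{} {}".format("Tim", max(grades, key=int))
print(line)                  # Tim 22
```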

This is quite easy to do with some builtin functions and an iteration:

Code:

#!/usr/bin/env python


from operator import itemgetter


scores = """\
John\t12\t13\t21\n
Zack\t14\t19\t12\n
Tim\t18\t22\t8\n
Jill\t13\t3\t22"""


datum = [x.split("\t") for x in filter(None, scores.split("\n"))]
for data in sorted(datum, key=itemgetter(0)):
    name, scores = data[0], map(int, data[1:])
    max_score = max(scores)
    print("{0:s} {1:d}".format(name, max_score))

Output:

$ python -i scores.py 
Jill 22
John 21
Tim 22
Zack 19
>>> 

There are two distinct tasks:

  1. keep only the top score
  2. sort lines by name alphabetically

Here's a standalone script that removes all scores from each line except the highest one:

#!/usr/bin/env python3
import sys
import fileinput

try:
    sys.argv.remove('--inplace') # don't modify file(s) unless asked
except ValueError:
    inplace = False
else:
    inplace = True # modify the files given on the command line

if len(sys.argv) < 2:
    sys.exit('Usage: keep-top-score [--inplace] <file>')

for line in fileinput.input(inplace=inplace):
    name, *scores = line.split() # split on whitespace (not only tab)
    if scores:
        # keep only the top score
        top_score = max(scores, key=int)
        print(name, top_score, sep='\t')
    else:
        print(line, end='') # print as is

Example:

$ python3 keep_top_score.py class6Afile.txt

To print the lines sorted by name:

$ sort -k1 class6Afile.txt

The result of the sort command depends on your current locale; e.g., you could use LC_ALL=C to sort by byte values.
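To illustrate why the ordering rule matters, here is a rough Python analogy (an assumption for illustration, not what sort does internally). Comparing by code point, which matches byte order for ASCII text, puts all uppercase letters before lowercase ones, whereas many locales largely ignore case. The mixed-case names below are made up for the demonstration.

```python
names = ["zack", "Tim", "jill", "John"]  # hypothetical mixed-case names

# Code-point order: uppercase ASCII letters sort before lowercase ones.
by_code_point = sorted(names)
print(by_code_point)     # ['John', 'Tim', 'jill', 'zack']

# Case-insensitive order, closer to what many non-C locales produce.
case_insensitive = sorted(names, key=str.casefold)
print(case_insensitive)  # ['jill', 'John', 'Tim', 'zack']
```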

Or if you want a Python solution:

#!/usr/bin/env python
import sys
from io import open

filename = sys.argv[1] 
with open(filename) as file:
    lines = file.readlines() # read lines

# sort by name
lines.sort(key=lambda line: line.partition('\t')[0])

with open(filename, 'w') as file:
    file.writelines(lines) # write the sorted lines

The names are sorted as Unicode text here. You could provide the explicit character encoding used in the file; otherwise the default encoding (based on your locale) is used.
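A minimal sketch of passing the encoding explicitly, assuming Python 3, a UTF-8 file, and a made-up name containing a non-ASCII letter. The file name `class.txt` is hypothetical. Note that sorting as Unicode text compares code points, so "É" (U+00C9) lands after "Z".

```python
import os
import tempfile

# Hypothetical data with a non-ASCII name; \u00c9 is "É".
text = "Zack\t19\n\u00c9mile\t20\nJill\t22\n"

path = os.path.join(tempfile.mkdtemp(), "class.txt")  # assumed file name
with open(path, "w", encoding="utf-8") as file:
    file.write(text)

# Pass the encoding explicitly instead of relying on the locale default.
with open(path, encoding="utf-8") as file:
    lines = file.readlines()

# Sort by the text before the first tab, i.e. the name.
lines.sort(key=lambda line: line.partition("\t")[0])

names = [line.partition("\t")[0] for line in lines]
print(ascii(names))  # ascii() keeps the output printable on any console
```

If alphabetical order for accented names matters, a locale-aware key would be needed instead of plain code-point comparison.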

Example:

$ python sort_inplace_by_name.py class6Afile.txt

Result

Jill    22
John    21
Tim     22
Zack    19
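Both tasks can also be combined in a single pass. The following is only a sketch under the assumptions of the question (whitespace-separated fields, integer scores); the helper name `top_scores_by_name` is hypothetical and the sample lines are the data from the question.

```python
def top_scores_by_name(lines):
    """Return (name, top_score) pairs sorted by name (hypothetical helper)."""
    results = []
    for line in lines:
        fields = line.split()          # split on any whitespace, tabs included
        if len(fields) < 2:
            continue                   # skip blank lines and names without scores
        name, scores = fields[0], fields[1:]
        results.append((name, max(int(s) for s in scores)))
    return sorted(results)             # tuples sort by name first


sample = [
    "John\t12\t13\t21\n",
    "Zack\t14\t19\t12\n",
    "Tim\t18\t22\t8\n",
    "Jill\t13\t3\t22\n",
]

for name, score in top_scores_by_name(sample):
    print("{}\t{}".format(name, score))
```

Working on a list of lines rather than on a file keeps the function easy to call from other code; the lines can come from file.readlines() or any other source.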
