Sorting lines from txt by keyword on Python

I ran into a problem in Python: I want to sort a .txt file so that it looks like the desired output below. Instead, I get wrong output in which the first and second lines are concatenated, or a blank line appears at the end of the file. Why is this happening?

Thanks in advance for any help.

Input file:

https://markus.rmart.ru/wormix_mm/preloader/
https://markus.rmart.ru/wormix_ok/preloader/
https://markus.rmart.ru/engine/preloader/

Desired output:

https://markus.rmart.ru/engine/preloader/
https://markus.rmart.ru/wormix_mm/preloader/
https://markus.rmart.ru/wormix_ok/preloader/

Actual output:

https://markus.rmart.ru/engine/preloader/https://markus.rmart.ru/wormix_mm/preloader/
https://markus.rmart.ru/wormix_ok/preloader/

Code:

test_out = open('./test_out999.txt', "w")

def my_sort(line):
    social_folders = {'engine': 1,
                    'wormix_mm': 2,
                    'wormix_ok': 3}
    line_fields = line.strip().split("/")
    social = line_fields[3]
    print(line_fields[3])
    return social_folders[social]

testsortf = open('./testsort.txt')
contents = testsortf.readlines()

contents.sort(key=my_sort)

for line in contents:
        test_out.write(line)

testsortf.close()
test_out.close()

But when I strip the trailing "\n" with line.rstrip('\n') and add "\n" back manually, I get this output (with an unwanted blank line at the end of the file):

https://markus.rmart.ru/engine/preloader/
https://markus.rmart.ru/wormix_mm/preloader/
https://markus.rmart.ru/wormix_ok/preloader/

Small fix:

test_out.write(line.rstrip('\n') + "\n")

So why does this happen, and how can I get the desired output?

And a follow-up question, if anyone can help: how can I get this output?

First:
https://markus.rmart.ru/engine/preloader/

Second:
https://markus.rmart.ru/wormix_mm/preloader/

Third:
https://markus.rmart.ru/wormix_ok/preloader/

When you add \n to each line, the \n is added to the last line as well. For every line except the last, the next write fills the new line that the previous \n created. After the last line, however, nothing more is written, so that final new line stays blank. Here is an example:

Iteration 1:

https://markus.rmart.ru/engine/preloader/

Iteration 2:

https://markus.rmart.ru/engine/preloader/
https://markus.rmart.ru/wormix_mm/preloader/

Notice how the new line that we created in Iteration 1 has text in it now. If there were no newline, it would look like this:

https://markus.rmart.ru/engine/preloader/https://markus.rmart.ru/wormix_mm/preloader/

because text is always written at the current end of the file.

Finally, Iteration 3:

https://markus.rmart.ru/engine/preloader/
https://markus.rmart.ru/wormix_mm/preloader/
https://markus.rmart.ru/wormix_ok/preloader/

As you can see, nothing is written after Iteration 3, so the last line is left blank.

To fix that, do a simple check to see whether the current line is the last one (replace your for line in contents loop with this):

for i, line in enumerate(contents):
    test_out.write(line.rstrip('\n'))
    if i < len(contents) - 1:  # No newline after the last line
        test_out.write("\n")
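An equivalent way to avoid the trailing blank line, shown here only as a sketch: strip the newline from every line and let str.join put "\n" between the lines rather than after each one. The list below is hypothetical stand-in data for the sorted contents.

```python
# Hypothetical stand-in for the sorted result of readlines();
# note that the first element has no trailing "\n".
contents = [
    "https://markus.rmart.ru/engine/preloader/",
    "https://markus.rmart.ru/wormix_mm/preloader/\n",
    "https://markus.rmart.ru/wormix_ok/preloader/\n",
]

# join() places the separator only between items, never after the last one.
text = "\n".join(line.rstrip("\n") for line in contents)

# test_out.write(text) would then write all three lines in a single call.
print(text)
```

With this approach no index check is needed, at the cost of building the whole output string in memory first.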

To get the output you want (First, Second, Third), you could just keep a list of those words:

num_to_word = ["First", "Second", "Third"]
for i, line in enumerate(contents):
    test_out.write(num_to_word[i] + ":\n")
    test_out.write(line.rstrip('\n'))
    if i < len(contents) - 1:
        test_out.write("\n\n") # Two newlines to add a line in between

(I haven't tested this, please tell me if it doesn't work)
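The labelled output can also be built in memory first, which makes it easy to check before anything is written to a file. This is only a sketch: format_labelled is a hypothetical helper name, and because it relies on zip(), any lines beyond the number of available labels would be silently dropped.

```python
def format_labelled(lines, labels=("First", "Second", "Third")):
    """Return 'Label:' / line blocks separated by one blank line."""
    blocks = []
    # zip() pairs each label with a line and stops at the shorter input.
    for label, line in zip(labels, lines):
        blocks.append(label + ":\n" + line.rstrip("\n"))
    # A blank line between blocks, nothing after the last one.
    return "\n\n".join(blocks)


sample = [
    "https://markus.rmart.ru/engine/preloader/",
    "https://markus.rmart.ru/wormix_mm/preloader/\n",
    "https://markus.rmart.ru/wormix_ok/preloader/\n",
]
print(format_labelled(sample))
```

The returned string could then be passed to test_out.write() in one call.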

Your unexpected output:

https://markus.rmart.ru/engine/preloader/https://markus.rmart.ru/wormix_mm/preloader/
https://markus.rmart.ru/wormix_ok/preloader/

is because the last line of the input file doesn't end with a newline.
So if we mark newlines as △:

Input file:

https://markus.rmart.ru/wormix_mm/preloader/△
https://markus.rmart.ru/wormix_ok/preloader/△
https://markus.rmart.ru/engine/preloader/

So two elements of contents end with \n and one doesn't, which causes the different behavior.
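This can be checked without touching the disk. The sketch below uses io.StringIO as an in-memory stand-in for the input file (an assumption for illustration; the original code reads ./testsort.txt) and sorts with plain sorted() instead of the custom key, which happens to give the same order for these three URLs.

```python
import io

# In-memory stand-in for the input file; the last line has no trailing "\n".
fake_file = io.StringIO(
    "https://markus.rmart.ru/wormix_mm/preloader/\n"
    "https://markus.rmart.ru/wormix_ok/preloader/\n"
    "https://markus.rmart.ru/engine/preloader/"
)

contents = fake_file.readlines()
# readlines() keeps each line's own terminator, so only the last element lacks one.
print([line.endswith("\n") for line in contents])

# After sorting, the unterminated line moves to the front and runs into the next one.
merged = "".join(sorted(contents))
print(merged)
```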
The simple fix is not to add an extra newline to every line, but only to the last one:

contents = testsortf.readlines()
contents[-1] = f'{contents[-1]}\n'

If contents may be empty:

contents = testsortf.readlines()
if contents:
    contents[-1] = f'{contents[-1]}\n'
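One caveat: the assignment above appends "\n" unconditionally, so an input file that already ends with a newline would gain an empty line. A more defensive variant, sketched here with the hypothetical helper name ensure_trailing_newline, adds the terminator only when it is missing.

```python
def ensure_trailing_newline(lines):
    """Return a copy of lines whose last element ends with a newline."""
    fixed = list(lines)
    # Only touch the last element, and only if its terminator is missing.
    if fixed and not fixed[-1].endswith("\n"):
        fixed[-1] += "\n"
    return fixed


print(ensure_trailing_newline(["a\n", "b"]))    # last element gets a newline
print(ensure_trailing_newline(["a\n", "b\n"]))  # already terminated, unchanged
print(ensure_trailing_newline([]))              # empty input is handled too
```

It would be used as contents = ensure_trailing_newline(testsortf.readlines()) before sorting.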

So we now have the code:

test_out = open('...', "w")

def my_sort(line):
    social_folders = {'engine': 1,
                    'wormix_mm': 2,
                    'wormix_ok': 3}
    line_fields = line.strip().split("/")
    social = line_fields[3]
    print(line_fields[3])
    return social_folders[social]

testsortf = open('...')
contents = testsortf.readlines()
contents[-1] = f'{contents[-1]}\n'
contents.sort(key=my_sort)
for line in contents:
    test_out.write(line)

testsortf.close()
test_out.close()

To add First, Second, etc., first add a tuple, e.g.

numbers = 'First', 'Second', 'Third'

And then use the handy enumerate():

test_out = open('./test_out999.txt', "w")

def my_sort(line):
    social_folders = {'engine': 1,
                    'wormix_mm': 2,
                    'wormix_ok': 3}
    line_fields = line.strip().split("/")
    social = line_fields[3]
    print(line_fields[3])
    return social_folders[social]

numbers = 'First', 'Second', 'Third'  # <---
testsortf = open('./testsort.txt')
contents = testsortf.readlines()
contents[-1] = f'{contents[-1]}\n'
contents.sort(key=my_sort)
for i, line in enumerate(contents):
    test_out.write(f'{numbers[i]}:\n{line}')  # No., newline, content
    if i+1 < len(contents):  # Don't add additional \n for last line
        test_out.write('\n')

testsortf.close()
test_out.close()

An additional suggestion:
Using with ... as f is good practice in Python, because it closes the file even if an error occurs. So the final code:

def my_sort(line):
    social_folders = {'engine': 1,
                    'wormix_mm': 2,
                    'wormix_ok': 3}
    line_fields = line.strip().split("/")
    social = line_fields[3]
    print(line_fields[3])
    return social_folders[social]

numbers = 'First', 'Second', 'Third', 'Fourth'
with open('./testsort.txt') as testsortf, \
     open('./test_out999.txt', "w") as test_out:
    contents = testsortf.readlines()
    contents[-1] = f'{contents[-1]}\n'
    contents.sort(key=my_sort)
    for i, line in enumerate(contents):
        test_out.write(f'{numbers[i]}:\n{line}')
        if i+1 < len(contents):  # Don't add additional \n for last line
            test_out.write('\n')
    # No need to call close()!

Notes

  1. See PEP 279 for more information about enumerate().
  2. f-strings (f'...{...}...') were added in Python 3.6 by PEP 498. Use '...{}...'.format(...) for Python 3.5 or lower.
