txt file sorting (key:value on every line): a problem with '\n'

I am trying to sort a txt file that looks like this:

byr:1983 iyr:2017
pid:796082981 cid:129 eyr:2030
ecl:oth hgt:182cm

iyr:2019
cid:314
eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568

byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry

hcl:231d64 cid:124 ecl:gmt eyr:2039
hgt:189in
pid:#9c3ea1

and so on (1000+ lines), into this structure:

byr:value
iyr:value
eyr:value
hgt:value
hcl:value
ecl:value
pid:value
cid:value

byr:value
iyr:value
eyr:value
hgt:value
hcl:value
ecl:value
pid:value
cid:value

The order of byr, iyr etc. doesn't matter, but every "set" of key:value pairs has to be separated by a blank line. My main problem, if I can call it that, is writing code that handles the file properly when a line contains more than one key:value element. I have made some progress, but it is still not as it should be. The following code:

result_file = open('testresult.txt', 'w')
#list_of_lines = [] testing purpose


with open('input.txt', 'r') as f:
    for line in f:
        if line == "\n":
            #list_of_lines.append('\n') testing
            result_file.writelines('\n')
        else:
            for i in line.split(' '):
                if i[-1] == "n":
                    result_file.write(i)
                else:
                    result_file.write(i + '\n')
                #print(i) testing purpose

produces the result below:

byr:1983
iyr:2017

pid:796082981
cid:129
eyr:2030

ecl:oth
hgt:182cm


iyr:2019

cid:314

eyr:2039
hcl:#cfa07d
hgt:171cm
ecl:#0180ce
byr:2006
pid:8204115568


byr:1991
eyr:2022
hcl:#341e13
iyr:2016
pid:729933757
hgt:167cm
ecl:gry

As you can see, it doesn't work properly: for example, there should be no blank line between the first occurrence of byr and the first occurrence of hgt, and so on. It seemed to me that the last if statement

if i[-1] == "n":
    result_file.write(i)
else:
    result_file.write(i + '\n')

would protect me from this situation, but I don't understand why it doesn't behave as I "predicted". Please help. Thanks in advance <3

Try this:

result_file = open('testresult.txt', 'w')
#list_of_lines = [] testing purpose


with open('input.txt', 'r') as f:
    for line in f:
        if line == '\n':
            #list_of_lines.append('\n') testing
            result_file.writelines('\n')
        else:
            # replace '\n' with ''
            line = line.replace('\n', '')
            for i in line.split(' '):
                result_file.writelines(i + '\n')

result_file.close()
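For context on why the check in the question never fires: when Python reads a line, the line break at its end is the single character '\n', so the last character of the last item is a newline, not the letter 'n'. A minimal sketch of that comparison follows; the sample line is taken from the question's data and the variable names are only illustrative.

```python
# One line as it comes out of `for line in f` (sample taken from the question).
line = "byr:1983 iyr:2017\n"

last_item = line.split(" ")[-1]

# '\n' is a single newline character, not a backslash followed by 'n'.
ends_with_letter_n = last_item[-1] == "n"
ends_with_newline = last_item[-1] == "\n"

print(repr(last_item))      # 'iyr:2017\n'
print(ends_with_letter_n)   # False
print(ends_with_newline)    # True

# An item that really ends in the letter n, such as 'hgt:189in', would pass
# the original check even in the middle of a line and lose its line break.
mid_line_item = "hgt:189in"
print(mid_line_item[-1] == "n")  # True
```

So the else branch always runs for the last item of a line and appends a second line break, which is where the extra blank lines come from.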

Try this:

with open("file.txt", "r") as f:
    lines = f.readlines()

print(lines)

# Split every line on spaces and collect all items in one flat list.
split_items = []
for line in lines:
    split_items.extend(line.split(" "))

print("split_items")
print(split_items)

# Notice that the last item of every line still ends with '\n'.
# That is probably what causes your extra blank lines,
# so let's remove it.
cleaned_items = [item.strip("\n") for item in split_items]

print("Removed \\n")
print(cleaned_items)

# A blank input line is now an empty string, so writing it back
# with '\n' keeps exactly one blank line between sets.
with open("output.txt", "w") as f:
    for item in cleaned_items:
        f.write(item + "\n")

With this in file.txt:

byr:1983 iyr:2017
pid:796082981 cid:129 eyr:2030
ecl:oth hgt:182cm

iyr:2019
cid:314
eyr:2039 hcl:#cfa07d hgt:171cm ecl:#0180ce byr:2006 pid:8204115568

byr:1991 eyr:2022 hcl:#341e13 iyr:2016 pid:729933757 hgt:167cm ecl:gry

hcl:231d64 cid:124 ecl:gmt eyr:2039
hgt:189in
pid:#9c3ea1

running the above script gives me this in output.txt:

byr:1983
iyr:2017
pid:796082981
cid:129
eyr:2030
ecl:oth
hgt:182cm

iyr:2019
cid:314
eyr:2039
hcl:#cfa07d
hgt:171cm
ecl:#0180ce
byr:2006
pid:8204115568

byr:1991
eyr:2022
hcl:#341e13
iyr:2016
pid:729933757
hgt:167cm
ecl:gry

hcl:231d64
cid:124
ecl:gmt
eyr:2039
hgt:189in
pid:#9c3ea1

I hope this is what you needed.
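The same idea can also be written without the intermediate lists. The sketch below works on the whole file content as one string; `one_pair_per_line` is a hypothetical helper name, and it assumes that sets are separated by truly empty lines (no stray spaces on them).

```python
def one_pair_per_line(text: str) -> str:
    """Put each key:value pair on its own line, keeping one blank line between sets."""
    records = []
    for block in text.split("\n\n"):
        pairs = block.split()  # no argument: splits on spaces and line breaks
        if pairs:
            records.append("\n".join(pairs))
    return "\n\n".join(records) + "\n"


sample = (
    "byr:1983 iyr:2017\n"
    "pid:796082981 cid:129 eyr:2030\n"
    "ecl:oth hgt:182cm\n"
    "\n"
    "iyr:2019\n"
    "cid:314\n"
)
print(one_pair_per_line(sample), end="")
```

For a real file, the string would come from reading file.txt in one go and the return value would be written to output.txt.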

You can delete all '\n' characters with replace.

result_file = open('testresult.txt', 'w')
#list_of_lines = [] testing purpose


with open('input.txt', 'r') as f:
    for line in f:
        line = line.replace('\n', '')
        if line != '':
            for i in line.split(' '):
                result_file.write(i+'\n')

result_file.close()

And this is the result:

byr:1983
iyr:2017
pid:796082981
cid:129
eyr:2030
ecl:oth
hgt:182cm
iyr:2019
cid:314
eyr:2039
hcl:#cfa07d
hgt:171cm
ecl:#0180ce
byr:2006
pid:8204115568
byr:1991
eyr:2022
hcl:#341e13
iyr:2016
pid:729933757
hgt:167cm
ecl:gry
hcl:231d64
cid:124
ecl:gmt
eyr:2039
hgt:189in
pid:#9c3ea1
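Note that the result above no longer contains the blank line between sets. If that separator is still needed, as the question requires, one possible streaming variant is sketched below. `split_pairs` is a hypothetical helper name, and the sketch relies on `str.split()` with no argument, which splits on any whitespace and therefore discards the trailing line break.

```python
import io


def split_pairs(lines):
    """Yield one output line per key:value pair and an empty line per blank input line."""
    for line in lines:
        pairs = line.split()  # splits on any whitespace, so '\n' never survives
        if not pairs:
            yield ""          # blank input line: keep it as the set separator
        else:
            yield from pairs


# io.StringIO stands in for the opened input file in this sketch.
source = io.StringIO("byr:1983 iyr:2017\necl:oth hgt:182cm\n\niyr:2019\ncid:314\n")
result = list(split_pairs(source))
print("\n".join(result))
```

Because the helper accepts any iterable of lines, the file object from `open('input.txt')` could be passed in directly.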

A regular expression may help you get the result without being bothered by the end-of-line character.

Assuming there is no whitespace inside your pairs, you could use the following script:

import re
from contextlib import ExitStack

# Assumed file names, borrowed from the question; adjust as needed.
input_path = "input.txt"
output_path = "testresult.txt"

REGEX = re.compile(r"[^:\s]+:\S+")
with ExitStack() as stack:
    fr = stack.enter_context(open(input_path, encoding="utf-8"))
    fw = stack.enter_context(open(output_path, mode="w", encoding="utf-8"))
    for line in fr:
        items = REGEX.findall(line)
        if not items:
            # No pair on this line: keep it as the blank separator.
            fw.write("\n")
            continue
        for item in items:
            fw.write(f"{item}\n")

The regular expression searches for "one or more characters that are neither a colon nor whitespace, followed by a colon, followed by one or more non-whitespace characters". That lets the script focus on the pairs only.

Whitespace characters include spaces, tabs and end-of-line characters.
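To make the pattern concrete, here is a small sketch that applies it to a single line. The sample line is adapted from the question's data, with a tab added only to show that any whitespace acts as a separator.

```python
import re

REGEX = re.compile(r"[^:\s]+:\S+")

# Pairs separated by spaces and a tab, with the usual trailing line break.
line = "eyr:2039 hcl:#cfa07d\thgt:171cm ecl:#0180ce\n"
pairs = REGEX.findall(line)
print(pairs)  # ['eyr:2039', 'hcl:#cfa07d', 'hgt:171cm', 'ecl:#0180ce']

# A blank line contains no pair at all.
print(REGEX.findall("\n"))  # []
```

The line break never ends up in a pair because `\S` cannot match it.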

ExitStack lets a single with statement manage both files: every context manager entered through it is closed when the block exits.
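For just two files, a plain with statement listing both context managers is a common alternative. The sketch below is not part of the answer's script: it uses a temporary directory and made-up file content so that it can run on its own, and it splits with `str.split()` instead of the regular expression to stay short.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input and output files inside a throwaway directory.
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_text("byr:1983 iyr:2017\n\niyr:2019\n", encoding="utf-8")

    # Both files are opened by one with statement and closed together.
    with open(src, encoding="utf-8") as fr, open(dst, "w", encoding="utf-8") as fw:
        for line in fr:
            pairs = line.split()
            # A blank line gives an empty list, so only '\n' is written.
            fw.write("\n".join(pairs) + "\n")

    written = dst.read_text(encoding="utf-8")
    both_closed = fr.closed and fw.closed

print(written, end="")
```

ExitStack becomes more useful when the number of files is not known in advance.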
