
How do I combine lines in a text file in a specific order?

I'm trying to transform the text in a file according to the following rule: for each line that does not begin with "https", add that line's word to the beginning of each subsequent line, until you reach another line that does not begin with "https".

For example, given this file:

Fruit
https://www.apple.com//
https://www.banana.com//
Vegetable
https://www.cucumber.com//
https://www.lettuce.com//

I want this output:

Fruit-https://www.apple.com//
Fruit-https://www.banana.com//
Vegetable-https://www.cucumber.com//
Vegetable-https://www.lettuce.com//

Here is my attempt:

one = open("links.txt", "r")
for two in one.readlines():

    if "https" not in two:
        sitex = two
        
    else:
        print (sitex + "-" +two)

Here is the output of that program, using the above sample input file:

Fruit
-https://www.apple.com//

Fruit
-https://www.banana.com//       

Vegetable
-https://www.cucumber.com//     

Vegetable
-https://www.lettuce.com//   

What is wrong with my code?

First, call rstrip() on sitex to remove the newline character at the end of the string (credit to BrokenBenchmark).

Second, print() appends a newline every time it is called, and two still ends with its own newline, so pass end="" to avoid printing a blank line after each result.
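To see where the stray line breaks come from, it can help to inspect the concatenated string with repr(). This is only an illustrative sketch: the two string literals are assumed stand-ins for consecutive lines read from the file, and the names broken and fixed are made up for the example.

```python
# Assumed stand-ins for two consecutive lines as read from the file.
sitex = "Fruit\n"
two = "https://www.apple.com//\n"

# repr() makes the newline characters visible.
broken = sitex + "-" + two
print(repr(broken))   # the "\n" after "Fruit" splits the result over two lines

# Stripping the label removes the newline in the middle of the string.
fixed = sitex.rstrip() + "-" + two
print(repr(fixed))    # one newline remains at the end, so print with end=""
```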

So your code should look like this:

one = open("links.txt", "r")
for two in one.readlines():
    if "https" not in two:
        sitex = two.rstrip()
    else:
        print (sitex + "-" +two,end="")
one.close()

Also, always close the file when you are done with it.

Lines in your file end with "\n", the newline character.

You can remove whitespace (which includes "\n") from a string using strip() (both ends) or rstrip() / lstrip() (one end only).
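As a quick illustration of the difference between the three methods, here is a small sketch. The sample string and variable names are assumptions chosen for the example, not taken from the question's file.

```python
# Assumed sample: leading spaces, trailing space and a trailing newline.
raw = "  Fruit \n"

both = raw.strip()    # whitespace removed from both ends
right = raw.rstrip()  # whitespace removed from the right end only
left = raw.lstrip()   # whitespace removed from the left end only

print(repr(both), repr(right), repr(left))
```

For the question's file, rstrip() is enough, since the only unwanted whitespace is the newline at the end of each line.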

print() adds a "\n" at its end by default; you can change this with the end parameter:

print("something", end=" ")
print("more")   # ==> 'something more' on one line

Fix for your code:

# use a context manager for better file handling
with open("data.txt","w") as f:
    f.write("""Fruit
https://www.apple.com//
https://www.banana.com//
Vegetable
https://www.cucumber.com//
https://www.lettuce.com//
""")


with open("data.txt") as f:
    what = ""
    # iterate file line by line instead of reading all at once
    for line in f:
        # remove whitespace from current line, including \n
        # front AND back - you could use rstrip() here as well
        line = line.strip()
        # only do something for non-empty lines (your file does not
        # contain empty lines, but the last line may be empty)
        if line:
            # easier to understand condition without negation
            if line.startswith("http"):
                # printing adds a \n at the end
                print(f"{what}-{line}") # line & what are stripped
            else:
                what = line

Output:

Fruit-https://www.apple.com//
Fruit-https://www.banana.com//
Vegetable-https://www.cucumber.com//
Vegetable-https://www.lettuce.com//

See the documentation for str.strip(), str.rstrip() and str.lstrip(): the [chars] argument is optional; if it is not given, whitespace is removed.
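A short sketch of what the optional chars argument does, using one of the URLs from the question as an assumed input. Note that chars is treated as a set of characters to remove, not as a literal suffix.

```python
# Assumed input: one line from the question's file, newline included.
url = "https://www.apple.com//\n"

# Remove only the trailing newline; the slashes are kept.
no_newline = url.rstrip("\n")

# Remove any trailing run of '/' and '\n' characters, in any order.
no_slashes = url.rstrip("/\n")

print(repr(no_newline), repr(no_slashes))
```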

You need to strip the trailing newline from the line if it doesn't contain 'https':

sitex = two

should be

sitex = two.rstrip()

You need to do something similar for the else block as well, as ShadowRanger points out:

print (sitex + "-" +two)

should be

print (sitex + "-" + two.rstrip())
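Putting both fixes together, one possible way to package the logic is a small function that takes any iterable of lines. This is a sketch, not code from any of the answers: the function name prefix_links and the variable names are hypothetical, and it assumes that blank lines should be skipped.

```python
def prefix_links(lines):
    """Return each "https" line prefixed with the most recent non-"https" line."""
    result = []
    label = ""  # stays empty if a URL appears before any label line
    for raw in lines:
        line = raw.strip()  # drop the trailing newline and surrounding whitespace
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith("https"):
            result.append(f"{label}-{line}")
        else:
            label = line
    return result


# Sample input copied from the question, newlines included.
sample = [
    "Fruit\n",
    "https://www.apple.com//\n",
    "https://www.banana.com//\n",
    "Vegetable\n",
    "https://www.cucumber.com//\n",
    "https://www.lettuce.com//\n",
]

for row in prefix_links(sample):
    print(row)
```

Because a file object is itself an iterable of lines, the same function should also accept the open file directly, for example inside a with open("links.txt") as f: block.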
