
How can I replace multiple 3-line blocks of text in a file with a single line of text containing the same data in Python?

I have a text file containing multiple 3-line blocks of text, each followed by a blank line. My data looks like this:

title A - description
http://www.a.site.com/
http://a.anothersite.com/

title B - blah blah
http://www.site.b.com/
http://b.anothersite.com/

title C - yeah yeah
http://www.site.c.com/
http://anothersite.c.com/

The output I'm hoping to achieve is something like this:

title A - description | http://www.a.site.com/ | http://a.anothersite.com/   
title B - blah blah | http://www.site.b.com/ | http://b.anothersite.com/
title C - yeah yeah | http://www.site.c.com/ | http://anothersite.c.com/

I've been trying to do this with Python and I'm not really getting anywhere. The best I was able to accomplish was removing all newlines, but that doesn't help here because I still need a line break between records. Any suggestions?

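strings_to_join = []
results = []

with open("file.txt") as f:  # assumed input file name
    for line in f:
        line = line.strip()
        if line:
            # Part of the current block: collect it.
            strings_to_join.append(line)
        elif strings_to_join:
            # A blank line ends the block: join its lines into one.
            results.append(' | '.join(strings_to_join))
            strings_to_join = []

# Flush the last block if the file doesn't end with a blank line.
if strings_to_join:
    results.append(' | '.join(strings_to_join))

print('\n'.join(results))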
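For comparison, here is a minimal sketch of the same grouping using itertools.groupby, keyed on whether a line is blank. The function name join_blocks is hypothetical, and it assumes blocks are separated by one or more blank lines rather than always being exactly three lines long.

```python
from itertools import groupby


def join_blocks(lines, sep=" | "):
    """Join each run of non-blank lines into a single line.

    Blank (or whitespace-only) lines act as block separators, so a block
    does not have to be exactly three lines long.
    """
    stripped = (line.strip() for line in lines)
    # groupby yields consecutive runs sharing the same key: True for a
    # run of non-blank lines, False for a run of blank ones.
    return [
        sep.join(group)
        for is_content, group in groupby(stripped, key=bool)
        if is_content
    ]


sample = """title A - description
http://www.a.site.com/
http://a.anothersite.com/

title B - blah blah
http://www.site.b.com/
http://b.anothersite.com/
"""

for row in join_blocks(sample.splitlines()):
    print(row)
```

Because it accepts any iterable of lines, an open file object can be passed directly (e.g. inside `with open("file.txt") as f:` call `join_blocks(f)`).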

Here is my solution using regular expressions and str.replace:

import re

text = """
title A - description
http://www.a.site.com/
http://a.anothersite.com/

title B - blah blah
http://www.site.b.com/
http://b.anothersite.com/

title C - yeah yeah
http://www.site.c.com/
http://anothersite.c.com/
"""

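text = text.strip()
# Replace each single newline (one not next to another newline) with ' | ';
# the lookarounds keep the neighbouring characters from being consumed.
text = re.sub(r'(?<!\n)\n(?!\n)', ' | ', text).replace('\n\n', '\n')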

print(text)
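If you'd rather not reason about lookarounds, a rough alternative (my own sketch, not the answer above) is to split the text into blocks on blank lines first and then join the lines inside each block. Treating whitespace-only lines as blank is an assumption about your data.

```python
import re

text = """
title A - description
http://www.a.site.com/
http://a.anothersite.com/

title B - blah blah
http://www.site.b.com/
http://b.anothersite.com/
"""

# Split on a newline, an optional whitespace-only gap, and another
# newline, i.e. on one or more blank (or whitespace-only) lines.
blocks = re.split(r"\n\s*\n", text.strip())

# Join the stripped lines of each block with " | ", one block per line.
result = "\n".join(
    " | ".join(line.strip() for line in block.splitlines())
    for block in blocks
)
print(result)
```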

Try this:

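import re

with open("file.txt", "r+") as f:
    # Join all lines with " | " (each line still ends with its "\n").
    text = " | ".join(f.readlines())
    # Drop every newline except one at position 0 (without re.MULTILINE,
    # ^ only matches at the start of the string).
    text = re.sub(r"(?<!^)\n", '', text)
    # Each blank line has now become " |  | ": turn it into a line break.
    text = re.sub(r"\s*\|\s*\|\s*", "\n", text)

    # Overwrite the file in place; truncate in case the new text is shorter.
    f.seek(0)
    f.write(text)
    f.truncate()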

Resulting contents of file.txt:

title A - description | http://www.a.site.com/ | http://a.anothersite.com/
title B - blah blah | http://www.site.b.com/ | http://b.anothersite.com/
title C - yeah yeah | http://www.site.c.com/ | http://anothersite.c.com/
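Overwriting a file opened in r+ mode is fine for a quick script, but if the process dies midway the file can be left half-written. A more defensive sketch (the function name rewrite_blocks is hypothetical, and the blank-line splitting is my own choice rather than the regexes above) writes to a temporary file in the same directory and then swaps it in with os.replace:

```python
import os
import re
import tempfile


def rewrite_blocks(path):
    """Rewrite *path* so each blank-line-separated block becomes one line.

    The result goes to a temporary file in the same directory, which is
    then moved over the original with os.replace.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # One or more blank lines separate the blocks.
    blocks = re.split(r"\n\s*\n", text.strip())
    joined = "\n".join(
        " | ".join(line.strip() for line in block.splitlines())
        for block in blocks
    )

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(joined + "\n")
        # Same directory, so this is a rename on one filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Don't leave the temporary file behind on failure.
        os.remove(tmp_path)
        raise
```

Usage would be rewrite_blocks("file.txt"). One trade-off: mkstemp creates the file readable and writable only by the current user, so the rewritten file.txt ends up with those permissions rather than the original ones.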

First remove the empty lines, then use lines = fulltext.split("\n") to get a list of lines. Then run something like this:

for i in range(len(lines) // 3):
    # Each record spans three consecutive lines.
    title, desc = lines[3 * i].split(" - ", 1)
    website1, website2 = lines[3 * i + 1], lines[3 * i + 2]
    print(title + " - " + desc + " | " + website1 + " | " + website2)

which also gives you each field as a variable you can use elsewhere in your code. If you really just want text output, then, given your input, try:

fulltext.strip().replace("\n", " | ").replace(" |  | ", "\n")  # on the original text, with its blank lines

which should produce your desired text output (maybe with a little modification). However, I'd recommend the first version, since it lets you use those values later for something else. Variables are generally more useful than formatted text documents.
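To make the point about variables concrete, here is a minimal sketch that collects each record into a small dataclass instead of printing it. Entry and parse_entries are hypothetical names, and the sketch assumes the empty lines have already been removed, every record is exactly three lines, and " - " separates the title from the description.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One record: a "title - desc" line followed by two URLs."""
    title: str
    desc: str
    website1: str
    website2: str

    def as_line(self):
        # Rebuild the single-line format from the question.
        return f"{self.title} - {self.desc} | {self.website1} | {self.website2}"


def parse_entries(lines):
    """Group a flat list of non-empty lines into Entry records.

    Assumes every record is exactly three lines; an incomplete trailing
    group is ignored.
    """
    entries = []
    for i in range(0, len(lines) - 2, 3):
        title, desc = lines[i].split(" - ", 1)
        entries.append(Entry(title, desc, lines[i + 1], lines[i + 2]))
    return entries


lines = [
    "title A - description",
    "http://www.a.site.com/",
    "http://a.anothersite.com/",
    "title B - blah blah",
    "http://www.site.b.com/",
    "http://b.anothersite.com/",
]
entries = parse_entries(lines)
for entry in entries:
    print(entry.as_line())
```

Keeping the fields separate means the formatting step is just one method, and the same records could be filtered or written out in another format later.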
