How to read until a certain string and repeat in Python?

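The problem is this: given the input below, I want to separate the URLs (lines starting with [URL, [LINK, or [WEBSITE) from the text. I want to put each URL into a list, in order, and each block of text into a list of texts.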

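I also want to merge all the text under a link into a single line, so that each link is matched with its corresponding text. Here is an example.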

[URL - https://url1.com]
news_line1 word
news_line2 word word
news_line3 word word word

[LINK - https://url2.com]
headline_line1 letter
headline_line2 letter letter
headline_line3 letter letter letter

[WEBSITE - https://url3.com]
date_line1 sentence
date_line2 sentence sentence
date_line3 sentence sentence sentence

The output would be the links:

[URL - https://url1.com]
[LINK - https://url2.com]
[WEBSITE - https://url3.com]

and the text:

news_line1 word news_line2 word word news_line3 word word word
headline_line1 letter headline_line2 letter letter headline_line3 letter letter letter
date_line1 sentence date_line2 sentence sentence date_line3 sentence sentence sentence

My current code is

import sys

inFile = sys.argv[1]

with open(inFile) as f:
    content = f.readlines()

content = [x.strip() for x in content]
url_links = []
sentences = []

for entry in content:
    sentence = ""
    if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
        url_links.append(entry)

    else:
        sentence = sentence + entry

    sentences.append(sentence)

for sentence in sentences:
    print(sentence)

and the output I currently get is


news_line1 word
news_line2 word word
news_line3 word word word


headline_line1 letter
headline_line2 letter letter
headline_line3 letter letter letter


date_line1 sentence
date_line2 sentence sentence
date_line3 sentence sentence sentence

How can I adjust it so that it gives me the correct output?

Again, the desired output is

news_line1 word news_line2 word word news_line3 word word word
headline_line1 letter headline_line2 letter letter headline_line3 letter letter letter
date_line1 sentence date_line2 sentence sentence date_line3 sentence sentence sentence

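Your loop resets sentence to an empty string on every iteration and appends to sentences once per input line, so nothing ever accumulates. Instead of concatenating into a separate variable, append an empty string to sentences each time you encounter "[URL", "[WEBSITE", or "[LINK", and append every text line to the last element of sentences.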

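import sys

inFile = sys.argv[1]

with open(inFile) as f:
    content = f.readlines()

content = [x.strip() for x in content]
url_links = []
sentences = []

for entry in content:
    if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
        # A header line starts a new record: store the link
        # and open an empty sentence for its text.
        url_links.append(entry)
        sentences.append("")

    elif entry and sentences:
        # Skip blank lines and any text before the first header;
        # join text lines with a single space.
        sentences[-1] = (sentences[-1] + " " + entry).strip()


for sentence in sentences:
    print(sentence)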

Here I use "+" to concatenate the strings, but depending on your requirements and Python version, there may be faster alternatives.
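
As one alternative to repeated string concatenation, the text lines of each record can be collected in a list and joined once with str.join, which also takes care of the separating spaces. The following is only a minimal sketch, not a definitive implementation; the function name split_links_and_text and the shortened sample input are made up for illustration.

```python
def split_links_and_text(lines):
    """Return (links, texts) with one joined text string per link header."""
    prefixes = ("[URL", "[LINK", "[WEBSITE")
    links = []
    groups = []  # one list of text lines per link, in input order
    for raw in lines:
        line = raw.strip()
        if line.startswith(prefixes):
            # A header opens a new record.
            links.append(line)
            groups.append([])
        elif line and groups:
            # Ignore blank lines and any text before the first header.
            groups[-1].append(line)
    # Join each record's lines once, separated by single spaces.
    texts = [" ".join(group) for group in groups]
    return links, texts


# Hypothetical sample input, shortened from the example in the question.
sample = """[URL - https://url1.com]
news_line1 word
news_line2 word word

[LINK - https://url2.com]
headline_line1 letter
headline_line2 letter letter
""".splitlines()

links, texts = split_links_and_text(sample)
for link, text in zip(links, texts):
    print(link, "->", text)
```

Because links and texts are built in lockstep, zip pairs each link with its own text, which is the matching the question asks for.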

See also: Which is the preferred way to concatenate a string in Python?
