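How to read until a certain string and repeat in Python?
So the problem is: given the input below, I want to separate the URLs (lines starting with [URL, [LINK, or [WEBSITE) from the text. I want to put each URL, in order, into a list, and each block of text into a list of texts.
I also want to merge all the text under a link into a single line, so that each link matches its corresponding text. Here is an example.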
[URL - https://url1.com]
news_line1 word
news_line2 word word
news_line3 word word word
[LINK - https://url2.com]
headline_line1 letter
headline_line2 letter letter
headline_line3 letter letter letter
[WEBSITE - https://url3.com]
date_line1 sentence
date_line2 sentence sentence
date_line3 sentence sentence sentence
The output would be the links:
[URL - https://url1.com]
[LINK - https://url2.com]
[WEBSITE - https://url3.com]
and the text:
news_line1 word news_line2 word word news_line3 word word word
headline_line1 letter headline_line2 letter letter headline_line3 letter letter letter
date_line1 sentence date_line2 sentence sentence date_line3 sentence sentence sentence
My current code is
import sys
inFile = sys.argv[1]
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
url_links = []
sentences = []
for entry in content:
    sentence = ""
    if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
        url_links.append(entry)
    else:
        sentence = sentence + entry
        sentences.append(sentence)
for sentence in sentences:
    print(sentence)
and the output I currently get is
news_line1 word
news_line2 word word
news_line3 word word word
headline_line1 letter
headline_line2 letter letter
headline_line3 letter letter letter
date_line1 sentence
date_line2 sentence sentence
date_line3 sentence sentence sentence
How can I adjust it so that it gives me the correct output?
Again, the desired output is
news_line1 word news_line2 word word news_line3 word word word
headline_line1 letter headline_line2 letter letter headline_line3 letter letter letter
date_line1 sentence date_line2 sentence sentence date_line3 sentence sentence sentence
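Each time you encounter "[URL", "[WEBSITE", or "[LINK", you can append an empty string to sentences instead of concatenating into a separate variable. Every text line that follows is then appended to the last element of sentences.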
import sys
inFile = sys.argv[1]
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
url_links = []
sentences = []
for entry in content:
    if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
        url_links.append(entry)
        # Start a new, empty sentence for the text under this link
        sentences.append("")
    else:
        # Put a space between joined lines so words do not run together
        if sentences[-1]:
            sentences[-1] += " "
        sentences[-1] += entry
for sentence in sentences:
    print(sentence)
Here I use "+" to concatenate the strings, but depending on your requirements and Python version there may be faster alternatives.
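One such alternative, as a minimal sketch rather than a definitive implementation: collect the text lines for each link in a list and combine them once with " ".join. The function name split_links_and_text and the inline sample are hypothetical, introduced only for illustration. The sketch also assumes that blank lines should be skipped and that any text appearing before the first link should be ignored.

```python
def split_links_and_text(lines):
    """Return (url_links, sentences); each sentence is the space-joined
    text that follows the link at the same index."""
    markers = ("[URL", "[LINK", "[WEBSITE")
    url_links = []
    chunks = []  # one list of text lines per link
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines carry no content
        if entry.startswith(markers):
            url_links.append(entry)
            chunks.append([])  # start collecting text for this link
        elif chunks:
            chunks[-1].append(entry)
        # assumption: text before the first link is ignored
    sentences = [" ".join(chunk) for chunk in chunks]
    return url_links, sentences


# Hypothetical sample, shortened from the input in the question
sample = """[URL - https://url1.com]
news_line1 word
news_line2 word word
[LINK - https://url2.com]
headline_line1 letter
""".splitlines()

links, texts = split_links_and_text(sample)
for link, text in zip(links, texts):
    print(link, "->", text)
```

To use it on a file as in the question, pass the open file object directly, for example split_links_and_text(f) inside the with block, since iterating over a file yields its lines.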