Given the input below, I would like to separate the URLs (lines that start with "[URL", "[LINK", or "[WEBSITE") from the text. I would like to put every URL, in order, into one list and every block of text into another list.
I would also like to combine each block of text into a single line, so that every link matches its corresponding text. Below is an example.
[URL - https://url1.com]
news_line1 word
news_line2 word word
news_line3 word word word
[LINK - https://url2.com]
headline_line1 letter
headline_line2 letter letter
headline_line3 letter letter letter
[WEBSITE - https://url3.com]
date_line1 sentence
date_line2 sentence sentence
date_line3 sentence sentence sentence
The output would be:
Links:
[URL - https://url1.com]
[LINK - https://url2.com]
[WEBSITE - https://url3.com]
and Text:
news_line1 word news_line2 word word news_line3 word word word
headline_line1 letter headline_line2 letter letter headline_line3 letter letter letter
date_line1 sentence date_line2 sentence sentence date_line3 sentence sentence sentence
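The correspondence described above, where the n-th link owns the n-th combined block of text, can also be modeled directly as a list of (link, text) tuples. This is a minimal sketch rather than code from the question: the name pair_links_with_text and the shortened sample input are hypothetical, and lines that appear before the first header are assumed to be safe to ignore.

```python
# Prefixes that mark the start of a new block (taken from the question).
HEADER_PREFIXES = ("[URL", "[LINK", "[WEBSITE")


def pair_links_with_text(lines):
    """Return a list of (link, text) tuples, one per header line.

    Text lines that follow a header are joined with single spaces.
    Blank lines and lines before the first header are ignored.
    """
    pairs = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(HEADER_PREFIXES):
            pairs.append((line, []))  # start a new block for this link
        elif pairs:
            pairs[-1][1].append(line)  # attach text to the current block
    return [(link, " ".join(parts)) for link, parts in pairs]


# Hypothetical, shortened version of the sample input.
sample = """[URL - https://url1.com]
news_line1 word
news_line2 word word
[LINK - https://url2.com]
headline_line1 letter
""".splitlines()

pairs = pair_links_with_text(sample)
print("Links:")
for link, _ in pairs:
    print(link)
print("Text:")
for _, text in pairs:
    print(text)
```

Keeping each link and its text in the same tuple means they cannot drift out of step; the separate "Links" and "Text" lists can still be recovered with a comprehension over the pairs.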
The current code I have is
```python
import sys

inFile = sys.argv[1]
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
url_links = []
sentences = []
for entry in content:
    sentence = ""
    if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
        url_links.append(entry)
    else:
        sentence = sentence + entry
        sentences.append(sentence)
for sentence in sentences:
    print(sentence)
```
And the current output I have is
news_line1 word
news_line2 word word
news_line3 word word word
headline_line1 letter
headline_line2 letter letter
headline_line3 letter letter letter
date_line1 sentence
date_line2 sentence sentence
date_line3 sentence sentence sentence
How can I tweak it such that it gives me the correct output?
Again, the desired output is
news_line1 word news_line2 word word news_line3 word word word
headline_line1 letter headline_line2 letter letter headline_line3 letter letter letter
date_line1 sentence date_line2 sentence sentence date_line3 sentence sentence sentence
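Instead of concatenating each line onto a local variable (which is reset to "" on every iteration), you can append an empty string to sentences every time you encounter a "[URL", "[WEBSITE", or "[LINK" line. Then append all following text to the last element of sentences, separated by a space so the words from consecutive lines don't run together.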
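```python
import sys

inFile = sys.argv[1]
with open(inFile) as f:
    content = f.readlines()
# Strip newlines and surrounding whitespace, dropping blank lines.
content = [x.strip() for x in content if x.strip()]

url_links = []
sentences = []
for entry in content:
    if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
        url_links.append(entry)
        # Each link starts a new, empty text block.
        sentences.append("")
    elif sentences:
        # Add the line to the current block, separated by a space
        # (text before the first link is skipped).
        if sentences[-1]:
            sentences[-1] += " "
        sentences[-1] += entry

for sentence in sentences:
    print(sentence)
```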
Here, strings are concatenated with "+=". Depending on your requirements and Python version, there may be faster alternatives, such as collecting the lines in a list and joining them once with " ".join(...).
See also: Which is the preferred way to concatenate a string in Python?
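Following that advice, the blocks can be built without any += at all. The sketch below is an alternative technique, not the answer's code: it tags each line with a running count of headers seen so far (itertools.accumulate), groups consecutive lines that share a count (itertools.groupby), and joins each group's text lines once with " ".join. The function name split_blocks is hypothetical.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Prefixes that mark the start of a new block (taken from the question).
HEADER_PREFIXES = ("[URL", "[LINK", "[WEBSITE")


def split_blocks(lines):
    """Return (links, texts): header lines and their space-joined text."""
    cleaned = [raw.strip() for raw in lines if raw.strip()]
    # Running count of headers seen so far: 0 before the first header,
    # 1 throughout the first block, 2 throughout the second, and so on.
    block_ids = accumulate(int(line.startswith(HEADER_PREFIXES)) for line in cleaned)
    links, texts = [], []
    for block_id, group in groupby(zip(block_ids, cleaned), key=itemgetter(0)):
        if block_id == 0:
            continue  # ignore lines that come before the first header
        header, *body = (line for _, line in group)
        links.append(header)
        texts.append(" ".join(body))  # one join per block, no repeated +=
    return links, texts
```

Because a file object iterates over its lines, it can be passed in directly, e.g. with open(sys.argv[1]) as f: links, texts = split_blocks(f). A header with no text after it yields an empty string in texts, so the two lists always stay the same length.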