I am trying to write data that I scraped with BeautifulSoup to a text file. It prints the data in the console but does not write it to the file.
import requests
from bs4 import BeautifulSoup
base_url = 'https://www.aaabbbccc.com'
r = requests.get(base_url)
soup = BeautifulSoup(r.text)
outF = open("myOutFile.txt", "w", encoding='utf-8')
for story_heading in soup.find_all(class_="col-md-4"):
    if story_heading.a:
        print(story_heading.a.text.replace("\n", " ").strip())
        outF.write(str(story_heading))
        outF.write("\n")
    else:
        print(story_heading.contents[0].strip())
outF.close()
Open the file in a with block, which flushes any buffered writes and closes the file automatically when the block ends:
with open("myOutFile.txt", "w", encoding='utf-8') as outF:
Example:
import requests
from bs4 import BeautifulSoup
base_url = 'https://stackoverflow.com/questions/65815005/beautifulsoup-4-python-web-scraping-to-txt-file'
r = requests.get(base_url)
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(r.text, 'html.parser')
# The file is closed automatically when the with block exits.
with open("myOutFile.txt", "w", encoding='utf-8') as outF:
    for story_heading in soup.find_all(class_="col-md-4"):
        if story_heading.a:
            print(story_heading.a.get_text())
            # Writes the element's HTML markup, one element per line.
            outF.write(str(story_heading))
            outF.write("\n")
        else:
            print(story_heading.contents[0].strip())
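To check the file-writing part on its own, without any network access, here is a minimal sketch. The heading strings and the temporary file location are made up for illustration; they stand in for whatever the scraper returns.

```python
import os
import tempfile

# Hypothetical stand-ins for the scraped headings.
headings = ["First story", "Second story", "Third story"]

# Write into a fresh temporary directory so nothing real is overwritten.
path = os.path.join(tempfile.mkdtemp(), "myOutFile.txt")

with open(path, "w", encoding="utf-8") as outF:
    for heading in headings:
        outF.write(heading + "\n")

# Leaving the with block closes the file, so the data is on disk here.
print(outF.closed)

# Read the file back to confirm every heading was written.
with open(path, encoding="utf-8") as inF:
    written = inF.read().splitlines()

print(written)
```

If this writes the file correctly but the scraper still produces an empty one, the loop is probably finding no elements with that class, or the script is writing to a different working directory than the one being inspected.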
I would always use the a+ mode.
If the text file doesn't exist on your hard drive, it is created and written to. If the text file already exists, your content is appended to the end of it.
with open("myOutFile.txt", "a+") as f:
import requests
from bs4 import BeautifulSoup
base_url = 'https://www.aaabbbccc.com'
r = requests.get(base_url)
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(r.text, 'html.parser')
# "a+" creates the file if it is missing and appends if it already exists.
with open("myOutFile.txt", "a+", encoding='utf-8') as f:
    for story_heading in soup.find_all(class_="col-md-4"):
        if story_heading.a:
            print(story_heading.a.get_text())
            f.write(str(story_heading)+"\n")
        else:
            print(story_heading.contents[0].strip())
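The difference between the a+ and w modes can be seen in a small sketch like the one below. The helper names save and read_back, the "run N" lines, and the temporary path are assumptions made for the demonstration, not part of the code above.

```python
import os
import tempfile

# Temporary location; the file does not exist yet.
path = os.path.join(tempfile.mkdtemp(), "myOutFile.txt")


def save(lines, mode):
    """Write each line to the file using the given open() mode."""
    with open(path, mode, encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def read_back():
    """Return the file's current contents as a list of lines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


save(["run 1"], "a+")  # file is missing, so it is created
save(["run 2"], "a+")  # file exists, so the line is appended
after_append = read_back()

save(["run 3"], "w")   # "w" truncates the file before writing
after_write = read_back()

print(after_append)
print(after_write)
```

Because a+ never truncates, running the scraper repeatedly keeps adding to the same file. Plain "a" behaves the same way for writing; the "+" only adds the ability to read from the same handle.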
I also found another solution that I wanted to share with you:
import requests
from bs4 import BeautifulSoup
def print_to_text(base_url):
    r = requests.get(base_url)
    soup = BeautifulSoup(r.text, 'html.parser')
    with open("myOutFile.txt", "w", encoding='utf-8') as textfile:
        for paragraph in soup.find_all(class_="col-md-4"):
            textfile.write(paragraph.text.replace("<span>",""))
if __name__ == "__main__":
    base_url = "https://www.aaabbb.com"
    print_to_text(base_url)