BeautifulSoup 4 Python web scraping to a txt file

Question

I am trying to write data that I scraped with BeautifulSoup to a text file. It prints the data in the console, but it does not write it to the file.

import requests
from bs4 import BeautifulSoup
 
base_url = 'https://www.aaabbbccc.com'
r = requests.get(base_url)
soup = BeautifulSoup(r.text)

outF = open("myOutFile.txt", "w", encoding='utf-8')
for story_heading in soup.find_all(class_="col-md-4"): 
    
    if story_heading.a: 
        print(story_heading.a.text.replace("\n", " ").strip())
        outF.write(str(story_heading))
        outF.write("\n")
 
    else: 
        print(story_heading.contents[0].strip())

outF.close()

Answer 1

Use the following approach:

with open("myOutFile.txt", "w", encoding='utf-8') as outF:

Example

import requests
from bs4 import BeautifulSoup
 
base_url = 'https://stackoverflow.com/questions/65815005/beautifulsoup-4-python-web-scraping-to-txt-file'
r = requests.get(base_url)
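# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(r.text, 'html.parser')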

with open("myOutFile.txt", "w", encoding='utf-8') as outF:
    for story_heading in soup.find_all(class_="col-md-4"): 

        if story_heading.a: 
            print(story_heading.a.get_text())
            outF.write(str(story_heading))
            outF.write("\n")

        else: 
            print(story_heading.contents[0].strip())
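
The point of the with statement above is that the file is closed, and its buffered data flushed to disk, as soon as the block ends. A minimal sketch of that behavior follows. It makes no network request: the headings list is a hypothetical stand-in for the scraped results, and the temporary directory is an assumption used only to keep the sketch self-contained.

```python
import os
import tempfile

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "myOutFile.txt")

# Stand-in for the elements that soup.find_all(...) would yield.
headings = ["First heading", "Second heading"]

with open(out_path, "w", encoding="utf-8") as outF:
    for heading in headings:
        outF.write(heading + "\n")

# Leaving the with block closes the file, which flushes buffered data to disk.
file_was_closed = outF.closed

# Read the file back to confirm that the lines really were written.
with open(out_path, encoding="utf-8") as inF:
    written = inF.read()

print(file_was_closed)
print(written)
```

If the file still looks empty after a run, it may be worth printing os.path.abspath("myOutFile.txt") to check which directory the script actually wrote to.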

Answer 2

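I always use the a+ mode!

If the text file does not exist on your hard drive, it is created and written to. If the text file already exists, your content is appended to the end of it.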

with open("myOutFile.txt", "a+") as f:

import requests
from bs4 import BeautifulSoup
 
base_url = 'https://www.aaabbbccc.com'
r = requests.get(base_url)
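# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(r.text, 'html.parser')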

with open("myOutFile.txt", "a+", encoding='utf-8') as f:
    for story_heading in soup.find_all(class_="col-md-4"): 

        if story_heading.a: 
            print(story_heading.a.get_text())
            f.write(str(story_heading)+"\n")
        else: 
            print(story_heading.contents[0].strip())
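
The difference between "a+" and "w" described in this answer can be checked with a short sketch. The helper names write_line and read_all, and the temporary path, are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "myOutFile.txt")


def write_line(mode, line):
    """Open the file in the given mode and write one line."""
    with open(path, mode, encoding="utf-8") as f:
        f.write(line + "\n")


def read_all():
    """Return the whole file content as a string."""
    with open(path, encoding="utf-8") as f:
        return f.read()


write_line("a+", "run 1")  # the file does not exist yet, so it is created
write_line("a+", "run 2")  # the file exists, so the line is appended
after_append = read_all()

write_line("w", "run 3")   # "w" truncates the file before writing
after_truncate = read_all()

print(repr(after_append))
print(repr(after_truncate))
```

One consequence of "a+" is that running the scraper repeatedly accumulates the results of every run in the same file, while "w" keeps only the latest run.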

Answer 3

I also found another solution that I would like to share with you:

import requests
from bs4 import BeautifulSoup

def print_to_text(base_url):
    r = requests.get(base_url)
    soup = BeautifulSoup(r.text, 'html.parser')
    with open("myOutFile.txt", "w", encoding='utf-8') as textfile:
        for paragraph in soup.find_all(class_="col-md-4"):
            # .text contains no tags, so there is no "<span>" to strip; end each entry with a newline
            textfile.write(paragraph.text.strip() + "\n")

if __name__ == "__main__":
    
    base_url = "https://www.aaabbb.com"

    print_to_text(base_url)
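
Answers 1 and 2 write str(story_heading), which is the element's HTML markup, while this answer writes its text. A small sketch of the difference follows. It parses an inline HTML string instead of fetching a page, and the markup itself is invented for illustration.

```python
from bs4 import BeautifulSoup

# Invented sample markup; a real page would come from requests.get(...).text.
html = """
<div class="col-md-4"><a href="/one"><span>First</span> story</a></div>
<div class="col-md-4">Plain text block</div>
<div class="other">Ignored</div>
"""

soup = BeautifulSoup(html, "html.parser")
blocks = soup.find_all(class_="col-md-4")

# str(element) serializes the element back to HTML, tags included.
first_as_html = str(blocks[0])

# get_text() returns only the text nodes, so tags such as <span> never appear.
# The " " separator joins the text pieces; strip=True trims whitespace around each.
lines = [block.get_text(" ", strip=True) for block in blocks]

print(first_as_html)
print(lines)
```

Which form to write depends on what the file is for: the markup keeps links and structure, while the text form is easier to read and post-process.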

3 answers

Answer 1
1 2021-01-20 18:09:55

Answer 2
1 accepted 2021-01-20 18:25:36

Answer 3
0 2021-01-21 03:20:39
