
Create multiple files with filenames from bs4

How can I write each <a> element to its own file, using the text of its H2 as the filename?

import re
import requests
from bs4 import BeautifulSoup
import os

data = '<html><div class="colors"> <a href="/green"> <div class="values"> GRN <h2 class="tester"> Green </h2> </div> </a> <a href="/purple"> <div class="values"> PURP <h2 class="tester"> Purple </h2> </div> </a> <a href="/orange"> <div class="values"> ORNG <h2 class="tester"> Orange </h2> </div> </a> </div><html>'
soup = BeautifulSoup(data, "html.parser")

colors = soup.find("div", {"class": "colors"})

for lines in colors:
    doc = lines.find("h2").text.strip()
    file = open('C:/Users/Admin/Desktop/'+str(doc)+'.txt', 'a', encoding='utf-8')
    file.write(str(lines))
    file.close()

I'm looking for a result with the following filenames and HTML contents:

Green.txt <a href="/green"> <div class="values"> GRN <h2 class="tester"> Green </h2> </div> </a>

Purple.txt <a href="/purple"> <div class="values"> PURP <h2 class="tester"> Purple </h2> </div> </a>

Orange.txt <a href="/orange"> <div class="values"> ORNG <h2 class="tester"> Orange </h2> </div> </a>

If I understood you correctly, you have to iterate over the <a> elements rather than over the <div> with class colors to achieve your goal:

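for e in soup.select('.colors a'):
    # The stripped <h2> text becomes the filename
    name = e.h2.get_text(strip=True)
    html = str(e)
    # 'a' appends, so re-running the script adds the markup again
    with open(name+'.txt', 'a', encoding='utf-8') as file:
        file.write(html)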

Example

from bs4 import BeautifulSoup
import os

data = '<html><div class="colors"> <a href="/green"> <div class="values"> GRN <h2 class="tester"> Green </h2> </div> </a> <a href="/purple"> <div class="values"> PURP <h2 class="tester"> Purple </h2> </div> </a> <a href="/orange"> <div class="values"> ORNG <h2 class="tester"> Orange </h2> </div> </a> </div><html>'
soup = BeautifulSoup(data, "html.parser")
    
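for e in soup.select('.colors a'):
    # The stripped <h2> text becomes the filename
    name = e.h2.get_text(strip=True)
    html = str(e)
    # 'a' appends, so re-running the script adds the markup again
    with open(name+'.txt', 'a', encoding='utf-8') as file:
        file.write(html)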
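
As for why looping over the <div> itself does not work: iterating a tag yields all of its direct children, and that includes the whitespace between the <a> elements as text nodes, which have no h2 to look up. The sketch below uses a shortened version of the markup from the question (an assumption made for brevity) to show the child types and one way to keep only the tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened stand-in for the question's markup (assumption for brevity)
data = ('<div class="colors"> <a href="/green"> <h2> Green </h2> </a> '
        '<a href="/purple"> <h2> Purple </h2> </a> </div>')
soup = BeautifulSoup(data, "html.parser")
colors = soup.find("div", {"class": "colors"})

# Iterating a tag walks its direct children, text nodes included
child_types = [type(child).__name__ for child in colors]
print(child_types)

# Keeping only Tag children leaves the <a> elements
anchors = [child for child in colors if isinstance(child, Tag)]
names = [a.h2.get_text(strip=True) for a in anchors]
print(names)
```

Selecting the <a> elements directly, as in the loop above, avoids the filtering step altogether.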

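You can use the find_all method to get all the <a> tags, iterate over them, and take the filename from the <h2> tag inside each one. That gives you the desired output: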

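# colors is the <div class="colors"> element found in the question's code
links = colors.find_all("a")
for link in links:
    fname = link.find("h2").get_text(strip=True)
    # "w" overwrites, so re-running the script does not duplicate content
    with open(fname + ".txt", "w", encoding="utf-8") as wr:
        wr.write(str(link))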

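If the headings might contain characters that are not valid in filenames, or the files should go into a specific folder, a variant along these lines may help. It is only a sketch: the helper name `write_links`, the sanitizing regex, the shortened markup, and the temporary output directory are all assumptions for illustration and are not part of the answers above.

```python
import re
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def write_links(html, out_dir):
    """Write each <a> inside .colors to <out_dir>/<h2 text>.txt (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for link in soup.select(".colors a"):
        heading = link.find("h2")
        if heading is None:
            continue  # skip links without a heading to name the file after
        name = heading.get_text(strip=True)
        # Replace characters that Windows does not allow in filenames
        safe = re.sub(r'[\\/:*?"<>|]', "_", name)
        path = out_dir / f"{safe}.txt"
        path.write_text(str(link), encoding="utf-8")
        written.append(path)
    return written


# Shortened stand-in for the question's markup (assumption for brevity)
data = ('<div class="colors"> <a href="/green"> <h2> Green </h2> </a> '
        '<a href="/purple"> <h2> Purple </h2> </a> </div>')

with tempfile.TemporaryDirectory() as tmp:
    paths = write_links(data, tmp)
    file_names = [p.name for p in paths]
    first_content = paths[0].read_text(encoding="utf-8")

print(file_names)
```

Passing the target folder as an argument keeps the path out of the string concatenation used in the question.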

Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the address of the original post.
