Create multiple files with filenames from bs4
How do I write each <a> element to its own file, using the text of its H2 as the filename?
import re
import requests
from bs4 import BeautifulSoup
import os
data = '<html><div class="colors"> <a href="/green"> <div class="values"> GRN <h2 class="tester"> Green </h2> </div> </a> <a href="/purple"> <div class="values"> PURP <h2 class="tester"> Purple </h2> </div> </a> <a href="/orange"> <div class="values"> ORNG <h2 class="tester"> Orange </h2> </div> </a> </div><html>'
soup = BeautifulSoup(data, "html.parser")
colors = soup.find("div", {"class": "colors"})
for lines in colors:
    docs = lines.find("h2").text.strip()
    file = open('C:/Users/Admin/Desktop/'+str(docs)+'.txt', 'a', encoding='utf-8')
    file.write(str(lines))
    file.close()
I am looking for a result with these filenames and HTML contents:
Green.txt <a href="/green"> <div class="values"> GRN <h2 class="tester"> Green </h2> </div> </a>
Purple.txt <a href="/purple"> <div class="values"> PURP <h2 class="tester"> Purple </h2> </div> </a>
Orange.txt <a href="/orange"> <div class="values"> ORNG <h2 class="tester"> Orange </h2> </div> </a>
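I hope I understood you correctly: to achieve your goal you have to iterate over the <a> elements inside the element with class colors, not over the <div> itself: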
for e in soup.select('.colors a'):
    name = e.h2.get_text(strip=True)
    html = str(e)
    with open(name+'.txt', 'a', encoding='utf-8') as file:
        file.write(html)
from bs4 import BeautifulSoup
import os
data = '<html><div class="colors"> <a href="/green"> <div class="values"> GRN <h2 class="tester"> Green </h2> </div> </a> <a href="/purple"> <div class="values"> PURP <h2 class="tester"> Purple </h2> </div> </a> <a href="/orange"> <div class="values"> ORNG <h2 class="tester"> Orange </h2> </div> </a> </div><html>'
soup = BeautifulSoup(data, "html.parser")
for e in soup.select('.colors a'):
    name = e.h2.get_text(strip=True)
    html = str(e)
    with open(name+'.txt', 'a', encoding='utf-8') as file:
        file.write(html)
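The loop above uses the H2 text directly as the filename and writes into the current working directory. As a minimal sketch (not part of the original answer), the same `select('.colors a')` approach can be wrapped in a function that writes into a chosen directory and guards against characters that are not valid in filenames. The names `safe_filename`, `write_links_to_files`, and `out_dir` are hypothetical and only used for illustration, as is the choice to skip links without an H2.

```python
import re
from pathlib import Path

from bs4 import BeautifulSoup


def safe_filename(text):
    # Hypothetical helper: keep word characters and '-', replace everything else with '_'.
    cleaned = re.sub(r"[^\w\-]+", "_", text.strip())
    return cleaned.strip("_") or "untitled"


def write_links_to_files(html, out_dir):
    """Write each <a> inside .colors to <out_dir>/<h2 text>.txt and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    soup = BeautifulSoup(html, "html.parser")
    written = []
    for a in soup.select(".colors a"):
        if a.h2 is None:  # assumption: links without an <h2> are skipped
            continue
        path = out_dir / (safe_filename(a.h2.get_text(strip=True)) + ".txt")
        path.write_text(str(a), encoding="utf-8")  # overwrites an existing file
        written.append(path)
    return written
```

Calling `write_links_to_files(data, "out")` with the `data` string from the question would be expected to create `Green.txt`, `Purple.txt`, and `Orange.txt` under `out`. Note that `write_text` overwrites on each run, whereas the answer's `'a'` mode appends.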
You can use the find_all method to get all the a tags, iterate over them, and take the filename from each one's h2 tag to get the desired output:
links = colors.find_all("a")
for link in links:
    fname = link.find("h2").get_text(strip=True)
    with open(fname + ".txt", "w") as wr:
        wr.write(str(link))
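This answer opens the files with "w", while the question and the other answer use 'a'. The difference matters when the script is run more than once: "w" truncates the file, 'a' keeps adding to it. The following is a small self-contained sketch of the `find_all` approach that runs twice in each mode to show this. The shortened `HTML` string, the `save_links` and `demo` names, and the use of a temporary directory are assumptions made for the illustration.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question.
HTML = ('<div class="colors">'
        '<a href="/green"><h2>Green</h2></a>'
        '<a href="/purple"><h2>Purple</h2></a>'
        '</div>')


def save_links(html, out_dir, mode):
    """Save each <a> under div.colors to its own file, opened with `mode`."""
    colors = BeautifulSoup(html, "html.parser").find("div", {"class": "colors"})
    for link in colors.find_all("a"):
        fname = link.find("h2").get_text(strip=True)
        with open(Path(out_dir) / (fname + ".txt"), mode, encoding="utf-8") as fh:
            fh.write(str(link))


def demo():
    """Run the export twice per mode and return the resulting Green.txt contents."""
    with tempfile.TemporaryDirectory() as tmp:
        w_dir, a_dir = Path(tmp, "w"), Path(tmp, "a")
        w_dir.mkdir()
        a_dir.mkdir()
        for _ in range(2):
            save_links(HTML, w_dir, "w")  # truncates, so one copy remains
            save_links(HTML, a_dir, "a")  # appends, so two copies accumulate
        return ((w_dir / "Green.txt").read_text(encoding="utf-8"),
                (a_dir / "Green.txt").read_text(encoding="utf-8"))
```

After two runs, the file written in "w" mode holds a single copy of the link, while the one written in 'a' mode holds it twice, so "w" is usually the safer choice when each file should contain exactly one element.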
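Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.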