I'm a newbie practicing Python by extracting data from a local HTML file. I want to pull out each name and ID and save them as a table in a CSV file. The HTML is as follows:
<td>
<a href="https:............" data_id="45498" class="roster_user_name
......
<span name="Clarence Alan" src="
</a>
</td>
<td>
88889999
</td>
My code to get the list of names:
all_urls = [a['name']
            for a in soup('span')
            if a.has_attr('name')]
good_urls = list(set(all_urls))
print(len(good_urls))
good_urls
I don't know how to extract the ID ('88889999') and combine the names and IDs into a two-column table.
I am very new to Python. Thank you to anyone who answers.
I asked whether the HTML has <tr> tags, and your reply shows that the number of <tr> tags equals the number of entries you want to scrape.
Using BeautifulSoup, you can loop through all the <tr> tags and extract the required information from each one.
Example (replace the first argument to BeautifulSoup with your HTML string):
from bs4 import BeautifulSoup
soup = BeautifulSoup('<html> </html>', 'html.parser')
for row in soup.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) < 2:
        continue  # skip header rows or rows without both cells
    name = cells[0].text.strip()
    number = cells[1].text.strip()
This should loop through all rows and get name and number.
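In the HTML from the question, the name is stored in the name attribute of the <span> rather than in the cell text, so reading the attribute may be more reliable than reading .text. Here is a minimal sketch of that variant. SAMPLE_HTML and extract_rows are hypothetical names, and the sample markup is only modelled on the snippet in the question, so the real file may need different selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question's snippet;
# the real file will have more attributes and many more rows.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>ID</th></tr>
  <tr>
    <td><a href="#" class="roster_user_name"><span name="Clarence Alan"></span></a></td>
    <td>
      88889999
    </td>
  </tr>
</table>
"""

def extract_rows(html):
    """Return a list of (name, id) tuples, one per data row."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # header rows (<th>) or incomplete rows
        # attrs={"name": True} matches any <span> that has a name attribute
        span = cells[0].find("span", attrs={"name": True})
        if span is None:
            continue  # no name found in this row
        # get_text(strip=True) removes the whitespace around the ID
        records.append((span["name"], cells[1].get_text(strip=True)))
    return records

records = extract_rows(SAMPLE_HTML)
print(records)
```

For a local file, you would read its contents first (for example with open(...).read()) and pass that string to extract_rows.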
Then you could use the csv library to store the data.
Example
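import csv
# 'a+' appends to file.csv, so this header row is written again on every run;
# use 'w' instead if you want to overwrite the file each time
with open('file.csv', 'a+', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["COL1", "COL2"])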
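To get the two-column table, the header row can be followed by one row per (name, ID) pair. The sketch below assumes the pairs have already been collected into a list of tuples. write_roster, roster.csv, and the column titles are hypothetical choices, not something given in the question.

```python
import csv

def write_roster(path, records):
    """Write (name, id) pairs to a CSV file with a header row."""
    # 'w' overwrites the file, so the header appears exactly once
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "ID"])   # header row
        writer.writerows(records)         # one CSV row per tuple

# Placeholder data; in practice this list comes from the scraping loop.
sample_records = [("Clarence Alan", "88889999")]
write_roster("roster.csv", sample_records)
```

The IDs are kept as strings here so that any leading zeros survive the round trip to CSV.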