
Create a CSV table with name and ID from local HTML file using Python

I'm new to Python and practicing by extracting names and IDs from a local HTML file and saving them as a table in a CSV file. The HTML is as follows:

<td>
  <a href="https:............" data_id="45498" class="roster_user_name 
......
<span name="Clarence Alan" src="
</a>
    
</td>

<td>
  
    88889999
  
</td>

My code to get the list of names:

all_urls = [a['name']
            for a in soup('span')
            if a.has_attr('name')]

good_urls = list(set(all_urls))
print(len(good_urls))
good_urls

I don't know how to extract the ID ('88889999') and combine the names and IDs into a 2-column table.

I am very new to Python. Thank you to anyone who answers.

I asked whether the HTML has <tr> tags, and your reply shows that the number of <tr> tags equals the number of entries you want to scrape.

Using BeautifulSoup, you can loop through all the <tr> tags and extract the required information from each one.

Example (replace the first argument to BeautifulSoup with your HTML string):

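from bs4 import BeautifulSoup

# Replace the placeholder string with the contents of your HTML file.
soup = BeautifulSoup('<html> </html>', 'html.parser')
for row in soup.find_all('tr'):
    cells = row.find_all('td')      # look up the cells once per row
    name = cells[0].text.strip()    # text of the first cell
    number = cells[1].text.strip()  # text of the second cell (the ID)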

This should loop through all rows and get name and number.
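One caveat: in the snippet from the question, the name sits in the name attribute of a <span> rather than in the cell text, so .text on the first cell may come back empty. Here is a minimal sketch of reading the attribute instead. The SAMPLE_HTML string, the second person, the "#" links, and the extract_rows helper are all made up for illustration, and it assumes every row has the span in its first <td> and the ID in its second. Collecting both values per row also keeps each name paired with its ID, which deduplicating the names with set() would not.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the question's snippet; the real file will
# have different links, more attributes, and more rows.
SAMPLE_HTML = """
<table>
  <tr>
    <td><a href="#" data_id="45498" class="roster_user_name"><span name="Clarence Alan"></span></a></td>
    <td>
      88889999
    </td>
  </tr>
  <tr>
    <td><a href="#" data_id="45499" class="roster_user_name"><span name="Jane Doe"></span></a></td>
    <td>
      77776666
    </td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return a list of (name, id) tuples, one per <tr> that has both values."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # skip header rows or rows with an unexpected layout
        # attrs={"name": True} matches any <span> that has a name attribute.
        span = cells[0].find("span", attrs={"name": True})
        if span is None:
            continue
        # get_text(strip=True) drops the whitespace around the ID.
        rows.append((span["name"], cells[1].get_text(strip=True)))
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

For a local file you would pass the file's contents instead of SAMPLE_HTML, for example by reading it with open(...).read().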

Then you can use the csv library to store the data.

Example

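import csv

# 'a+' appends to file.csv, so the header is added again on every run;
# use 'w' instead if you want to overwrite the file each time.
with open('file.csv', 'a+', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["COL1", "COL2"])  # one call writes one row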
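To show how the scraped pairs could end up in the file, here is a rough sketch that writes a header followed by one row per (name, ID) tuple. The write_roster helper, the "name"/"id" column titles, and the roster.csv file name are assumptions for illustration, and the rows list is hard-coded here in place of whatever your scraping loop produces. It writes into a temporary directory only so the example can be run without touching your own files.

```python
import csv
import os
import tempfile


def write_roster(path, rows):
    """Write a header plus one CSV row per (name, id) tuple."""
    # "w" overwrites the file, so the header appears exactly once;
    # newline="" lets the csv module control line endings.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "id"])
        writer.writerows(rows)


# Hard-coded stand-in for the list built while looping over the <tr> tags.
rows = [("Clarence Alan", "88889999")]

out_path = os.path.join(tempfile.mkdtemp(), "roster.csv")
write_roster(out_path, rows)

# Read the file back to check what was written.
with open(out_path, newline="", encoding="utf-8") as f:
    written = list(csv.reader(f))
print(written)
```

In your own script you would pass a normal path such as 'file.csv' instead of the temporary one.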

