
Beautiful Soup: why is nothing printed inside the for loops in my code?

from bs4 import BeautifulSoup
import numpy as np
import requests 
from selenium import webdriver
from nltk.tokenize import sent_tokenize,word_tokenize

html = webdriver.Firefox(executable_path=r'D:\geckodriver.exe')
html.get("https://www.tsa.gov/coronavirus/passenger-throughput")

def TSA_travel_numbers(html):
    print('NASEEF')

    soup = BeautifulSoup(html,'lxml')
    print('naseef2')

    for i,rows in enumerate(soup.find_all('tr',class_='view-content')):
        print('naseef3')
        for texts in soup.find('td',header = 'view-field-2021-throughput-table-column'):
            print('naseef4')
            number = texts.text
            if number is None:
                continue
                
            print('Naseef')

TSA_travel_numbers(html.page_source)

As you can see, NASEEF and naseef2 get printed to the console, but naseef3 and naseef4 do not. There is no error either; the code runs fine. In other words, execution never goes inside the for loops in that function. Can anyone explain what is really happening here? Sorry for taking your time, and thanks in advance!

Your page does not contain <tr> tags with a class of view-content, so find_all correctly returns an empty list and the body of the outer loop never runs. If you remove the class restriction, you get many results:

>>> soup.find_all('tr', limit=2)
[<tr>
<th class="views-align-center views-field views-field-field-today-date views-align-center" id="view-field-today-date-table-column" scope="col">Date</th>
<th class="views-align-center views-field views-field-field-2021-throughput views-align-center" id="view-field-2021-throughput-table-column" scope="col">2021 Traveler Throughput </th>
<th class="views-align-center views-field views-field-field-2020-throughput views-align-center" id="view-field-2020-throughput-table-column" scope="col">2020 Traveler Throughput </th>
<th class="views-align-center views-field views-field-field-2019-throughput views-align-center" id="view-field-2019-throughput-table-column" scope="col">2019 Traveler Throughput </th>
</tr>, <tr>
<td class="views-field views-field-field-today-date views-align-center" headers="view-field-today-date-table-column">5/9/2021          </td>
<td class="views-field views-field-field-2021-throughput views-align-center" headers="view-field-2021-throughput-table-column">1,707,805          </td>
<td class="views-field views-field-field-2020-throughput views-align-center" headers="view-field-2020-throughput-table-column">200,815          </td>
<td class="views-field views-field-field-2019-throughput views-align-center" headers="view-field-2019-throughput-table-column">2,419,114          </td>
</tr>]
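To see the difference without fetching the live page, here is a minimal sketch built on a hypothetical, trimmed-down copy of the table shown above. It assumes that the view-content class sits on a wrapping <div> rather than on the <tr> tags (the real page's wrapper may differ), and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modeled on the rows printed above;
# the wrapping <div class="view-content"> is an assumption for illustration.
HTML = """
<div class="view-content">
  <table>
    <tr><th id="view-field-2021-throughput-table-column">2021 Traveler Throughput</th></tr>
    <tr><td headers="view-field-2021-throughput-table-column">1,707,805</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class_ filters on the <tr> tag itself, and no <tr> carries that class
with_class = soup.find_all("tr", class_="view-content")

# Without the class filter, every row is found
all_rows = soup.find_all("tr")

# A CSS selector can express "rows somewhere inside the view-content container"
rows_in_container = soup.select("div.view-content tr")

print(len(with_class), len(all_rows), len(rows_in_container))  # 0 2 2
```

If you do want to restrict the search to one container, the selector form keeps the class on the element that actually has it instead of on the rows.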

Once you change that, the inner loop looks for <td> tags with a header attribute of view-field-2021-throughput-table-column. There are no such tags on the page either, but there are <td> tags whose headers attribute (plural) has that value.
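The following sketch, using one data row copied from the output above, shows that keyword arguments to find are matched against attribute names, so the misspelled header= matches nothing. Because find returns None when nothing matches, a loop like for texts in soup.find(...) would raise TypeError ('NoneType' object is not iterable) if the outer loop ever ran with the misspelled attribute.

```python
from bs4 import BeautifulSoup

# One data row copied from the output shown above
HTML = """
<table><tr>
<td class="views-field views-field-field-2021-throughput views-align-center"
    headers="view-field-2021-throughput-table-column">1,707,805          </td>
</tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")
column = "view-field-2021-throughput-table-column"

# There is no attribute called "header", so this matches nothing and returns None
wrong = soup.find("td", header=column)

# The attribute is spelled "headers"; attrs={...} is an equivalent spelling
right = soup.find("td", headers=column)
same = soup.find("td", attrs={"headers": column})

print(wrong)                       # None
print(right is same)               # True
print(right.get_text(strip=True))  # 1,707,805
```

The attrs={...} form is also handy for attribute names that are not valid Python keyword arguments, such as data-* attributes.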

This line is also wrong:

number = texts.text

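...because iterating over a tag yields its children, and the only child of that <td> is the text itself, so texts is a NavigableString. At least in the Beautiful Soup releases current when this was written, a NavigableString has no text attribute.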
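A small sketch of what the inner loop actually iterates over, using a single hypothetical cell modeled on the rows above. A NavigableString is a subclass of str, so ordinary string methods work on it, but it is usually simpler to drop the inner loop and call get_text on the <td> tag itself.

```python
from bs4 import BeautifulSoup, NavigableString

HTML = '<td headers="view-field-2021-throughput-table-column">1,707,805          </td>'
cell = BeautifulSoup(HTML, "html.parser").td

# Looping over a Tag walks its children; here the only child is a NavigableString
children = list(cell)
print([type(c).__name__ for c in children])   # ['NavigableString']

# NavigableString subclasses str, so str methods such as strip() work directly
first = children[0]
print(isinstance(first, str), first.strip())  # True 1,707,805

# Usually simpler: ask the Tag for its text instead of looping over children
print(cell.get_text(strip=True))              # 1,707,805
```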

Additionally, the string naseef does not say what is happening at each point, so it is better to replace it with more descriptive messages. Finally, you do not need the Selenium connection or the tokenizer for this, so for the purposes of this example we can leave them out and fetch the page with requests instead. The resulting code looks like this:

from bs4 import BeautifulSoup
import requests

# Fetch the page with requests instead of driving a browser with Selenium
html = requests.get("https://www.tsa.gov/coronavirus/passenger-throughput").text

def TSA_travel_numbers(html):
    print('Entering parsing function')

    soup = BeautifulSoup(html, 'lxml')
    print('Parsed HTML to soup')

    # No class filter: none of the <tr> tags on this page has class="view-content"
    for i, rows in enumerate(soup.find_all('tr')):
        print('Found <tr> tag number', i)
        # The attribute is "headers" (plural); soup.find searches the whole page, not just this row
        for texts in soup.find('td', headers='view-field-2021-throughput-table-column'):
            print('found <td> tag with headers')
            number = texts  # a NavigableString, so no .text is needed
            if number is None:
                continue
            print('Value is', number)

TSA_travel_numbers(html)

Its output looks like this:

Entering parsing function
Parsed HTML to soup
Found <tr> tag number 0
found <td> tag with headers
Value is 1,707,805          
Found <tr> tag number 1
found <td> tag with headers
Value is 1,707,805          
Found <tr> tag number 2
found <td> tag with headers
...
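The output repeats 1,707,805 for every row because the inner loop calls soup.find, which always returns the first matching cell in the whole document rather than the cell in the current row; that is also why row 0, the header row with only <th> cells, reports a value. A minimal per-row sketch searches inside each <tr> with row.find and skips rows without a matching cell. The function name tsa_2021_throughput and its return shape (a list of (date, count) tuples) are illustrative choices, not part of the original answer, and it is exercised here on the two rows printed earlier rather than on the live page.

```python
from bs4 import BeautifulSoup

DATE_COL = "view-field-today-date-table-column"
COUNT_COL = "view-field-2021-throughput-table-column"

def tsa_2021_throughput(html):
    """Hypothetical helper: return (date, travelers) pairs, one per data row."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        # Search inside this row only, not the whole document
        date_cell = row.find("td", headers=DATE_COL)
        count_cell = row.find("td", headers=COUNT_COL)
        if date_cell is None or count_cell is None:
            continue  # e.g. the header row, which contains only <th> cells
        date = date_cell.get_text(strip=True)
        # "1,707,805" -> 1707805 (assumes the cell always holds a number)
        count = int(count_cell.get_text(strip=True).replace(",", ""))
        results.append((date, count))
    return results

# The header row and first data row printed earlier in this answer
SAMPLE = """
<table>
<tr>
<th id="view-field-today-date-table-column" scope="col">Date</th>
<th id="view-field-2021-throughput-table-column" scope="col">2021 Traveler Throughput </th>
</tr>
<tr>
<td headers="view-field-today-date-table-column">5/9/2021          </td>
<td headers="view-field-2021-throughput-table-column">1,707,805          </td>
</tr>
</table>
"""

print(tsa_2021_throughput(SAMPLE))  # [('5/9/2021', 1707805)]

# Against the live page (network access assumed):
# import requests
# html = requests.get("https://www.tsa.gov/coronavirus/passenger-throughput").text
# print(tsa_2021_throughput(html))
```

If the live table ever contains an empty or non-numeric count cell, int() will raise ValueError, so you may want to guard that conversion.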
