
BeautifulSoup find_all in a list

I am trying to use BeautifulSoup's find_all method twice. The first time, I use it to find all table tags. Then, inside a loop, a few if statements narrow down which tables I append to my list. Finally, I try calling find_all on my list and get the error "'list' object has no attribute 'find'".

I understand the error basically means find_all can't search through a list, but I can't think of any other way to sort out my data. Is there any way around this error, or another approach I can try?

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.sec.gov/Archives/edgar/data/861838/000095013509003622/0000950135-09-003622.txt")
src = result.content
soup = BeautifulSoup(src, "html.parser")

table = soup.find_all("table")
tbl = len(table)

sort1 = []

i = 0
while i < tbl:
    if ("sale" in table[i].text) or ("revenue" in table[i].text):
        if "expense" in table[i].text:
            if "income" in table[i].text:
                sort1.append(table[i].text)
    i = i + 1  # increment after the checks so table[0] is not skipped

# error shows up here: sort1 is a plain Python list of strings (table.text), not a BeautifulSoup object
td = sort1.find_all("td")
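A minimal sketch of why this fails and one way around it. It uses a small hypothetical HTML string instead of the SEC URL so it runs offline. The key changes are keeping the Tag objects (not `.text`) in the list and calling `find_all` on each element rather than on the list itself.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data standing in for the SEC filing
html = """
<table><tr><td>net sales</td><td>expense</td><td>income</td></tr></table>
<table><tr><td>unrelated</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")  # a ResultSet (list subclass) of Tag objects

# Keep the Tag objects themselves, not t.text, so they can still be searched
matches = [t for t in tables
           if ("sale" in t.text or "revenue" in t.text)
           and "expense" in t.text and "income" in t.text]

print(hasattr(matches, "find_all"))          # False: a list has no find_all
print(hasattr(matches[0].text, "find_all"))  # False: neither does a str

# Each Tag in the list does have find_all, so loop over the list instead
cells = [td.get_text() for t in matches for td in t.find_all("td")]
print(cells)  # ['net sales', 'expense', 'income']
```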

Try this:

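td = []

for tag in table:
    text = tag.text
    # keep tables that mention (sale or revenue) and expense and income
    if ("sale" in text or "revenue" in text) and "expense" in text and "income" in text:
        # collect every <td> cell of the matching table, not just the first one
        td.extend(tag.find_all("td"))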

It'll add every <td> from each matching table into the list.
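One detail worth checking in the loop above: `find("td")` returns only the first `<td>` of a table, while `find_all("td")` returns all of them; likewise `append` adds a whole ResultSet as one nested element, whereas `extend` adds the individual tags. A small comparison on a hypothetical one-table sample:

```python
from bs4 import BeautifulSoup

# Hypothetical one-table sample; the real filing has many tables
html = "<table><tr><td>revenue</td><td>expense</td><td>income</td></tr></table>"
table = BeautifulSoup(html, "html.parser").find("table")

first = table.find("td")      # only the first matching <td> (or None)
every = table.find_all("td")  # all <td> tags in the table

print(first.get_text())                 # revenue
print([td.get_text() for td in every])  # ['revenue', 'expense', 'income']

nested, flat = [], []
nested.append(every)  # one element: the whole ResultSet
flat.extend(every)    # three elements: the individual <td> tags
print(len(nested), len(flat))  # 1 3
```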

I think your logic does the same as the following, which uses the :contains pseudo-class (available since bs4 4.7.1):

Basically, it selects either

table:contains(sale):contains(expense):contains(income)

(a table containing "sale", "expense" and "income")

or

table:contains(revenue):contains(expense):contains(income)

(a table containing "revenue", "expense" and "income").

This returns two tables for the example URL.

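import requests
from bs4 import BeautifulSoup as bs

result = requests.get("https://www.sec.gov/Archives/edgar/data/861838/000095013509003622/0000950135-09-003622.txt")
src = result.content
soup = bs(src, "lxml")  # the "lxml" parser is a separate install (pip install lxml)
# The comma means "match either selector"; :contains needs bs4 >= 4.7.1.
# Newer soupsieve versions prefer the alias :-soup-contains and warn on :contains.
sort1 = [i.text for i in soup.select('table:contains(sale):contains(expense):contains(income), table:contains(revenue):contains(expense):contains(income)')]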
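If the goal is the `<td>` cells rather than the tables' text, the same selector list can be extended with a descendant `td`, so `select` returns the cells directly. This sketch assumes a soupsieve release recent enough to support `:-soup-contains` (the newer name for `:contains`, added in soupsieve 2.1) and uses a hypothetical inline sample instead of the SEC filing:

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the filing
html = """
<table><tr><td>Net sales</td><td>Total expense</td><td>Net income</td></tr></table>
<table><tr><td>revenue</td><td>expense</td></tr></table>
<table><tr><td>Other revenue</td><td>expense</td><td>income</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Same filter as the answer; the trailing " td" selects cells inside matching tables
selector = (
    'table:-soup-contains("sale"):-soup-contains("expense"):-soup-contains("income") td, '
    'table:-soup-contains("revenue"):-soup-contains("expense"):-soup-contains("income") td'
)
cells = [td.get_text() for td in soup.select(selector)]
print(cells)
```

The second table is skipped because it lacks "income". Like the Python `in` checks in the question, the match is a case-sensitive substring test, so a table that only says "Sales" would not match `sale`.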

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For questions, contact yoyou2525@163.com.

 