Scraping specific a href links with BeautifulSoup in Python

Question

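I am trying to learn BeautifulSoup.

On the website, the links all use the same kind of href, but they return different kinds of results.

For example, the output of my code is: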

0001545654
6798
HI
0001459640
TX
0001269765
CA
0001456527
CA
0001001379
GA

I only want the numbers.

Link for the number = <a href="/cgi-bin/browse-edgar?action=getcompany&CIK=0001545654&owner=exclude&count=40&hidefilings=0">0001545654</a>

Link for the state = <a href="/cgi-bin/browse-edgar?action=getcompany&State=HI&owner=exclude&count=40&hidefilings=0">HI</a>

I only want the CIK!

Is there any way to get only the CIK (0001545654)?

from bs4 import BeautifulSoup
from urllib.request import urlopen

# EDGAR company search results page
url = 'https://www.sec.gov/cgi-bin/browse-edgar?company=a&owner=exclude&action=getcompany'
page = BeautifulSoup(urlopen(url), 'html.parser')

# Every link in the results table, which includes the state links as well as the CIK links
CIK = page.find('table', 'tableFile2').find_all('a')

#print(CIK)
for i in CIK:
    print(i.get_text())

Answer 1

The simplest solution is probably to filter your results so that only the links whose text is a valid integer remain:

CIK = [i for i in CIK if str(i.get_text()).isnumeric()]
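One caveat with the numeric filter: the sample output in the question also contains 6798, which is numeric but is not a CIK, so it would pass the filter too. A stricter variant keeps only links whose href carries a CIK query parameter. The following is a minimal sketch under stated assumptions: the inline HTML is a hypothetical stand-in for the EDGAR table (the shape of the 6798 link in particular is a guess), and `cik_of` is an illustrative helper name, not part of BeautifulSoup.

```python
from urllib.parse import urlparse, parse_qs
from bs4 import BeautifulSoup

# Hypothetical stand-in for the EDGAR results table.
# The CIK and State hrefs follow the question; the SIC link is an assumed shape.
SAMPLE_HTML = """
<table class="tableFile2">
  <tr><th>CIK</th><th>Company</th><th>State/Country</th></tr>
  <tr>
    <td><a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001545654&amp;owner=exclude&amp;count=40&amp;hidefilings=0">0001545654</a></td>
    <td>Example Co <a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=6798&amp;owner=exclude&amp;count=40">6798</a></td>
    <td><a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=HI&amp;owner=exclude&amp;count=40&amp;hidefilings=0">HI</a></td>
  </tr>
</table>
"""

page = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = page.find("table", "tableFile2").find_all("a")


def cik_of(link):
    """Return the CIK query parameter of a link's href, or None if absent."""
    query = parse_qs(urlparse(link.get("href", "")).query)
    values = query.get("CIK")
    return values[0] if values else None


# Keep only the links that actually point at a CIK.
ciks = [c for c in (cik_of(a) for a in links) if c is not None]
print(ciks)
```

Reading the value from the href rather than from the link text means other numeric links are ignored, at the cost of depending on the URL format staying the same.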

Alternatively, you can refine your BeautifulSoup parsing so that it takes only the first link in each row:

CIK = [e.find('a') for e in page.find('table', 'tableFile2').find_all('tr') if e.find('a')]
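When going row by row, any row without a link has to be skipped, because indexing an empty result with [0] raises IndexError. A header row made only of th cells would be such a row. The sketch below shows the per-row approach on a small inline table. It assumes the header row has no links and that the CIK link is the first link in each data row; the HTML is a hypothetical stand-in, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the EDGAR results table (header row has no links).
SAMPLE_HTML = """
<table class="tableFile2">
  <tr><th>CIK</th><th>Company</th><th>State/Country</th></tr>
  <tr>
    <td><a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001545654">0001545654</a></td>
    <td>Example Co One</td>
    <td><a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=HI">HI</a></td>
  </tr>
  <tr>
    <td><a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001459640">0001459640</a></td>
    <td>Example Co Two</td>
    <td><a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=TX">TX</a></td>
  </tr>
</table>
"""

page = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = page.find("table", "tableFile2").find_all("tr")

first_links = []
for row in rows:
    link = row.find("a")  # find() returns None when the row has no link
    if link is not None:
        first_links.append(link.get_text(strip=True))

print(first_links)
```

Using find() instead of find_all()[0] avoids the exception and makes the "no link in this row" case explicit.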


Answer 1 | 0 | 2020-05-25 13:26:41
