Extract 10-K filings URL for a company using CIK number in Python

I am working on a project to find the URL of the latest 10-K filing for a company, given its CIK number. Here is my code:

import requests
from bs4 import BeautifulSoup

# CIK number for Apple Inc. is 0000320193
cik_number = "0000320193"
url = f"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik_number}&type=10-K&dateb=&owner=exclude&count=40"

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the link to the latest 10-K filing
link = soup.find('a', {'id': 'documentsbutton'})
filing_url = link['href']

print(filing_url)

I am getting an HTTP 403 error. Please help me.

Thanks

I was able to get a 200 response by reusing your snippet. You may have forgotten to add the headers:

import requests
from bs4 import BeautifulSoup

# CIK number for Apple Inc. is 0000320193
cik_number = "0000320193"
url = f'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik_number}&type=10-K&dateb=&owner=exclude&count=40'
# add this: browser-like headers, so the server does not reject the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
}
response = requests.get(url, headers=headers)
print(response)
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

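Once the page loads, the original goal (the link to the latest 10-K) still needs the parsing step from the question. Here is a minimal sketch of that step, under a few assumptions: it uses only the standard library (html.parser) rather than BeautifulSoup so it runs without extra packages, it assumes the page lists filings newest first, and the sample HTML and the names DocumentsLinkParser and latest_filing_url are hypothetical. The link found this way points to the filing's index page, not to the 10-K document itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.sec.gov"


class DocumentsLinkParser(HTMLParser):
    """Collects the href of every <a id="documentsbutton"> element, in page order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("id") == "documentsbutton":
            href = attr_map.get("href")
            if href:
                self.hrefs.append(href)


def latest_filing_url(html, base_url=BASE_URL):
    """Return the absolute URL of the first Documents link, or None if there is none."""
    parser = DocumentsLinkParser()
    parser.feed(html)
    if not parser.hrefs:
        # No matching link (for example, an error page or no 10-K filings).
        return None
    # Assumption: filings are listed newest first, so the first link is the latest.
    return urljoin(base_url, parser.hrefs[0])


# Hypothetical, trimmed-down sample of the results table, for illustration only.
sample_html = """
<table>
  <tr><td>10-K</td><td><a href="/Archives/edgar/data/1/000000000123000002-index.htm" id="documentsbutton">Documents</a></td></tr>
  <tr><td>10-K</td><td><a href="/Archives/edgar/data/1/000000000122000001-index.htm" id="documentsbutton">Documents</a></td></tr>
</table>
"""
print(latest_filing_url(sample_html))
```

Returning None when no link is found avoids the TypeError that link['href'] would raise in the question's code if soup.find() returned None.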

NOTE: The key addition is the User-Agent header. Basically, you need to make sure the request looks like it is coming from a browser, so just pass the extra headers parameter to requests.get(), as in the snippet above.
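As a possible alternative to copying a browser's User-Agent string: as far as I know, the SEC's guidance for automated access asks clients to declare a User-Agent that identifies the requester (a name plus a contact e-mail). The sketch below prepares such a request with the standard library's urllib.request. The helper names and the contact string are placeholders, and whether a given User-Agent is accepted is an assumption to verify against the SEC's current guidance.

```python
import urllib.request

EDGAR_BROWSE_URL = (
    "https://www.sec.gov/cgi-bin/browse-edgar"
    "?action=getcompany&CIK={cik}&type=10-K&dateb=&owner=exclude&count=40"
)


def build_headers(contact):
    """Build request headers with an explicit, self-identifying User-Agent.

    `contact` is a placeholder such as "Sample Company admin@example.com".
    """
    return {
        "User-Agent": contact,
        "Accept": "text/html,application/xhtml+xml",
    }


def build_request(cik, contact):
    """Prepare (but do not send) a request for a company's 10-K filing list."""
    url = EDGAR_BROWSE_URL.format(cik=cik)
    return urllib.request.Request(url, headers=build_headers(contact))


def fetch_filing_list(cik, contact, timeout=10):
    """Send the request and return the page HTML; raises HTTPError on 403 and similar."""
    request = build_request(cik, contact)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example usage (performs a real network call, so it is left commented out):
# html = fetch_filing_list("0000320193", "Sample Company admin@example.com")
# print(html[:200])
```

With requests, the same headers dictionary can be passed as requests.get(url, headers=build_headers(contact)), exactly as in the answer's snippet.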

