Scraping 13F filings from SEC using R
I'm trying to scrape the data in the SEC FORM 13-F Information Table from the following link:
https://sec.report/Document/0001567619-21-010281/
I tried the below script:
library(timetk)
library(tidyverse)
library(rvest)
url <- "https://sec.report/Document/0001567619-21-010281/"
url <- read_html(url)
raw_data <- url %>%
html_nodes("#table td") %>%
html_text()
However, I'm unable to get the data components; under Values the environment shows that raw_data is empty. Any help would be appreciated.
The data is present in the response. You can use a CSS attribute = value selector to target the nested table. You will also need to decide what to do with the initial three rows, which most likely need to be combined into a single header (or not!).
library(rvest)
library(magrittr)
page <- read_html("https://sec.report/Document/0001567619-21-010281/")
table <- page %>%
html_node('[summary="Form 13F-NT Header Information"]') %>%
html_table(fill = T)
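As a sketch of that header clean-up, here is one way to collapse the first rows into a single header. This is in Python with pandas (the language the other answer already uses); the HTML fragment below is a made-up stand-in for the real information table, whose header is split across its first three rows:

```python
import pandas as pd
from io import StringIO

# Hypothetical fragment mimicking the 13F info table: the header text
# is spread across the first three rows of <td> cells.
html = """
<table summary="Form 13F-NT Header Information">
  <tr><td>NAME OF ISSUER</td><td>TITLE OF</td><td>CUSIP</td></tr>
  <tr><td></td><td>CLASS</td><td></td></tr>
  <tr><td></td><td></td><td></td></tr>
  <tr><td>ACME CORP</td><td>COM</td><td>000000000</td></tr>
</table>
"""
# No <th> cells, so read_html keeps every row as data with integer columns.
raw = pd.read_html(StringIO(html))[0]

# Join the first three rows column-wise into one header string per column.
header = raw.iloc[:3].fillna("").astype(str).agg(" ".join).str.strip()
df = raw.iloc[3:].reset_index(drop=True)
df.columns = header
print(df.columns.tolist())  # ['NAME OF ISSUER', 'TITLE OF CLASS', 'CUSIP']
```

The same idea carries over to R: paste the first three rows of the `html_table()` result together, assign them as `names()`, and drop those rows.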
Using the 13F rendered as an HTML page is much easier; here is an example in Python:
import pandas as pd
import requests
# Request the rendered (XSL-transformed) information table
url = "https://www.sec.gov/Archives/edgar/data/1541617/000154161721000009/xslForm13F_X01/altcap13f3q21infotable.xml"
request = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
# Parse all tables out of the HTML response
tables = pd.read_html(request.text)
# The information table is the fourth table on the page
df = tables[3]
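One caveat: hard-coding `tables[3]` is brittle, since the index depends on the page layout. A sketch of picking the table by its header instead (the toy HTML below is a hypothetical stand-in for the filing page):

```python
import pandas as pd
from io import StringIO

# Toy page with several tables; the real filing page likewise contains
# multiple tables, and the information table is not necessarily first.
html = """
<table><tr><th>a</th></tr><tr><td>1</td></tr></table>
<table><tr><th>NAME OF ISSUER</th><th>CUSIP</th><th>VALUE</th></tr>
       <tr><td>ACME CORP</td><td>000000000</td><td>123</td></tr></table>
"""
tables = pd.read_html(StringIO(html))

# Select the table whose header mentions CUSIP rather than using an index.
info = next(t for t in tables if "CUSIP" in map(str, t.columns))
print(info["NAME OF ISSUER"].iloc[0])  # ACME CORP
```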