Any way to download data with custom queries from a URL in Python?
I want to download data from the USDA site with custom queries. Instead of manually selecting the queries on the website, I am wondering how to do this more conveniently in Python. To do so, I used `requests` and `http` to access the URL and read the content, but it is not intuitive to me how to pass the queries, make the selection, and download the data as CSV. Does anyone know of a way to do this easily in Python? Is there any workaround to download the data from the URL with specific queries? Any idea?
This is my current attempt. Here is the URL that I am going to use to select data with custom queries:
import io
import requests
import pandas as pd
url="https://www.marketnews.usda.gov/mnp/ls-report-retail?&repType=summary&portal=ls&category=Retail&species=BEEF&startIndex=1"
s=requests.get(url).content
df=pd.read_csv(io.StringIO(s.decode('utf-8')))
So before reading the requested data into `pandas`, I need to pass the following queries for correct data selection:
Category = "Retail"
Report Type = "Item"
Species = "Beef"
Region(s) = "National"
Start Dates = "2020-01-01"
End Date = "2021-02-08"
It is not intuitive to me how to pass these queries with the request and then download the filtered data as CSV. Is there any efficient way of doing this in Python? Any thoughts? Thanks
A few details
The `params=` argument of `requests.get()` takes a dict. Build it up and pass it in; there is no need to construct the complete URL string yourself.
import io
import requests
import pandas as pd
url="https://www.marketnews.usda.gov/mnp/ls-report-retail"
p = {"repType":"summary","species":"BEEF","portal":"ls","category":"Retail","format":"text"}
r = requests.get(url, params=p)
df = pd.read_csv(io.StringIO(r.text), sep=r"\s\s+", engine="python")
|   | Date | Region | Feature Rate | Outlets | Special Rate | Activity Index |
|---|---|---|---|---|---|---|
| 0 | 02/05/2021 | NATIONAL | 69.40% | 29,200 | 20.10% | 81,650 |
| 1 | 02/05/2021 | NORTHEAST | 75.00% | 5,500 | 3.80% | 17,520 |
| 2 | 02/05/2021 | SOUTHEAST | 70.10% | 7,400 | 28.00% | 23,980 |
| 3 | 02/05/2021 | MIDWEST | 75.10% | 6,100 | 19.90% | 17,430 |
| 4 | 02/05/2021 | SOUTH CENTRAL | 57.90% | 4,900 | 26.40% | 9,720 |
| 5 | 02/05/2021 | NORTHWEST | 77.50% | 1,300 | 2.50% | 3,150 |
| 6 | 02/05/2021 | SOUTHWEST | 63.20% | 3,800 | 27.50% | 9,360 |
| 7 | 02/05/2021 | ALASKA | 87.00% | 200 | .00% | 290 |
| 8 | 02/05/2021 | HAWAII | 46.70% | 100 | .00% | 230 |
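To get from the DataFrame above to a local CSV file (the original goal), `DataFrame.to_csv` is enough. A minimal sketch, using two rows of the output above as stand-in data (the filename is arbitrary):

```python
import pandas as pd

# two rows of the report above, standing in for the full parsed frame
df = pd.DataFrame({
    "Date": ["02/05/2021", "02/05/2021"],
    "Region": ["NATIONAL", "NORTHEAST"],
    "Feature Rate": ["69.40%", "75.00%"],
})

# index=False drops the 0, 1, ... row labels from the output
csv_text = df.to_csv(index=False)
print(csv_text)

# or write straight to disk instead:
# df.to_csv("beef_retail_summary.csv", index=False)
```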
Just format the query data in the URL - it's actually a REST API:
To add more query data, as @mullinscr said, you can change the values on the left and press submit, then see the query name in the URL (for example, the start date is called `repDate`).
If you hover over the Download as XML link, you will also discover that you can specify the download format using `format=<format_name>`. Parsing the tabular data as XML using pandas might be easier, so I would append `format=xml` at the end as well.
category = "Retail"
report_type = "Item"
species = "BEEF"
regions = "NATIONAL"
start_date = "01-01-2019"
end_date = "01-01-2021"
# the website URL-encodes the "/" in dates as "%2F"
start_date = start_date.replace("-", "%2F")
end_date = end_date.replace("-", "%2F")
url = f"https://www.marketnews.usda.gov/mnp/ls-report-retail?runReport=true&portal=ls&startIndex=1&category={category}&repType={report_type}&species={species}&region={regions}&repDate={start_date}&endDate={end_date}&compareLy=No&format=xml"
# parse with pandas, etc...
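As an alternative to replacing characters by hand, the standard library's `urlencode` builds the same query string and handles the `%2F` escaping of the date separators automatically. A sketch that only constructs the URL (no request is sent):

```python
from urllib.parse import urlencode

# same filters as above; urlencode escapes the "/" in the dates as %2F
params = {
    "runReport": "true",
    "portal": "ls",
    "startIndex": "1",
    "category": "Retail",
    "repType": "Item",
    "species": "BEEF",
    "region": "NATIONAL",
    "repDate": "01/01/2019",
    "endDate": "01/01/2021",
    "compareLy": "No",
    "format": "xml",
}

url = "https://www.marketnews.usda.gov/mnp/ls-report-retail?" + urlencode(params)
print(url)
```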