Download csv file using Python from the web
Goal: I want to automate the download of various .csv files from http://www.tocom.or.jp/historical/download.html using Python (this is not the main issue, though).
Specifics: in particular, I am trying to download the csv files under "Tick Data" (the fifth heading from the bottom) for the 5 days available.
Problem: when I view the source code of this web page and search for "Tick Data", I see references to these 5 .csv files, but they do not appear in the usual href attributes. Since I am using Python (urllib), I need to know the URLs of these 5 .csv files, but I don't know how to get them.
This is not a question about Python per se, but about how to find the URL of a .csv file that can be downloaded from a web page. Hence, no code is provided.
The page uses JavaScript to build the URL:
<select name="tick">
<option value="TOCOMprice_20121122.csv">Nov 22, 2012</option>
<option value="TOCOMprice_20121121.csv">Nov 21, 2012</option>
<option value="TOCOMprice_20121120.csv">Nov 20, 2012</option>
<option value="TOCOMprice_20121119.csv">Nov 19, 2012</option>
<option value="TOCOMprice_20121116.csv">Nov 16, 2012</option>
</select>
<input type="button" onClick="location.href='/data/tick/' + document.form.tick.value;"
value="Download" style="width:7em;" />
It combines a path that the browser resolves against the current site, so each URL is:
http://www.tocom.or.jp + /data/tick/ + TOCOMprice_*yearmonthday*.csv
By the looks of it, the data only covers weekdays.
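Rather than guessing dates, the filenames can also be scraped straight out of the select element shown above with the standard library's html.parser (a sketch; in practice you would feed it the downloaded page source rather than the hard-coded snippet used here):

```python
from html.parser import HTMLParser

# Sample of the page source, as shown above
html_source = '''
<select name="tick">
<option value="TOCOMprice_20121122.csv">Nov 22, 2012</option>
<option value="TOCOMprice_20121121.csv">Nov 21, 2012</option>
<option value="TOCOMprice_20121120.csv">Nov 20, 2012</option>
<option value="TOCOMprice_20121119.csv">Nov 19, 2012</option>
<option value="TOCOMprice_20121116.csv">Nov 16, 2012</option>
</select>
'''

class TickOptions(HTMLParser):
    """Collect the value attribute of every <option> inside <select name="tick">."""
    def __init__(self):
        super().__init__()
        self.in_tick = False
        self.files = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select' and attrs.get('name') == 'tick':
            self.in_tick = True
        elif tag == 'option' and self.in_tick:
            self.files.append(attrs['value'])

    def handle_endtag(self, tag):
        if tag == 'select':
            self.in_tick = False

parser = TickOptions()
parser.feed(html_source)
# parser.files now lists the csv filenames to append to /data/tick/
```

This way the script keeps working even if the site changes which 5 days it offers.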
These are easy enough to cobble together into URLs automatically:
import requests
from datetime import datetime, timedelta

base = 'http://www.tocom.or.jp/data/tick/TOCOMprice_'

day = datetime.now() - timedelta(days=1)  # the latest possible file is yesterday's
for _ in range(5):
    while day.weekday() >= 5:  # Sat = 5, Sun = 6; no files on weekends
        day -= timedelta(days=1)
    r = requests.get(base + day.strftime('%Y%m%d') + '.csv')
    # Save r.content somewhere
    day -= timedelta(days=1)
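The "save somewhere" step could look like the sketch below, which writes the response body to a file named after the date (the helper name and directory layout are illustrative, not from the original answer):

```python
from pathlib import Path

def save_csv(content: bytes, day_str: str, out_dir: str = '.') -> Path:
    """Write downloaded CSV bytes to TOCOMprice_<day_str>.csv under out_dir."""
    path = Path(out_dir) / ('TOCOMprice_%s.csv' % day_str)
    path.write_bytes(content)
    return path

# Inside the loop above, e.g.: save_csv(r.content, '20121122')
```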
I used requests for its easier-to-use API, but you can use urllib2 for this task too if you so wish.
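For reference, the standard-library equivalent might look like this (urllib.request in Python 3; urllib2 in Python 2 has the same shape):

```python
from urllib.request import Request, urlopen

url = 'http://www.tocom.or.jp/data/tick/TOCOMprice_20121122.csv'
req = Request(url)
# data = urlopen(req).read()  # raw CSV bytes, like r.content with requests
```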
Use Chrome with Dev Tools, Firefox with Firebug, or Fiddler to look at the request URL when you hit the download button.
(For example, I see this for Nov 22: http://www.tocom.or.jp/data/tick/TOCOMprice_20121122.csv )
You can determine the download link using, among other things, the developer menu of your browser. I use Chrome, and it shows me that the link is
http://www.tocom.or.jp/data/souba_d/souba_d_20121126_20121123_0425.csv
That URL structure seems straightforward enough to guess, and another link right on the page:
http://www.tocom.or.jp/historical/keishiki_souba_d.html
indicates how to structure the pulls. A good bet is simply to structure the csv pulls at 5-minute intervals.
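Assuming the trailing "0425" in that filename is an HHMM timestamp (an assumption based on the filename alone, not documented by the site), the full 5-minute grid for a day can be generated like this:

```python
# 288 timestamps: '0000', '0005', ..., '2355'
times = ['%02d%02d' % (h, m) for h in range(24) for m in range(0, 60, 5)]
```

Each entry could then be substituted into the URL pattern above when probing for available files.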
Good luck!