Using Wget with a faulty URL

Question

I have the following link, which downloads a CSV file when opened in a web browser:

http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre=

However, when I run Wget under Cygwin with the command below, it retrieves a file that is not a CSV file but a file with no extension. The file is empty; it contains no data at all.

wget 'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='

Not wanting to stay stuck, I also tried the following. I put the URL in a text file and used Wget with the input-file option (-i).

Inside fic.txt:

'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='

I then ran Wget as follows:

wget -i fic.txt

I got the following errors:

 Scheme missing
 No URLs found in toto.txt

Answer 1

I can suggest some other options that make the underlying problem clearer: the response is supposed to be HTML, but it has no content (Content-Length: 0).

More concretely, this command:

wget -S -O export_classement.html 'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='

produces this output:

Resolving pro.allocine.fr... 62.39.143.50
Connecting to pro.allocine.fr|62.39.143.50|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Server: nginx
  Date: Fri, 28 Mar 2014 09:54:44 GMT
  Content-Type: text/html; Charset=iso-8859-1
  Connection: close
  X-ServerName: WEBNX2
  akamainocache: no-store
  Content-Length: 0
  Cache-control: private
  X-KompressorName: kompressor7
Length: 0 [text/html]

2014-03-28 05:54:52 (0.00 B/s) - ‘export_classement.html’ saved [0/0]
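
To check for this condition in a script rather than by eye, one option is to pull the Content-Length value out of the headers that wget -S prints. The following is a minimal sketch, not part of the original answer: content_length is a hypothetical helper name, and the sample text is just two header lines copied from the output above so the example runs offline.

```shell
#!/bin/sh
# Print the last Content-Length value found in a header dump read from stdin.
# content_length is a hypothetical helper name, not a wget feature.
content_length() {
    tr -d '\r' | awk 'tolower($1) == "content-length:" { v = $2 } END { if (v != "") print v }'
}

# Offline sample: two header lines copied from the wget -S output above.
sample='  HTTP/1.1 200 OK
  Content-Length: 0'

len=$(printf '%s\n' "$sample" | content_length)
if [ "$len" = "0" ]; then
    echo "server returned an empty body"
fi
```

Against the live server, the same helper could be fed from wget directly, for example wget -S -O export_classement.html "$url" 2>&1 | content_length, since wget writes the headers to standard error. A response sent with chunked transfer encoding carries no Content-Length header, in which case the helper prints nothing.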

Additionally, the server tailors its output based on how the client identifies itself. Wget has an option to send an arbitrary User-Agent header. Here is an example of what happens when you make Wget identify itself as Chrome; other user-agent strings are possible as well.

wget -S --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36" 'http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='

Now the output changes to export.csv, with type "application/octet-stream" instead of "text/html":

HTTP request sent, awaiting response... 
 HTTP/1.1 200 OK
 Server: nginx
 Date: Fri, 28 Mar 2014 10:34:09 GMT
 Content-Type: application/octet-stream; Charset=iso-8859-1
 Transfer-Encoding: chunked
 Connection: close
 X-ServerName: WEBNX2
 Edge-Control: no-store
 Last-Modified: Fri, 28 Mar 2014 10:34:17 GMT
 Content-Disposition: attachment; filename=export.csv
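
As for the "Scheme missing" error from wget -i, the most likely cause is the single quotes inside fic.txt. Quotes are shell syntax; in a file read by -i they are taken as part of the URL, so the line no longer starts with http://. The sketch below writes the URL to the file without quotes and combines it with the user-agent option shown above. It is an illustration under assumptions, not the original answer's method: fetch_export is a hypothetical wrapper name, --content-disposition is a wget option (documented as experimental) that names the saved file from the Content-Disposition header, and whether the server still returns data this way is not something the example can guarantee.

```shell
#!/bin/sh
url='http://pro.allocine.fr/film/export_classement.html?typeaffichage=2&lsttype=1001&lsttypeperiode=3002&typedonnees=visites&cfilm=&datefiltre='
ua='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36'

# Write the bare URL, one per line, with no surrounding quotes.
printf '%s\n' "$url" > fic.txt

# fetch_export is a hypothetical wrapper; it is defined but not called here
# because it needs network access.
fetch_export() {
    wget -S --user-agent="$ua" --content-disposition -i fic.txt
}
```

Calling fetch_export then runs the download. If the server responds as in the headers above, the file should be saved as export.csv.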
