Should I use proxies with simplexml_load_file and file_get_contents?

I've been using simplexml_load_file to fetch RSS feeds from several websites for a while.

Sometimes I get errors from some of these websites, and for about 5 days I've been getting errors from 2 specific websites.

Here are the errors from simplexml_load_file:

PHP Warning:  simplexml_load_file(http://example.com/feed): failed to open stream: Connection timed out 

PHP Warning:  simplexml_load_file(): I/O warning : failed to load external entity "http://example.com/feed" 

Here are the errors from file_get_contents:

PHP Warning:  file_get_contents(http://example.com/page): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden

This is how I'm using simplexml_load_file:

simplexml_load_file( $url );

This is how I'm using file_get_contents:

file_get_contents( $url );

Is that because I'm not using a proxy, or because I'm passing invalid arguments?

UPDATE: The 2 websites are using something like a firewall or a service to check for robots:

Accessing http://example.com/feed securely…
This is an automatic process. Your browser will redirect to your requested content in 5 seconds.

You're relying on the assumption that http://example.com/feed is always going to exist and will always return exactly the content you're looking for. As you've discovered, this is a bad assumption.

You're accessing the network with file_get_contents() and simplexml_load_file() and finding out that sometimes those calls fail. You must always plan for these calls to fail. It doesn't matter if some websites openly allow this kind of behavior or if you have a very reliable web host. There are circumstances outside your control, such as an Internet backbone outage, that will eventually cause your application to get back a bad response. In your situation, the third party has blocked you. This is one of the failures that happen with network requests.
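One way to plan for failure on the parsing side is to validate whatever body came back before treating it as a feed. The sketch below is only an illustration: parseFeedOrNull is a hypothetical helper name, and the check for an rss or feed root element is an assumption about what the feeds look like.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement for a body that looks like
// a feed, or null for anything else (empty body, HTML challenge page, junk).
function parseFeedOrNull(string $body): ?SimpleXMLElement
{
    if (trim($body) === '') {
        return null;
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return null; // not well-formed XML
    }

    // Assumption: a usable feed has an <rss> (RSS 2.0) or <feed> (Atom) root.
    return in_array($xml->getName(), ['rss', 'feed'], true) ? $xml : null;
}
```

With a check like this, an interstitial page such as the "Accessing http://example.com/feed securely…" one comes back as null rather than as a warning or a misparsed document.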

The first takeaway is that you must handle the failure better. You cannot do this with file_get_contents() because file_get_contents() was designed to get the contents of files. In my opinion, the PHP implementers made a very serious mistake when they allowed it to make network calls. I'd recommend using curl:

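function doRequest($url) {
    $ch = curl_init($url);
    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Give up after 10 seconds rather than hanging on a slow host.
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $output = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // Succeed only if the transfer worked and the server answered 200 OK.
    if ($output !== false && $httpcode == 200) {
        return $output;
    } else {
        throw new Exception('Sorry, an error occurred');
    }
}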

Using this, you will be able to handle errors (and they will happen) more gracefully for your own users.
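With a function like doRequest that throws on failure, the calling code can catch the exception per feed, so one blocked site does not break the rest. The following is a minimal sketch, not the answer author's code: fetchAll is a hypothetical name, http://example.org/feed is a made-up second URL, and the fetcher is passed in as a callable so the loop can be tried without network access.

```php
<?php
// Hypothetical helper: fetch every URL, keeping successes and failures apart.
// $fetch is any callable that returns the body or throws an Exception.
function fetchAll(array $urls, callable $fetch): array
{
    $bodies = [];
    $errors = [];
    foreach ($urls as $url) {
        try {
            $bodies[$url] = $fetch($url);
        } catch (Exception $e) {
            // Record the failure and keep going with the remaining feeds.
            $errors[$url] = $e->getMessage();
        }
    }
    return ['bodies' => $bodies, 'errors' => $errors];
}

// Usage with a stand-in fetcher; in real code pass 'doRequest' instead.
$result = fetchAll(
    ['http://example.com/feed', 'http://example.org/feed'],
    function (string $url): string {
        if (strpos($url, 'example.com') !== false) {
            throw new Exception('Sorry, an error occurred');
        }
        return '<rss version="2.0"></rss>';
    }
);
```

The errors array can then be logged or shown to users, while the feeds that did load are processed as usual.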

Your second problem is that this specific host is giving you a 403 error. This is probably intentional on their end. I would assume that this is them telling you that they don't want you using their website like this. However, you will need to contact them directly and ask what you can do. They might ask you to use a real API, they might ignore you entirely, they might even tell you to pound sand, but there isn't anything we can advise here. This is strictly a problem (or feature) of their software, and you must contact them directly for advice.
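Because a timeout and a 403 call for different reactions, it can help to keep them apart in code. The sketch below is an illustration under stated assumptions: describeFailure is a hypothetical helper, its inputs are meant to come from curl_errno($ch) and curl_getinfo($ch, CURLINFO_HTTP_CODE), and the returned strings are only examples. The number 28 is the value of cURL's CURLE_OPERATION_TIMEDOUT error code, written as a literal so the function also loads without the curl extension.

```php
<?php
const TIMEOUT_ERRNO = 28; // same value as CURLE_OPERATION_TIMEDOUT

// Hypothetical helper: turn a cURL error number and HTTP status into a
// short description of what went wrong and what to do about it.
function describeFailure(int $curlErrno, int $httpCode): string
{
    if ($curlErrno === TIMEOUT_ERRNO) {
        return 'timeout: may be temporary, retry later';
    }
    if ($curlErrno !== 0) {
        return 'network error: check connectivity, retry later';
    }
    if ($httpCode === 403) {
        return 'forbidden: the site is refusing this client, contact its owner';
    }
    if ($httpCode >= 500) {
        return 'server error: retry later';
    }
    if ($httpCode >= 400) {
        return 'client error: check the URL and request';
    }
    if ($httpCode >= 200 && $httpCode < 300) {
        return 'ok';
    }
    return 'unexpected status';
}
```

Retrying makes sense for the first two cases; for a 403 it does not, since the refusal is a decision made by the other side.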

You could potentially use multiple IP addresses to connect to websites and rotate IPs each time one gets blocked. But doing so would be considered a malicious attack on their service.
