
httpc: many requests from spawned processes do not work

I am trying to use Erlang's httpc module for highly concurrent requests.

My code, which issues many requests from spawned processes, does not work:

-module(t).
-compile(export_all).

start() ->
  ssl:start(),
  inets:start( httpc, [{profile, default}] ),
  httpc:set_options([{max_sessions, 200}, {pipeline_timeout, 20000}], default),

  {ok, Device} = file:open("c:\urls.txt", read),
  read_each_line(Device).

read_each_line(Device) ->
  case io:get_line(Device, "") of
    eof  -> file:close(Device);
    Line -> go( string:substr(Line, 1,length(Line)-1)),
      read_each_line(Device)
  end.

go(Url)->
  spawn(t,geturl, [Url] ).

geturl(Url)->
  UrlHTTP=lists:concat(["http://www.",  Url]),
  io:format(UrlHTTP),io:format("~n"),

  {ok, RequestId}=httpc:request(get,{UrlHTTP,[{"User-Agent", "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"}]}, [],[{sync, false}]),

  receive
    {http, {RequestId, {_HttpOk, _ResponseHeaders, Body}}} -> io:format("ok"),ok
  end.

The spawned process never receives the HTML body from httpc:request when I use spawn in:

go(Url)->
      spawn(t,geturl, [Url] ).

http://erlang.org/doc/man/httpc.html

Note

If possible the client will keep its connections alive and use persistent connections with or without pipeline depending on configuration and current circumstances. The HTTP/1.1 specification does not provide a guideline for how many requests would be ideal to be sent on a persistent connection, this very much depends on the application. Note that a very long queue of requests may cause a user perceived delay as earlier requests may take a long time to complete. The HTTP/1.1 specification does suggest a limit of 2 persistent connections per server, which is the default value of the max_sessions option

urls.txt contains various URLs, for example:

google.com
amazon.com
alibaba.com
...

What's wrong?

Your code never actually starts the httpc service (or inets, the application that it depends on). The confusion probably comes from the unfortunate overloading of the inets:start/[0,1,2,3] function:

  • inets:start/[0,1] starts the inets application itself and the httpc service with the default profile (called default).

  • inets:start/[2,3] (which should be called start_service) starts one of the services that can run atop inets (viz. ftpc, tftp, httpc, httpd) once the inets application has already started.

start() ->
start(Type) -> ok | {error, Reason}

Starts the Inets application.

start(Service, ServiceConfig) -> {ok, Pid} | {error, Reason}
start(Service, ServiceConfig, How) -> {ok, Pid} | {error, Reason}

Dynamically starts an Inets service after the Inets application has been started
(with inets:start/[0,1]).

So your spawned process simply crashed when trying to call httpc:request/4 as the service itself was not running. To illustrate, inets:start( httpc, [{profile, default}] ) from your start/0 function would fail to start inets and the httpc service:

Eshell V10.7  (abort with ^G)
1> inets:start(httpc, [{profile, default}]).
{error,inets_not_started}
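
To make the ordering concrete, here is a minimal sketch of the two steps done in the right order. The module name inets_order_demo and the profile name demo_profile are made up for illustration; the question itself only uses the default profile, for which step 1 alone is enough.

```erlang
-module(inets_order_demo).
-export([run/0]).

%% Hypothetical demo: start inets first, then an extra httpc profile.
run() ->
    %% Step 1: inets:start/0 starts the inets application
    %% (and with it the httpc service for the default profile).
    ok = case inets:start() of
             ok -> ok;
             {error, {already_started, inets}} -> ok
         end,
    %% Step 2: inets:start/2 can now start a service dynamically.
    %% demo_profile is an illustrative name, not taken from the question.
    case inets:start(httpc, [{profile, demo_profile}]) of
        {ok, Pid} when is_pid(Pid) ->
            httpc:set_options([{max_sessions, 200}], demo_profile);
        {error, {already_started, _}} ->
            httpc:set_options([{max_sessions, 200}], demo_profile);
        {error, Reason} ->
            {error, Reason}
    end.
```

Calling inets_order_demo:run() should return ok once both steps succeed; any other start failure is passed back as {error, Reason} instead of crashing the caller.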

You should check the return value of each application start call to catch potential problems:

...
ok = ssl:start(),
ok = inets:start(),
...

Or, if the application may already be started, use a function like this:

...
ok = ensure_start(ssl),
ok = ensure_start(inets),
...
ensure_start(M) ->
    case M:start() of
        ok -> ok;
        {error,{already_started,M}} -> ok;
        Other -> Other
    end.
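
As an alternative sketch, OTP's application:ensure_all_started/1 covers the same "start it unless it is already running" case and also starts the applications that ssl and inets depend on. The module name ensure_apps is made up for illustration.

```erlang
-module(ensure_apps).
-export([start/0]).

%% Start ssl and inets together with everything they depend on.
%% Returns ok if both are running afterwards, {error, {App, Reason}} otherwise.
start() ->
    start([ssl, inets]).

%% Internal helper: walk the list and stop at the first failure.
start([]) ->
    ok;
start([App | Rest]) ->
    case application:ensure_all_started(App) of
        {ok, _Started} -> start(Rest);
        {error, Reason} -> {error, {App, Reason}}
    end.
```

Because an already running application is not treated as an error, ensure_apps:start() can be called repeatedly.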

[edit 2 - small code enhancement]

I have tested this code and it works on my PC. Note that you are using a '\' in the file path string; it starts an escape sequence, which makes that line fail.

-module(t).
-compile(export_all).

start() -> start(2000).

% `To` is the timeout in milliseconds passed to `geturl/2`.
% You can play with it to see the effect of the request
% queue and how much the response time of each site varies.
%
% The default timeout is 2 seconds.
start(To) ->
  ok = ensure_start(ssl),
  ok = ensure_start(inets),
  ok = httpc:set_options([{max_sessions, 200}, {pipeline_timeout, 20000}], default),

  {ok, Device} = file:open("D:/urls.txt", read),
  read_each_line(Device,To).

read_each_line(Device,To) ->
  case io:get_line(Device, "") of
    eof  -> file:close(Device);
    Line -> go( string:substr(Line, 1,length(Line)-1),To),
      read_each_line(Device,To)
  end.

go(Url,To)->
  spawn(t,geturl, [Url,To] ).

geturl(Url,To)->
  UrlHTTP=lists:concat(["http://www.",  Url]),
  io:format("~s~n", [UrlHTTP]), % print via ~s so a "~" in the URL is not parsed as a format directive

  {ok, RequestId}=httpc:request(get,{UrlHTTP,[{"User-Agent", "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"}]}, [],[{sync, false}]),

  M = receive
        {http, {RequestId, {_HttpOk, _ResponseHeaders, _Body}}} -> ok
      after To ->
        not_ok
      end,
  io:format("httprequest to ~p: ~p~n",[UrlHTTP,M]).

ensure_start(M) ->
  case M:start() of
    ok -> ok;
    {error,{already_started,M}} -> ok;
    Other -> Other
  end.

and in the console:

1> t:start().
http://www.povray.org
http://www.google.com
http://www.yahoo.com
ok
httprequest to "http://www.google.com": ok
httprequest to "http://www.povray.org": ok
httprequest to "http://www.yahoo.com": ok
2> t:start().
http://www.povray.org
http://www.google.com
http://www.yahoo.com
ok
httprequest to "http://www.google.com": ok
httprequest to "http://www.povray.org": ok
httprequest to "http://www.yahoo.com": ok
3> 

Note that, thanks to ensure_start/1, you can call t:start() twice without an error.

I have also tested with a bad URL, and it is detected.
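
On the bad-URL case: with {sync, false}, httpc reports a failed request as {http, {RequestId, {error, Reason}}}, which the receive in geturl/2 does not match, so a failure only shows up as not_ok once the timeout expires. A minimal sketch of a receive that tells the three outcomes apart is below. The module name http_await and the function await/2 are made up for illustration, and the success pattern assumes the default full result shape {StatusLine, Headers, Body}.

```erlang
-module(http_await).
-export([await/2]).

%% Wait for the asynchronous reply that belongs to RequestId.
%% Returns {ok, StatusCode, Body}, {error, Reason} or {error, timeout}.
await(RequestId, Timeout) ->
    receive
        %% Failure reported by httpc (e.g. the host could not be resolved).
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason};
        %% Full result: status line, headers, body.
        {http, {RequestId, {{_Version, Status, _Phrase}, _Headers, Body}}} ->
            {ok, Status, Body}
    after Timeout ->
        {error, timeout}
    end.
```

In geturl/2 this could replace the receive block, for example M = http_await:await(RequestId, To), so that a failure is printed with its reason as soon as it arrives.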

My test includes only 3 URLs. I expect that with many URLs the response times will increase, because the loop that spawns processes runs faster than the requests themselves complete, so at some point you should expect timeouts. There may also be limits in the HTTP client; I did not check the documentation on this particular point.
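
One way to keep the spawn loop from running far ahead of the requests is to cap the number of workers alive at once. The following is only a sketch of that idea, not something I measured against httpc: the module name bounded_spawn and the function run/3 are made up for illustration, and Fun stands for whatever does the actual request for one URL.

```erlang
-module(bounded_spawn).
-export([run/3]).

%% Apply Fun to every item with at most Max worker processes alive at once.
%% Returns a list of {Item, Result} tuples in completion order.
run(Fun, Items, Max) when is_function(Fun, 1), is_integer(Max), Max > 0 ->
    Tag = make_ref(), % unique tag so unrelated messages are ignored
    loop(Fun, Items, Max, 0, Tag, []).

%% Nothing left to start and nothing in flight: done.
loop(_Fun, [], _Max, 0, _Tag, Acc) ->
    lists:reverse(Acc);
%% There is room for another worker: start it.
loop(Fun, [Item | Rest], Max, Running, Tag, Acc) when Running < Max ->
    Parent = self(),
    spawn(fun() ->
              Result = try Fun(Item)
                       catch Class:Reason -> {error, {Class, Reason}}
                       end,
              Parent ! {Tag, Item, Result}
          end),
    loop(Fun, Rest, Max, Running + 1, Tag, Acc);
%% The pool is full or only stragglers remain: wait for one result.
loop(Fun, Items, Max, Running, Tag, Acc) ->
    receive
        {Tag, Item, Result} ->
            loop(Fun, Items, Max, Running - 1, Tag, [{Item, Result} | Acc])
    end.
```

For example, bounded_spawn:run(fun(X) -> X * 2 end, [1,2,3,4,5], 2) never has more than two workers running. For the crawler, Fun would wrap the request for one URL, and Max would be chosen with the max_sessions setting in mind.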
