I tried to use Erlang's httpc module for highly concurrent requests.
My code, which spawns a process per request, doesn't work:
-module(t).
-compile(export_all).

start() ->
    ssl:start(),
    inets:start( httpc, [{profile, default}] ),
    httpc:set_options([{max_sessions, 200}, {pipeline_timeout, 20000}], default),
    {ok, Device} = file:open("c:\urls.txt", read),
    read_each_line(Device).

read_each_line(Device) ->
    case io:get_line(Device, "") of
        eof  -> file:close(Device);
        Line -> go( string:substr(Line, 1, length(Line)-1) ),
                read_each_line(Device)
    end.

go(Url)->
    spawn(t, geturl, [Url]).

geturl(Url)->
    UrlHTTP = lists:concat(["http://www.", Url]),
    io:format(UrlHTTP), io:format("~n"),
    {ok, RequestId} = httpc:request(get, {UrlHTTP, [{"User-Agent", "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"}]}, [], [{sync, false}]),
    receive
        {http, {RequestId, {_HttpOk, _ResponseHeaders, Body}}} -> io:format("ok"), ok
    end.
httpc:request never delivers the HTML Body when the request is made from a spawned process. Can I even use spawn like this in go/1?

go(Url)->
    spawn(t, geturl, [Url]).
http://erlang.org/doc/man/httpc.html
Note
If possible the client will keep its connections alive and use persistent connections with or without pipeline depending on configuration and current circumstances. The HTTP/1.1 specification does not provide a guideline for how many requests would be ideal to be sent on a persistent connection, this very much depends on the application. Note that a very long queue of requests may cause a user perceived delay as earlier requests may take a long time to complete. The HTTP/1.1 specification does suggest a limit of 2 persistent connections per server, which is the default value of the max_sessions option.
urls.txt contains different urls, for example:
google.com
amazon.com
alibaba.com
...
What's wrong?
Your code never actually starts the httpc service (and inets, the application that it depends on), and the confusion probably comes from the unfortunate overloading of the inets:start/[0,1,2,3] function:

inets:start/[0,1] starts the inets application itself and the httpc service with the default profile (called default).

inets:start/[2,3] (which should be called start_service) starts one of the services that can run atop inets (viz. ftpc, tftp, httpc, httpd) once the inets application has already started.

From the documentation:

start() ->
start(Type) -> ok | {error, Reason}
    Starts the Inets application.

start(Service, ServiceConfig) -> {ok, Pid} | {error, Reason}
start(Service, ServiceConfig, How) -> {ok, Pid} | {error, Reason}
    Dynamically starts an Inets service after the Inets application has been started (with inets:start/[0,1]).
So your spawned process simply crashed when trying to call httpc:request/4, as the service itself was not running. To illustrate, calling inets:start( httpc, [{profile, default}] ) from your start/0 function fails to start inets and the httpc service:
Eshell V10.7 (abort with ^G)
1> inets:start(httpc, [{profile, default}]).
{error,inets_not_started}
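For comparison, once the inets application itself is running, the service start succeeds. A minimal sketch (the profile name my_profile is just an illustration here; the default profile is already running after inets:start/0, so starting it again would fail):

```erlang
%% Start the inets application first, then an extra httpc profile.
ok = inets:start(),
%% With inets running, starting a standalone httpc service succeeds
%% and returns the pid of the new profile's manager process.
{ok, _Pid} = inets:start(httpc, [{profile, my_profile}]).
```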
You should check the return value of each application start to catch such problems:
...
ok = ssl:start(),
ok = inets:start(),
...

Or, if the applications could already be started, use a helper like this:

...
ok = ensure_start(ssl),
ok = ensure_start(inets),
...

ensure_start(M) ->
    case M:start() of
        ok -> ok;
        {error, {already_started, M}} -> ok;
        Other -> Other
    end.
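As a side note (my own addition, not part of the original answer): recent OTP releases ship application:ensure_all_started/1, which starts an application together with its whole dependency chain and succeeds when the applications are already running, so it can replace the ensure_start/1 helper. A minimal sketch:

```erlang
%% ensure_all_started/1 starts the application plus its dependencies,
%% and returns {ok, StartedApps} even when some (or all) of them were
%% already running, so it is safe to call repeatedly.
start_deps() ->
    {ok, _} = application:ensure_all_started(inets),
    {ok, _} = application:ensure_all_started(ssl),
    ok.
```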
[edit 2 - small code enhancement]
I have tested this code and it works on my PC. Note that you were using a '\' in the file path string: backslash starts an escape sequence in Erlang strings, which makes that line fail.
-module(t).
-compile(export_all).

start() -> start(2000).

% `To` is a parameter which is passed to `geturl`
% to change the timeout value. You can play with
% it to see the request queue effect, and how
% much the response times of each site vary.
%
% The default timeout value is set to 2 seconds.
start(To) ->
    ok = ensure_start(ssl),
    ok = ensure_start(inets),
    ok = httpc:set_options([{max_sessions, 200}, {pipeline_timeout, 20000}], default),
    {ok, Device} = file:open("D:/urls.txt", read),
    read_each_line(Device, To).

read_each_line(Device, To) ->
    case io:get_line(Device, "") of
        eof  -> file:close(Device);
        Line -> go( string:substr(Line, 1, length(Line)-1), To ),
                read_each_line(Device, To)
    end.

go(Url, To)->
    spawn(t, geturl, [Url, To]).

geturl(Url, To)->
    UrlHTTP = lists:concat(["http://www.", Url]),
    io:format(UrlHTTP), io:format("~n"),
    {ok, RequestId} = httpc:request(get, {UrlHTTP, [{"User-Agent", "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"}]}, [], [{sync, false}]),
    M = receive
            {http, {RequestId, {_HttpOk, _ResponseHeaders, _Body}}} -> ok
        after To ->
            not_ok
        end,
    io:format("httprequest to ~p: ~p~n", [UrlHTTP, M]).

ensure_start(M) ->
    case M:start() of
        ok -> ok;
        {error, {already_started, M}} -> ok;
        Other -> Other
    end.
and in the console:
1> t:start().
http://www.povray.org
http://www.google.com
http://www.yahoo.com
ok
httprequest to "http://www.google.com": ok
httprequest to "http://www.povray.org": ok
httprequest to "http://www.yahoo.com": ok
2> t:start().
http://www.povray.org
http://www.google.com
http://www.yahoo.com
ok
httprequest to "http://www.google.com": ok
httprequest to "http://www.povray.org": ok
httprequest to "http://www.yahoo.com": ok
3>
Note that thanks to ensure_start/1 you can call t:start() twice in the same shell.

I have also tested with a bad url, and the timeout detects it.

My test includes only 3 urls, and I guess that with many urls the time to get each response will increase, because the loop that spawns the processes runs much faster than the requests themselves complete. So at some point you should expect timeout issues. There may also be some limitation in the http client; I didn't check the documentation on this particular point.
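One way around that (my own sketch, not part of the answer's code above) is to cap the number of requests in flight: the loop spawns at most Max workers and waits for a done message before spawning more. Here each_throttled/3 is a hypothetical helper that applies Fun to every list element with bounded concurrency; for the code above you would pass something like fun(U) -> geturl(U, To) end as Fun:

```erlang
%% Run Fun(X) for each X in the list, with at most Max
%% spawned workers in flight at any one time.
each_throttled(Fun, Xs, Max) ->
    each_throttled(Fun, Xs, Max, 0).

each_throttled(_Fun, [], _Max, 0) ->
    ok;
each_throttled(Fun, [], Max, InFlight) ->
    %% No work left: just drain the remaining workers.
    receive done -> each_throttled(Fun, [], Max, InFlight - 1) end;
each_throttled(Fun, Xs, Max, InFlight) when InFlight >= Max ->
    %% At capacity: wait for one worker to finish before spawning more.
    receive done -> each_throttled(Fun, Xs, Max, InFlight - 1) end;
each_throttled(Fun, [X | Rest], Max, InFlight) ->
    Parent = self(),
    %% catch so a crashing Fun still reports done to the loop.
    spawn(fun() -> catch Fun(X), Parent ! done end),
    each_throttled(Fun, Rest, Max, InFlight + 1).
```

For example, each_throttled(fun(U) -> geturl(U, 2000) end, Urls, 50) would keep at most 50 requests outstanding at once.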