
Problems with Scraping a Website with Elixir

I'm trying to run a simple Hound test with my application, and I'm getting errors with Selenium. This is the code:

In mix.exs:

    defmodule Scraper.Mixfile do
      use Mix.Project

      def project do
        [app: :scraper,
         version: "0.0.1",
         elixir: "~> 1.0",
         build_embedded: Mix.env == :prod,
         start_permanent: Mix.env == :prod,
         deps: deps]
      end

      # Configuration for the OTP application
      #
      # Type `mix help compile.app` for more information
      def application do
        [applications: [:logger, :httpoison, :hound]]
      end

      # Dependencies can be Hex packages:
      #
      #   {:mydep, "~> 0.3.0"}
      #
      # Or git/path repositories:
      #
      #   {:mydep, git: "https://github.com/elixir-lang/mydep.git", tag: "0.1.0"}
      #
      # Type `mix help deps` for more examples and options
      defp deps do
        [
          {:httpoison, "~> 0.7"},
          {:floki, "~> 0.7"},
          {:hound, "~> 0.7"}
        ]
      end
    end

In lib/scraper.ex:

    defmodule Example do
      use Hound.Helpers

      def run do
        Hound.start_session
        IO.inspect "Iniciando"
        navigate_to "http://akash.im"
        IO.inspect page_title()
        Hound.end_session
      end
    end

In config/config.exs:

    # This file is responsible for configuring your application
    # and its dependencies with the aid of the Mix.Config module.
    use Mix.Config

    # This configuration is loaded before any dependency and is restricted
    # to this project. If another project depends on this project, this
    # file won't be loaded nor affect the parent project. For this reason,
    # if you want to provide default values for your application for third-
    # party users, it should be done in your mix.exs file.

    # Sample configuration:
    #
    #     config :logger, :console,
    #       level: :info,
    #       format: "$date $time [$level] $metadata$message\n",
    #       metadata: [:user_id]

    # It is also possible to import configuration files, relative to this
    # directory. For example, you can emulate configuration per environment
    # by uncommenting the line below and defining dev.exs, test.exs and such.
    # Configuration from the imported file will override the ones defined
    # here (which is why it is important to import them last).
    #
    #     import_config "#{Mix.env}.exs"

    # Define how long the application will wait between failed attempts (in milliseconds)
    config :hound, retry_time: 100000

    # Start with selenium driver (default)
    config :hound, driver: "selenium"

Starting the WebDriver server:

 java -jar selenium-server-standalone-2.45.0.jar 

Running the application:

    /scraper$ iex -S mix
    Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]

    Interactive Elixir (1.0.5) - press Ctrl+C to exit (type h() ENTER for help)
    iex(1)> Example.run
    ** (exit) exited in: :gen_server.call(Hound.SessionServer, {:find_or_create_session, #PID<0.148.0>}, 60000)
        ** (EXIT) an exception was raised:
            ** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}
                (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
                (hound) lib/hound/session_server.ex:22: Hound.SessionServer.handle_call/3
                (stdlib) gen_server.erl:629: :gen_server.try_handle_call/4
                (stdlib) gen_server.erl:661: :gen_server.handle_msg/5
                (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

    11:26:13.971 [error] GenServer Hound.SessionServer terminating
    Last message: {:find_or_create_session, #PID<0.148.0>}
    State: #HashDict<[]>
    ** (exit) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}
            (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
            (hound) lib/hound/session_server.ex:22: Hound.SessionServer.handle_call/3
            (stdlib) gen_server.erl:629: :gen_server.try_handle_call/4
            (stdlib) gen_server.erl:661: :gen_server.handle_msg/5
            (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
        (stdlib) gen_server.erl:212: :gen_server.call/3
        (scraper) lib/scraper.ex:37: Example.run/0
    iex(1)>

As the error shows, the request timed out in this case:

** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}

If you look at the stack trace, it shows that the error is raised at

(hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4

If you open the Hound source, at line 43 of lib/hound/request_utils.ex you will see

case type do
  :get ->
    {:ok, resp} = HTTPoison.get(url, headers, @http_options)
  :post ->
    {:ok, resp} = HTTPoison.post(url, body, headers, @http_options)
  :delete ->
    {:ok, resp} = HTTPoison.delete(url, headers, @http_options)
end

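This code expects a successful response and crashes otherwise: each branch pattern-matches on `{:ok, resp}`, so any `{:error, ...}` result raises a MatchError. In your case the request returned a timeout error, which caused the crash.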
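To illustrate why a timeout turns into a MatchError, here is a minimal sketch that contrasts the strict `{:ok, resp} = ...` match with a `case` that also accepts an error tuple. The module name `ResultHandling` and its functions are hypothetical, and a plain map stands in for `%HTTPoison.Error{}` so the snippet runs without any dependencies; it is not how Hound itself is written.

```elixir
defmodule ResultHandling do
  # Strict style, as in the excerpt from request_utils.ex:
  # raises MatchError unless the result is {:ok, _}.
  def strict(result) do
    {:ok, resp} = result
    resp
  end

  # Tolerant style: both outcomes are returned as tagged tuples,
  # so the caller decides what to do with a failure.
  def tolerant(result) do
    case result do
      {:ok, resp} -> {:ok, resp}
      {:error, %{reason: reason}} -> {:error, reason}
    end
  end
end

# Stand-in for {:error, %HTTPoison.Error{id: nil, reason: :timeout}}.
timed_out = {:error, %{reason: :timeout}}

# prints {:error, :timeout}
IO.inspect(ResultHandling.tolerant(timed_out))

# ResultHandling.strict(timed_out) would raise:
# ** (MatchError) no match of right hand side value: {:error, %{reason: :timeout}}
```

The strict match is a common "let it crash" choice in Elixir; the point here is only that the MatchError is a symptom of the timeout, not a separate bug.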

Check that the website is reachable when you run the test, and then try again.
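Because the stack trace places the timeout inside Hound.SessionServer while it is looking up or creating a session, it may also be worth confirming that the WebDriver server is accepting connections before calling `Example.run`. The sketch below is one way to check this from iex using Erlang's `:gen_tcp`. The module name `Reachability` is hypothetical, and the host `127.0.0.1` and port `4444` are assumptions (4444 is the usual default for the Selenium standalone server; adjust both to match your setup). `String.to_charlist/1` requires Elixir 1.3 or later.

```elixir
defmodule Reachability do
  # Returns true if a TCP connection to host:port can be opened
  # within timeout_ms milliseconds, false otherwise.
  def reachable?(host, port, timeout_ms \\ 1_000) do
    # :gen_tcp expects the host as a charlist, not a binary.
    case :gen_tcp.connect(String.to_charlist(host), port, [:binary, active: false], timeout_ms) do
      {:ok, socket} ->
        :gen_tcp.close(socket)
        true

      {:error, _reason} ->
        false
    end
  end
end

# Assumed address of the locally started Selenium server.
IO.inspect(Reachability.reachable?("127.0.0.1", 4444))
```

A `true` result only shows that something is listening on that port; it does not prove the server can actually start a browser session.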
