Problems scraping a website with Elixir

I'm trying to get a simple Hound test working with my app, and I figured out that the error comes from Selenium. This is the code:

In mix.exs:

defmodule Scraper.Mixfile do
  use Mix.Project

  def project do
    [app: :scraper,
     version: "0.0.1",
     elixir: "~> 1.0",
     build_embedded: Mix.env == :prod,
     start_permanent: Mix.env == :prod,
     deps: deps]
  end

  # Configuration for the OTP application
  #
  # Type `mix help compile.app` for more information
  def application do
    [applications: [:logger, :httpoison, :hound]]
  end

  # Dependencies can be Hex packages:
  #
  #   {:mydep, "~> 0.3.0"}
  #
  # Or git/path repositories:
  #
  #   {:mydep, git: "https://github.com/elixir-lang/mydep.git", tag: "0.1.0"}
  #
  # Type `mix help deps` for more examples and options
  defp deps do
    [
      {:httpoison, "~> 0.7"},
      {:floki, "~> 0.7"},
      {:hound, "~> 0.7"}
    ]
  end
end

In lib/scraper.ex:

defmodule Example do
  use Hound.Helpers

  def run do
    Hound.start_session
    IO.inspect "Iniciando"
    navigate_to "http://akash.im"
    IO.inspect page_title()
    Hound.end_session
  end
end

In config/config.exs:

# This file is responsible for configuring your application
# and its dependencies with the aid of the Mix.Config module.
use Mix.Config

# This configuration is loaded before any dependency and is restricted
# to this project. If another project depends on this project, this
# file won't be loaded nor affect the parent project. For this reason,
# if you want to provide default values for your application for third-
# party users, it should be done in your mix.exs file.

# Sample configuration:
#
#     config :logger, :console,
#       level: :info,
#       format: "$date $time [$level] $metadata$message\n",
#       metadata: [:user_id]

# It is also possible to import configuration files, relative to this
# directory. For example, you can emulate configuration per environment
# by uncommenting the line below and defining dev.exs, test.exs and such.
# Configuration from the imported file will override the ones defined
# here (which is why it is important to import them last).
#
#     import_config "#{Mix.env}.exs"

# Define how long the application will wait between failed attempts (in milliseconds)
config :hound, retry_time: 100000

# Start with selenium driver (default)
config :hound, driver: "selenium"

Starting a WebDriver server:

java -jar selenium-server-standalone-2.45.0.jar

Running the app:

/scraper$ iex -S mix
Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]

Interactive Elixir (1.0.5) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Example.run
** (exit) exited in: :gen_server.call(Hound.SessionServer, {:find_or_create_session, #PID<0.148.0>}, 60000)
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}
            (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
            (hound) lib/hound/session_server.ex:22: Hound.SessionServer.handle_call/3
            (stdlib) gen_server.erl:629: :gen_server.try_handle_call/4
            (stdlib) gen_server.erl:661: :gen_server.handle_msg/5
            (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

11:26:13.971 [error] GenServer Hound.SessionServer terminating
Last message: {:find_or_create_session, #PID<0.148.0>}
State: #HashDict<[]>
** (exit) an exception was raised:
    ** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}
        (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
        (hound) lib/hound/session_server.ex:22: Hound.SessionServer.handle_call/3
        (stdlib) gen_server.erl:629: :gen_server.try_handle_call/4
        (stdlib) gen_server.erl:661: :gen_server.handle_msg/5
        (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
    (stdlib) gen_server.erl:212: :gen_server.call/3
    (scraper) lib/scraper.ex:37: Example.run/0
iex(1)>

The request timed out in this case, as can be seen from this line:

** (MatchError) no match of right hand side value: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}

The stack trace shows that the error was raised at:

(hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4

If you open the Hound source, around line 43 of lib/hound/request_utils.ex you will see:

case type do
  :get ->
    {:ok, resp} = HTTPoison.get(url, headers, @http_options)
  :post ->
    {:ok, resp} = HTTPoison.post(url, body, headers, @http_options)
  :delete ->
    {:ok, resp} = HTTPoison.delete(url, headers, @http_options)
end

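This code pattern-matches every result against {:ok, resp}, so any {:error, ...} result raises a MatchError instead of being handled. In your case HTTPoison returned a timeout error, which is what caused the crash.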
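To illustrate the difference between a strict match and handling the error tuple, here is a minimal sketch. The module and function names (MatchDemo, strict, tolerant) are hypothetical and not part of Hound, and plain tuples stand in for HTTPoison results so the example runs without any dependencies.

```elixir
defmodule MatchDemo do
  # Hypothetical helper, not part of Hound.
  # Same strict pattern as in the excerpt: anything other than
  # {:ok, resp} raises a MatchError.
  def strict(result) do
    {:ok, resp} = result
    resp
  end

  # Alternative: match both shapes and hand the error back to the
  # caller instead of crashing.
  def tolerant(result) do
    case result do
      {:ok, resp} -> {:ok, resp}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Calling MatchDemo.strict({:error, :timeout}) raises a MatchError, just like the trace in the question, while MatchDemo.tolerant({:error, :timeout}) returns {:error, :timeout} for the caller to deal with.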

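Please check that the website is up and reachable when you run the test, and retry. Note also that the failing call in the trace is :find_or_create_session, which happens inside Hound.start_session, before navigate_to runs. The request that timed out was therefore most likely the one sent to the Selenium server, so make sure that server is running and responding as well.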
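A quick way to check reachability from iex before starting a session is a plain TCP connection attempt. This is only a sketch using Erlang's :gen_tcp; the module name Reachability is hypothetical, and the host and port in the usage note (localhost and 4444, the port Selenium standalone usually listens on) are assumptions about your setup.

```elixir
defmodule Reachability do
  # Hypothetical helper, not part of Hound.
  # Tries to open a TCP connection to host:port and closes it again.
  # `host` is a charlist hostname or an IP tuple, as :gen_tcp expects.
  # Returns :ok on success, {:error, reason} otherwise.
  def check(host, port, timeout_ms \\ 2_000) do
    case :gen_tcp.connect(host, port, [:binary, active: false], timeout_ms) do
      {:ok, socket} ->
        :gen_tcp.close(socket)
        :ok

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

For example, Reachability.check(~c"localhost", 4444) tells you whether anything is listening where the Selenium server is expected, and Reachability.check(~c"akash.im", 80) does the same for the target site. A successful TCP connection only shows that the port is open; it does not prove that the server answers WebDriver requests in time.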
