Indexing content of a web URL into Elasticsearch/Kibana

I have scraped 500+ links/sublinks of a website using Beautiful Soup + Python. Now I am looking to index all the content/text of these URLs in Elasticsearch. Is there any tool that can help me index directly into the Elasticsearch/Kibana stack?

Please help me with pointers. I tried searching on Google and found Logstash, but it seems it only works for a single URL.

For reference on Logstash, please see: https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html

Otherwise, as an example of putting your crawler output into a file, with one line per URL, you could use the Logstash config below. In this example, Logstash will read each line as a message and send it to the Elasticsearch servers on host1 and host2.

input {
    file {
        path => "/an/absolute/path" # the path has to be absolute
        start_position => "beginning"
    }
}

output {
    elasticsearch {
        hosts => ["host1:port1", "host2:port2"] # host is usually a DNS name (localhost as the most basic one); the default port is 9200
        index => "my_crawler_urls"
        workers => 4 # tune depending on your available resources/expected performance
    }
}
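Assuming you save this config as crawler-urls.conf (a hypothetical filename), you can start the pipeline with bin/logstash -f crawler-urls.conf from your Logstash installation directory. Once documents start arriving, the my_crawler_urls index can be explored in Kibana by creating an index pattern for it.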

Now of course, you might want to do some filtering or post-treatment of your crawler's output, and for that Logstash gives you the possibility with codecs and/or filters.
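As a minimal sketch of such a filter, assuming (hypothetically) that your crawler writes each page as a JSON line with url and text fields, you could parse each line with the json filter and tidy the result with mutate:

filter {
    json {
        source => "message" # parse the raw line written by the crawler as JSON
    }
    mutate {
        strip => ["text"]           # trim surrounding whitespace from the page text
        remove_field => ["message"] # drop the raw line once it has been parsed
    }
}

This block would sit between the input and output sections of the config above, so every event is parsed and cleaned before it is sent to Elasticsearch.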
