How to save an Element from Jsoup to a database

I use Jsoup to fetch all the data from a website, and I want to save an element whenever its content matches certain text. When an element's text matches some characters, I want to save that element to a database (MySQL, PostgreSQL, ...). My code looks like this:

Connection conn = Jsoup.connect("https://viblo.asia");
Document doc = conn.userAgent("Mozilla").get();
Elements elements = doc.getElementsByClass("post-feed").get(0).children();
Elements list = new Elements();
Elements strings = new Elements();
for (Element element : elements) {
    if (element.hasClass("post-feed-item")) {
        list.add(element);
        Element e = element.children().get(1).children().get(1).children().get(0);
        if (e.text().matches("^.*?(Docker|docker|DOCKER).*$")) {
            strings.add(e);
            // save the element to the DB
        }
    }
}

// Note: links, URL and getPageLinks are not defined in this snippet.
for (Element page : elements) {
    if (links.add(URL)) {
        // Remove the comment from the line below if you want to see it running in your editor
        System.out.println(URL);
    }
    getPageLinks(page.attr("abs:href"));
}

If the title inside an element contains "Docker", I want to save that element to the database. But the element contains nested divs as well as links, images, and other content. How do I save it to the database? Is it feasible to save each part of the element in its own field in the database? If not, can I convert the element to HTML and save that? Please help.

Example of the HTML I want to save to the database:

<div class="post-feed-item">
 <a href="/u/HoanKi"><img src="https://images.viblo.asia/avatar/1d0e5458-ad41-4d1c-89db-292dc198b4fa.png" srcset="https://images.viblo.asia/avatar/1d0e5458-ad41-4d1c-89db-292dc198b4fa.png 1x, https://images.viblo.asia/avatar-retina/1d0e5458-ad41-4d1c-89db-292dc198b4fa.png 2x" class="avatar avatar--md mr-05"></a>
 <div class="post-feed-item__info">
  <div class="post-meta--inline">
   <div class="user--inline d-inline-flex">
    <!---->
    <a href="/u/HoanKi" class="mr-05">Hoàn Kì</a>
    <!---->
   </div>
   <div class="post-meta d-inline-flex align-items-center flex-wrap">
    <div class="text-muted mr-05">
     <span class="mr-05">about 3 hours ago</span>
     <button title="Copy URL" class="icon-btn _13z_mK0hRyRB3dPzawysKe_0"><i aria-hidden="true" class="fa fa-link"></i></button>
    </div>
    <!---->
    <!---->
   </div>
  </div>
  <div class="post-title--inline">
   <h3 class="word-break mr-05"><a href="/p/docker-chua-biet-gi-den-biet-dung-phan-3-docker-compose-3P0lPm6p5ox" class="link">Docker: Chưa biết gì đến biết dùng (Phần 3 docker-compose )</a></h3>
   <div class="tags" data-v-cbe11868>
    <a href="/tags/docker" class="el-tag _3wKNDsArij9ZFjXe8k4ryR_0 el-tag--info el-tag--mini" data-v-cbe11868>Docker</a>
   </div>
  </div>
  <!---->
  <div class="d-flex justify-content-between">
   <div class="d-flex">
    <div class="stats">
     <span title="Views" class="stats-item text-muted"><i aria-hidden="true" class="stats-item__icon fa fa-eye"></i> 62 </span>
     <span title="Clips" class="stats-item text-muted"><i aria-hidden="true" class="stats-item__icon fa fa-paperclip"></i> 1 </span>
     <span title="Comments" class="stats-item text-muted"><i aria-hidden="true" class="stats-item__icon fa fa-comments"></i> 0 </span>
    </div>
    <!---->
   </div>
   <div title="Score" class="points">
    <div class="carets">
     <i aria-hidden="true" class="fa fa-caret-up"></i>
     <i aria-hidden="true" class="fa fa-caret-down"></i>
    </div>
    <span class="text-muted">4</span>
   </div>
  </div>
 </div>
</div>

First, modify your logic for fetching post-feed-item like this:

Connection conn = Jsoup.connect("https://viblo.asia");
Document doc = conn.userAgent("Mozilla").get();

Elements elements = doc.getElementsByClass("post-feed-item"); //This will get the whole element.

for (Element element : elements) {
    String postFeeds = "";

    if (element.toString().contains("docker")) {
        postFeeds = postFeeds.concat(element.toString());  
        //save postFeeds to DB
    }
}
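
Note that `element.toString().contains("docker")` is case-sensitive and searches the whole markup, including attribute values such as `href="/tags/docker"`. A minimal sketch of a case-insensitive check on the title text alone is shown below. The class and method names (`TitleFilter`, `mentionsDocker`) are hypothetical, and the sketch assumes the title text has already been extracted, for example with `element.select("h3 a").text()`.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: decides whether a post title mentions Docker.
final class TitleFilter {
    // CASE_INSENSITIVE lets one pattern cover "Docker", "docker" and "DOCKER".
    private static final Pattern DOCKER = Pattern.compile("docker", Pattern.CASE_INSENSITIVE);

    private TitleFilter() {}

    static boolean mentionsDocker(String title) {
        // find() looks for the word anywhere in the title; null is treated as no match.
        return title != null && DOCKER.matcher(title).find();
    }
}
```

Inside the loop, the condition could then read `if (TitleFilter.mentionsDocker(titleText))`, where `titleText` is whatever string you extracted as the title.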

Extra

/**
 * Your parsed element may contain single quotes (').
 * If the SQL statement is built by string concatenation,
 * they will cause an error while persisting.
 * To avoid this, escape each single quote (')
 * as two single quotes ('').
 */

if (element.toString().contains("docker")) {
    postFeeds = postFeeds.concat(element.toString().replaceAll("'", "''"));
    // save postFeeds to DB
}
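
An alternative to escaping quotes by hand is a parameterized query: with a JDBC `PreparedStatement`, the HTML is passed as a bound value, so quotes inside it need no escaping. The sketch below assumes a table named `post_feed` with a single `html` column and an already opened `java.sql.Connection`; the table, column, and class names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical DAO; the post_feed table and its html column are assumptions.
final class PostFeedDao {
    static final String INSERT_SQL = "INSERT INTO post_feed (html) VALUES (?)";

    private PostFeedDao() {}

    // Stores one element's HTML and returns the number of inserted rows.
    static int save(Connection connection, String html) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setString(1, html); // bound as a value, so ' needs no manual escaping
            return statement.executeUpdate();
        }
    }
}
```

With Jsoup, the call could look like `PostFeedDao.save(connection, element.outerHtml())`.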

Second: is it feasible to save each element in its own field in the database?

You don't need separate columns to store each element in the database. You can use them, but whether that is worthwhile depends on your use case. If you only want to store the post-feed-items so that you can write them back to your web page, splitting them into separate columns is not worthwhile.

Third: how can I convert the element to HTML and save it?

You don't need to convert the element to HTML, but you do need to convert it to a String if you want to save it to the database.
All you need is a column of the BLOB data type (you can also save it as VARCHAR, but BLOB is safer).
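
If you do choose a BLOB column, the String has to be turned into bytes first, and it helps to name the charset explicitly so that non-ASCII text, such as the author name "Hoàn Kì" in the example HTML, survives the round trip. A minimal sketch using only the JDK follows. The class name `HtmlBytes` is hypothetical, and the resulting byte array would typically be bound with `PreparedStatement.setBytes`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for moving HTML between a String and BLOB-friendly bytes.
final class HtmlBytes {
    private HtmlBytes() {}

    // Encode with an explicit charset instead of the platform default.
    static byte[] toBytes(String html) {
        return html.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that was used when writing.
    static String fromBytes(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }
}
```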

Update

How can I traverse all pages?

Looking at the source code of that page, I found that you can get the total number of pages like this:

Elements pagination = doc.getElementsByAttributeValueMatching("href", "page=\\d");

int totalPageNo = Integer.parseInt(pagination.get(pagination.size() - 2).text());

Then loop through each page:

for(int page = 1; page <= totalPageNo; page++) {
    Connection conn = Jsoup.connect("https://viblo.asia/?page=" + page);
    //rest of your code
}
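
The lookup above relies on the last page number being the text of the second-to-last matching link. As an alternative sketch (not the method used in this answer), the highest `page=N` value can be read from the link URLs themselves with a regular expression. The helper below works on plain href strings, which could be collected with `pagination.eachAttr("href")`; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the largest page number among pagination links.
final class Pagination {
    // Matches "page=12" when it follows "?" or "&" in a URL.
    private static final Pattern PAGE_PARAM = Pattern.compile("[?&]page=(\\d+)");

    private Pagination() {}

    // Returns the highest page number found, or 1 when no link carries a page parameter.
    static int lastPage(List<String> hrefs) {
        int last = 1;
        for (String href : hrefs) {
            Matcher matcher = PAGE_PARAM.matcher(href);
            if (matcher.find()) {
                last = Math.max(last, Integer.parseInt(matcher.group(1)));
            }
        }
        return last;
    }
}
```

The result can be used as `totalPageNo` in the loop, and it does not depend on where the "next page" link sits in the list.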

I think I understand what you mean. Here are some suggestions:
First, you should clarify what you are searching for and design the table fields in the database. For example, following your idea, you can create a table_docker table in the database with fields such as field_id, field_content, field_start_time, field_links, and so on.
Second, you should write some utility classes, such as JsoupUtils to fetch the HTML and parse it, HtmlUtils to handle the HTML comments and download the pictures, DBUtils to connect to the database and save data, POIUtils to display your data, and DataUtils to process your data in whatever way you need.
