How to parse this HTML using jQuery?

I've been going crazy trying to figure this out for the past 2 hours. I have this HTML returned as a string from an AJAX request:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Preview</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="author" content="Connected Ventures LLC. Copyright 1999-2010." />
    <script type="text/javascript" src="js/jquery.js"></script>
    <script type="text/javascript" src="js/jquery.ui.js"></script>
    <script type="text/javascript" src="js/article.js"></script>
    <link href="/css/global.css" rel="stylesheet" type="text/css" />
    <link href="/css/article.css" rel="stylesheet" type="text/css" />
    <style type="text/css">
    html, body { background: #fff; color: #000; }
    </style>
</head>
<body class="the_article">
        <p>s</p></body>
</html>

I need to get the content between the body tags. I already tried this, which was suggested in another SO question on parsing HTML via jQuery:

$(ajax_response).find('body.the_article').html();

That didn't work, even after adding:

dataType: 'html'

as an AJAX request parameter. Then I tried to parse it using a regex:

ajax_response.match(/<body class="the_article">.*?<\/body>/); 

It just alerts null. Any idea how I can get the body content?

Your regex is failing because the string spans multiple lines, and the . wildcard matches any character except line terminators, so the newline between the opening body tag and the body's content breaks the pattern.

Use [\s\S] instead of . (literally, match any whitespace or non-whitespace character, which means any character at all, newlines included):

/<body class="the_article">[\s\S]*?<\/body>/
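
To see the difference between . and [\s\S] in isolation, here is a minimal sketch. The `html` string is a shortened stand-in for the AJAX response in the question, not the real payload, and the variable names are only for illustration.

```javascript
// Shortened stand-in for the AJAX response (assumption: same body markup as the question).
const html = '<body class="the_article">\n        <p>s</p></body>';

// `.` cannot cross the line break after the opening tag, so this finds nothing.
const withDot = html.match(/<body class="the_article">.*?<\/body>/);

// [\s\S] matches any character, line breaks included, so this succeeds.
const withClass = html.match(/<body class="the_article">[\s\S]*?<\/body>/);

console.log(withDot);      // null
console.log(withClass[0]); // the whole <body ...>...</body> element
```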

[EDIT] - in response to the comment: to capture the body content without its tags, capture the contents as a sub-group:

// ajax_response is the HTML string from the question
var body = ajax_response.match(/<body class="the_article">([\s\S]*?)(?=<\/body>)/);
console.log(body[1]); // body content, not including the tags

Note also that the closing body tag is specified as a look-ahead, since we don't need to match it at all, merely anchor to it. (JavaScript did not support look-behinds when this was written, short of simulations like one I wrote, so there was no choice but to match the opening body tag as part of the pattern. Look-behind assertions were later added in ES2018.)
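
Since match returns null when nothing matches, indexing the result directly can throw. A small wrapper along these lines may be safer; `extractBodyContent` is a hypothetical name, and the sample string is a shortened stand-in for the real response.

```javascript
// Hypothetical helper: returns the markup between the body tags, or null if not found.
function extractBodyContent(html) {
  const match = html.match(/<body class="the_article">([\s\S]*?)(?=<\/body>)/);
  return match ? match[1] : null; // group 1 excludes the opening and closing tags
}

// Shortened stand-in for the AJAX response in the question.
const sample = '<html>\n<body class="the_article">\n        <p>s</p></body>\n</html>';

console.log(extractBodyContent(sample).trim());         // <p>s</p>
console.log(extractBodyContent('<p>no body here</p>')); // null
```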

You could let the DOM do the work for you. Inject the code into an iframe with document.write, and then read the frame.document.body.innerHTML property.
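
A rough sketch of the iframe idea, assuming a browser environment. `bodyHtmlViaIframe` is a hypothetical name, and `doc` defaults to the page's own document. Be aware that writing the response into a frame lets the browser load and run any scripts and stylesheets the response references, which the regex approach avoids.

```javascript
// Hypothetical helper: parse an HTML string in a hidden iframe and return the body's inner HTML.
function bodyHtmlViaIframe(html, doc = document) {
  const frame = doc.createElement('iframe');
  frame.style.display = 'none';
  doc.body.appendChild(frame); // the frame only gets a document once it is attached

  const frameDoc = frame.contentWindow.document;
  frameDoc.open();
  frameDoc.write(html);        // the browser parses the full response here
  frameDoc.close();

  const inner = frameDoc.body.innerHTML;
  doc.body.removeChild(frame); // clean up the temporary frame
  return inner;
}
```

In the question's case this would be called as bodyHtmlViaIframe(ajax_response).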
