How can I parse this HTML using jQuery?

Question

I've been going crazy trying to figure this out for the past two hours. I have this HTML returned as a string from an AJAX request:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Preview</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="author" content="Connected Ventures LLC. Copyright 1999-2010." />
    <script type="text/javascript" src="js/jquery.js"></script>
    <script type="text/javascript" src="js/jquery.ui.js"></script>
    <script type="text/javascript" src="js/article.js"></script>
    <link href="/css/global.css" rel="stylesheet" type="text/css" />
    <link href="/css/article.css" rel="stylesheet" type="text/css" />
    <style type="text/css">
    html, body { background: #fff; color: #000; }
    </style>
</head>
<body class="the_article">
        <p>s</p></body>
</html>

I need to get the content between the body tags. I already tried this, which was suggested in another SO question on parsing HTML via jQuery:

$(ajax_response).find('body.the_article').html();

It didn't work, even after adding:

dataType: 'html'

as an AJAX request parameter. Then I tried to parse it using a regex:

ajax_response.match(/<body class="the_article">.*?<\/body>/);

It just alerts null. Any idea how I can get the body content?

Answer 1

Your regex is failing because the string is multi-line, and the . wildcard matches every character except line terminators, so the newline between the opening body tag and the body's content breaks the pattern.

Use [\s\S] instead of . (literally, match any whitespace or non-whitespace character, which is to say any character at all, newlines included):

/<body class="the_article">[\s\S]*?<\/body>/
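To see the difference between the two patterns, here is a minimal sketch. The `response` string is an assumed, shortened stand-in for the real AJAX response, not the actual payload from the question.

```javascript
// Assumed sample: a shortened, multi-line stand-in for the AJAX response.
var response = [
  '<html>',
  '<body class="the_article">',
  '        <p>s</p></body>',
  '</html>'
].join('\n');

// "." never matches a line terminator, so this pattern cannot get past
// the newline that follows the opening body tag.
var dotMatch = response.match(/<body class="the_article">.*?<\/body>/);

// [\s\S] matches any character, newlines included.
var anyMatch = response.match(/<body class="the_article">[\s\S]*?<\/body>/);

console.log(dotMatch);    // null
console.log(anyMatch[0]); // the full <body ...>...</body> element
```

In engines that implement the ES2018 `s` (dotAll) flag, writing the original pattern as `/.../s` should have the same effect, but `[\s\S]` also works in older browsers.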

[EDIT] In response to the comment: to capture the body content without its tags, capture the contents as a sub-group:

var body = response.match(/<body class="the_article">([\s\S]*?)(?=<\/body>)/);
console.log(body[1]); //body content, not including tag

Note also that the closing body tag is specified as a look-ahead, since we don't need to consume it at all, merely anchor to it. (JS did not support look-behinds at the time of writing, short of simulations like the one I wrote; they only arrived with ES2018. So we had no choice but to match the opening body tag and rely on the capture group.)
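The capture-group approach can be wrapped in a small helper that also copes with a failed match, since indexing into a null result would throw. This is only a sketch: `extractBodyContent` is a hypothetical name, and the `<body[^>]*>` opening is an assumed generalisation so that the class attribute does not have to match exactly.

```javascript
// Hypothetical helper (not part of the original answer): returns the markup
// between <body ...> and </body>, or null when no body element is found.
function extractBodyContent(html) {
  // Assumption: <body[^>]*> accepts any attributes on the opening tag.
  // It would break if an attribute value itself contained a ">".
  var match = html.match(/<body[^>]*>([\s\S]*?)<\/body>/i);
  return match ? match[1] : null;
}

// Assumed sample input, shortened from the question's response.
var sample = '<html>\n<body class="the_article">\n        <p>s</p></body>\n</html>';

console.log(JSON.stringify(extractBodyContent(sample))); // "\n        <p>s</p>"
console.log(extractBodyContent('<p>no body tag</p>'));   // null
```

Here the closing tag is consumed rather than written as a look-ahead. For a single match the captured group is the same either way, so the choice is mostly a matter of taste.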

Answer 2

You could let the DOM do the work for you. Inject the code into an iframe with document.write, and then read the frame.document.body.innerHTML property.
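A rough sketch of the iframe idea follows, assuming a browser environment. `bodyHtmlViaIframe` and its optional `doc` parameter are hypothetical names introduced for illustration, and the frame's document is reached through `contentWindow.document` rather than through a `frame` window reference.

```javascript
// Hypothetical helper: writes an HTML string into a hidden iframe and
// returns the innerHTML of the resulting <body>. Browser-only sketch.
// `doc` is an assumed optional parameter; it defaults to the page's document.
function bodyHtmlViaIframe(html, doc) {
  doc = doc || document;

  var frame = doc.createElement('iframe');
  frame.style.display = 'none';
  doc.body.appendChild(frame); // the frame needs to be attached to get a document

  var frameDoc = frame.contentWindow.document;
  frameDoc.open();
  frameDoc.write(html);
  frameDoc.close();

  // body can be missing if the parser has not reached it yet.
  var content = frameDoc.body ? frameDoc.body.innerHTML : null;

  doc.body.removeChild(frame); // clean up the temporary frame
  return content;
}

// Usage in a page (ajax_response is the string from the question):
// var bodyHtml = bodyHtmlViaIframe(ajax_response);
```

One thing to keep in mind with this approach: any script and stylesheet tags in the string are fetched and run inside the frame, and with external scripts the body may not be available until the frame has finished loading, so the regex route may be simpler when only the markup is needed.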

2 solutions

Solution 1
0 Accepted 2012-07-05 22:03:07

Solution 2
0 2012-07-05 22:10:16
