
Regular Expression - get tables from HTML string in PHP

I'm trying to wrap every table in my content in a special div container to make the tables usable on mobile. I can't wrap the tables before they are saved to the database of the custom CMS. I managed to get hold of the content before it is printed on the page, and now I need to preg_replace all the tables in it.

This is what I do to get all tables:

preg_match_all('/(<table[^>]*>(?:.|\n)*<\/table>)/', $aFile['sContent'], $aMatches);

The problem is getting the inner part (?:.|\n)* to match everything inside the tags without matching the closing tag. Right now the expression matches everything, even the closing tag of the table...

Is there a way to exclude the closing tag from the match?

You need to perform a non-greedy match: /(<table[^>]*>(?:.|\n)*?<\/table>)/. Note the question mark after the *: it makes the quantifier lazy, so each match stops at the first </table> after an opening <table> instead of running on to the last </table> in the string.
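Since the goal is to wrap the tables rather than just collect them, here is a minimal sketch of the same lazy idea used with preg_replace. The function name wrapTablesRegex and the class name table-responsive are hypothetical. It swaps (?:.|\n) for the s modifier, which lets . match newlines, and it assumes tables are never nested (a lazy match would stop at an inner </table>).

```php
<?php
// Hypothetical helper: wraps every <table>...</table> in a div.
// Assumes tables are NOT nested: the lazy match would end at the
// inner </table> and cut the outer table in half.
function wrapTablesRegex(string $html, string $wrapperClass = 'table-responsive'): string
{
    // .*? is lazy, so each match ends at the first </table> after <table.
    // The s modifier lets . match newlines (replacing (?:.|\n)),
    // and i makes the tag names case-insensitive.
    // $0 in the replacement is the whole matched table.
    $result = preg_replace(
        '/<table\b[^>]*>.*?<\/table>/is',
        '<div class="' . $wrapperClass . '">$0</div>',
        $html
    );
    // preg_replace returns null on error; fall back to the unmodified input
    return $result ?? $html;
}

// Usage on a hypothetical content string with two tables
$html = "<p>Intro</p>\n<table>\n<tr><td>A</td></tr>\n</table>\n<p>Middle</p>\n<table><tr><td>B</td></tr></table>";
$wrapped = wrapTablesRegex($html);
echo $wrapped;
```

Each table gets its own wrapper and the paragraph between them stays outside both; with a greedy .* the two tables and everything between them would end up inside a single div.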

However, I would use a DOM parser for that:

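$doc = new DOMDocument();
// Collect libxml warnings (e.g. for HTML5 tags) instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$tables = $doc->getElementsByTagName('table');
$contents = [];
foreach ($tables as $table) {
    // Serialize each <table> element, including its own tags, back to HTML
    $contents[] = $doc->saveHTML($table);
}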

A DOM parser is already more convenient than regular expressions for extracting data from HTML documents, and it is definitely the better solution if you want to modify the HTML, as you said you do.
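Because the question is about modifying the HTML, here is a minimal sketch of the wrapping step done with DOMDocument instead of a regex. The function name wrapTables, the class name table-responsive and the temporary container id __wrap__ are hypothetical. The fragment is loaded inside a temporary div so that only its children are serialized back, and the <?xml encoding="UTF-8"> prefix is a common workaround that makes loadHTML treat the input as UTF-8.

```php
<?php
// Hypothetical helper: wraps every <table> in $html in <div class="...">.
function wrapTables(string $html, string $wrapperClass = 'table-responsive'): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Load the fragment inside a temporary container (hypothetical id __wrap__);
    // the xml encoding prefix tells libxml the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="__wrap__">' . $html . '</div>');
    libxml_clear_errors();

    // getElementsByTagName() returns a live list; copy it before changing the tree
    $tables = iterator_to_array($doc->getElementsByTagName('table'));
    foreach ($tables as $table) {
        $wrapper = $doc->createElement('div');
        $wrapper->setAttribute('class', $wrapperClass);
        $table->parentNode->insertBefore($wrapper, $table);
        $wrapper->appendChild($table); // appendChild moves the existing node
    }

    // Serialize only the children of the temporary container
    $container = (new DOMXPath($doc))->query('//div[@id="__wrap__"]')->item(0);
    $out = '';
    foreach ($container->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = wrapTables('<p>Hi</p><table><tr><td>1</td></tr></table><table><tr><td>2</td></tr></table>');
echo $result;
```

Unlike the regex, this also copes with nested tables, since each table element is wrapped individually by the parser rather than located by text matching.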

If you don't want to match the closing tag, you can use a lookahead:

preg_match_all('/(<table[^>]*>(?:.|\n)*?(?=<\/table>))/', $aFile['sContent'], $aMatches);
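The lookahead (?=<\/table>) checks for the closing tag without consuming it, so </table> is not part of the match. Whether the quantifier before it is greedy or lazy still matters once the content contains more than one table. A small comparison on a hypothetical two-table string:

```php
<?php
$html = "<table><tr><td>A</td></tr></table>\n<p>x</p>\n<table><tr><td>B</td></tr></table>";

// Greedy *: runs on to the LAST </table>, so both tables (and the <p>
// between them) collapse into a single match.
preg_match_all('/(<table[^>]*>(?:.|\n)*(?=<\/table>))/', $html, $greedy);

// Lazy *?: stops before the FIRST </table> after each opening tag,
// giving one match per table, each without its closing tag.
preg_match_all('/(<table[^>]*>(?:.|\n)*?(?=<\/table>))/', $html, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```

The greedy pattern yields one match that ends just before the second table's </table>; the lazy one yields "<table><tr><td>A</td></tr>" and "<table><tr><td>B</td></tr>".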

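Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.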

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM