Regular Expression - get tables from html string in PHP

I am trying to wrap all tables inside my content in a special div container to make them usable on mobile. I can't wrap the tables before they are saved to the database of the custom CMS. I managed to get at the content before it is printed on the page, and I need to preg_replace all the tables there.

I do this to get all tables:

preg_match_all('/(<table[^>]*>(?:.|\n)*<\/table>)/', $aFile['sContent'], $aMatches);

The problem is getting the inner part (?:.|\n)* to match everything inside the tags without also matching the closing tag. Right now the expression matches everything, including the closing tag of the table.

Is there a way to exclude the match for the ending tag?

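You need to perform a non-greedy match: /(<table[^>]*>(?:.|\n)*?<\/table>)/ . Note the question mark ? after the * quantifier: it makes the inner part stop at the first closing tag it can reach instead of the last one.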
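As a rough illustration of the difference, the sketch below runs the greedy and the non-greedy pattern against a made-up `$html` string containing two tables, then reuses the non-greedy pattern with preg_replace to wrap each table. The sample markup and the `table-wrapper` class name are assumptions for the example, not taken from the question.

```php
<?php
// Hypothetical sample content with two separate tables.
$html = "<p>intro</p>\n<table class='a'><tr><td>1</td></tr></table>\n"
      . "<p>between</p>\n<table>\n<tr><td>2</td></tr>\n</table>";

// Greedy: (?:.|\n)* runs on to the LAST </table>, so both tables end up in one match.
preg_match_all('/<table[^>]*>(?:.|\n)*<\/table>/', $html, $greedy);

// Non-greedy: (?:.|\n)*? stops at the FIRST </table> after each opening tag.
preg_match_all('/<table[^>]*>(?:.|\n)*?<\/table>/', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2

// The same non-greedy pattern can wrap each table; $0 is the whole match.
$wrapped = preg_replace(
    '/<table[^>]*>(?:.|\n)*?<\/table>/',
    '<div class="table-wrapper">$0</div>',
    $html
);
echo $wrapped, "\n";
```

Note that this breaks on nested tables, because the non-greedy match ends at the inner table's closing tag.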

However, I would use a DOM parser for that:

$doc = new DOMDocument();
$doc->loadHTML($html);

// Collect the serialized markup of every <table> element.
$contents = array();
$tables = $doc->getElementsByTagName('table');
foreach ($tables as $table) {
    $contents[] = $doc->saveHTML($table);
}

A DOM parser is already more convenient for extracting data from HTML documents, and it is definitely the better solution if you are modifying the HTML, as you describe.
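For the actual goal of wrapping each table in a div, a minimal sketch with DOMDocument could look like the following. The function name `wrapTables`, the `table-wrapper` class and the temporary outer div are assumptions made for this example. It also assumes the content is an HTML fragment without its own body tag.

```php
<?php
// Hypothetical helper: wraps every <table> in <div class="...">.
function wrapTables(string $html, string $class = 'table-wrapper'): string
{
    $doc = new DOMDocument();

    // Keep libxml warnings about sloppy markup out of the page output.
    $previous = libxml_use_internal_errors(true);
    // A temporary outer <div> keeps the fragment together under <body>.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the node list into an array before changing the tree.
    $tables = iterator_to_array($doc->getElementsByTagName('table'));
    foreach ($tables as $table) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', $class);
        // Put the new div where the table was, then move the table into it.
        $table->parentNode->replaceChild($div, $table);
        $div->appendChild($table);
    }

    // Serialize only the children of the temporary <div>,
    // leaving out the doctype, <html> and <body> that loadHTML adds.
    $root = $doc->getElementsByTagName('body')->item(0)->firstChild;
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$sample = '<p>intro</p><table><tr><td>1</td></tr></table>'
        . '<p>between</p><table><tr><td>2</td></tr></table>';
echo wrapTables($sample), "\n";
```

Unlike the regular expression, this also handles nested tables, since each table element is wrapped individually. One caveat: loadHTML does not assume UTF-8 by default, so content with non-ASCII characters may need an explicit encoding declaration before parsing.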

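If you don't want the closing tag to be part of the match, you can use a lookahead. Keep the quantifier non-greedy (*?), otherwise the match still runs on to the last closing tag in the content:

preg_match_all('/(<table[^>]*>(?:.|\n)*?(?=<\/table>))/', $aFile['sContent'], $aMatches);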
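To see what a lookahead leaves out, here is a small sketch on a made-up `$html` string (an assumption for the example). It uses a non-greedy quantifier so that each match ends before the nearest closing tag. For comparison, it also shows a capture group that returns only the inner content.

```php
<?php
// Hypothetical sample content with two tables.
$html = '<table><tr><td>1</td></tr></table><p>x</p>'
      . '<table><tr><td>2</td></tr></table>';

// Lookahead: </table> must follow, but it is not consumed by the match.
preg_match_all('/<table[^>]*>(?:.|\n)*?(?=<\/table>)/', $html, $m);
print_r($m[0]);
// $m[0][0] is '<table><tr><td>1</td></tr>'
// $m[0][1] is '<table><tr><td>2</td></tr>'

// Alternative: capture just the part between the opening and closing tags.
preg_match_all('/<table[^>]*>((?:.|\n)*?)<\/table>/', $html, $inner);
print_r($inner[1]);
// $inner[1][0] is '<tr><td>1</td></tr>'
// $inner[1][1] is '<tr><td>2</td></tr>'
```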
