I have a test string:
<td><a href="4.%20Functions,%20scope.ppt">4. Functions, scope.ppt</a></td>
I want to find the substring <a href="4.%20Functions,%20scope.ppt">
After searching with Google, I came up with
regex e ("<a href=.*?>"); cmatch cm;
to mark the substring that I want to find.
What should I do next?
Am I right to use regex_match(htmlString, cm, e); with htmlString as a wchar_t*?
If you want to find all the matching substrings, then you need to use the regex iterators (std::wsregex_iterator for a std::wstring):
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

// example data
std::wstring const html = LR"(
<td><a href="4.%20Functions,%20scope.ppt">4. Functions, scope.ppt</a></td>
<td><a href="4.%20Functions,%20scope.ppt">4. Functions, scope.ppt</a></td>
<td><a href="4.%20Functions,%20scope.ppt">4. Functions, scope.ppt</a></td>
)";

// for convenience
constexpr auto fast_n_loose = std::regex_constants::optimize | std::regex_constants::icase;

// extract href's: group 1 is the opening quote, group 2 is the attribute value
std::wregex const e_link{LR"~(href=(["'])(.*?)\1)~", fast_n_loose};

int main()
{
    // regex iterators: a default-constructed iterator marks the end
    std::wsregex_iterator itr_end;
    std::wsregex_iterator itr{std::begin(html), std::end(html), e_link};

    // iterate through the matches, printing capture group 2 (the href value)
    for(; itr != itr_end; ++itr)
    {
        std::wcout << itr->str(2) << L'\n';
    }
}
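This will match the complete a tag and also capture the href attribute value, which is in capture group 2.
It is done this way because the href attribute can be anywhere in the tag.
(In the pattern below the backslashes are doubled, as they would be inside an ordinary C++ string literal.)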
<a(?=(?:[^>"']|"[^"]*"|'[^']*')*?\\shref\\s*=\\s*(?:(['"])([\\S\\s]*?)\\1))\\s+(?:"[\\S\\s]*?"|'[\\S\\s]*?'|[^>]*?)+>
You can substitute [\\w:]+ in place of the a tag to get the href from all tags.
https://regex101.com/r/LHZXUM/1
< a                           # a tag, substitute [\w:]+ for any tag
(?=                           # Assertion (a pseudo atomic group)
     (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
     \s href \s* = \s*
     (?:
          ( ['"] )            # (1), Quote
          ( [\S\s]*? )        # (2), href value
          \1
     )
)
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>