
How do I find HTML links missing the protocol in the href attribute?

I'm trying to find incorrectly written links like this:

<a href="mydomain.com">link</a>

I've got this regex:

href *= *"? *(?!http|https|ftp)

But it doesn't work... any ideas?

Thanks

Using GNU grep:

% echo '
<a href="http://mydomain.com">link</a>
<a href="https://mydomain.com">link</a>
<a href="ftp://mydomain.com">link</a>
<a title="My Domain"
   href="mydomain.com">link</a>
' | grep --perl-regexp -o 'href[[:space:]]*=[[:space:]]*"(?!(ht|f)tps?://)[^"]+"'
href="mydomain.com"
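
The regex in the question most likely fails because the quote and the spaces after the `=` are optional: on a correct link the engine can backtrack, match the `"` as nothing, and run the lookahead at the quote itself, which does not start with `http`, so every `href` matches. The pattern above avoids that by making the opening quote mandatory. As a minimal sketch, here is the same pattern in Python's `re` module, assuming `[[:space:]]` can be written as `\s`; the function and variable names are made up for illustration and are not part of the original answer.

```python
import re

# Same idea as the grep pattern: the opening quote is required, so the
# negative lookahead is always tested at the first character of the URL.
MISSING_PROTOCOL = re.compile(r'href\s*=\s*"(?!(?:ht|f)tps?://)[^"]+"')

# The pattern from the question, kept only to show why it matches too much.
QUESTION_PATTERN = re.compile(r'href *= *"? *(?!http|https|ftp)')


def find_links_missing_protocol(html: str) -> list[str]:
    """Return each href="..." attribute whose value has no http(s)/ftp(s) scheme."""
    return MISSING_PROTOCOL.findall(html)


sample = '''
<a href="http://mydomain.com">link</a>
<a href="https://mydomain.com">link</a>
<a href="ftp://mydomain.com">link</a>
<a title="My Domain"
   href="mydomain.com">link</a>
'''

print(find_links_missing_protocol(sample))

# The question's regex still matches a correct link, because the optional
# quote can match nothing and the lookahead then sees '"http...'.
print(bool(QUESTION_PATTERN.search('<a href="http://mydomain.com">link</a>')))
```

Like the grep version, this also flags relative links such as `href="/about"` and other schemes such as `mailto:`, and it only looks at double-quoted attribute values, so the results may need filtering by hand.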

