Regex for no space between attributes html

How can I detect a missing space between attributes? Example:

 <div style="margin:37px;"/></div>
 <span title=''style="margin:37px;" /></span>
 <span title="" style="margin:37px;" /></span>
 <a title="u" hghghgh  title="j" >

 <a title=""gg  ff>

Correct: 1, 3, 4. Incorrect: 2, 5. How can I detect the incorrect ones?

I've tried with this:

<(.*?=(['"]).*?\2)([\S].*)|(^/)>

But it's not working.

You should not use regex to parse HTML, except for learning purposes.


http://regexr.com/3cge1

<\w+(\s+[\w-]+(=(['"]?)[^"']*\3)?)*\s*/?>

This regular expression matches even if you don't have any attribute at all. It works for self-closing tags, and if the attribute has no value.


  • <\w+ matches the opening < followed by the tag name (\w characters).

  • (\s+[\w-]+(=(['"]?)[^"']*\3)?)* matches zero or more attributes, each of which must start with whitespace. It contains two parts:

    • \s+[\w-]+ the attribute name after mandatory whitespace
    • (=(['"]?)[^"']*\3)? the optional attribute value; \3 requires the closing quote to match the opening one
  • \s*/?> matches optional whitespace and an optional / followed by the closing >.
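As a minimal sketch of how those pieces fit together, the pattern can be assembled from named parts. The names tagOpen, attrName, attrValue, tagClose and isValidTag are hypothetical and used only for illustration; only the pattern pieces come from the breakdown above.

```javascript
// Hypothetical helper names; only the pattern pieces come from the answer above.
const tagOpen = "<\\w+";                    // "<" followed by the tag name
const attrName = "\\s+[\\w-]+";             // mandatory whitespace, then the attribute name
const attrValue = "(=(['\"]?)[^\"']*\\3)?"; // optional value; \3 repeats the opening quote, if any
const tagClose = "\\s*/?>";                 // optional whitespace, optional "/", then ">"

// Group 1 wraps one whole attribute, so the quote stays capture group 3.
const validTag = new RegExp(tagOpen + "(" + attrName + attrValue + ")*" + tagClose);

function isValidTag(tag) {
  return validTag.test(tag);
}

console.log(isValidTag('<div style="margin:37px;"/>'));             // true
console.log(isValidTag("<span title=''style=\"margin:37px;\" />")); // false
console.log(isValidTag('<a title=""gg  ff>'));                      // false
```

The regex is built without the g flag on purpose: a global regex keeps state in lastIndex between test() calls, which can give surprising results when the same object is reused.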


Here is a test for the strings. Because of the leading !, true means the tag is incorrect:

var re = /<\w+(\s+[\w-]+(=(['"]?)[^"']*\3)?)*\s*\/?>/g;

! '<div style="margin:37px;"/></div>'.match(re);
false

! '<span title=\'\'style="margin:37px;" /></span>'.match(re);
true

! '<span title="" style="margin:37px;" /></span>'.match(re);
false

! '<a title="u" hghghgh  title="j" >'.match(re);
false

! '<a title=""gg  ff>'.match(re);
true

Display all incorrect tags:

var html = '<div style="margin:37px;"></div> <span title=\'\'style="margin:37px;"/><a title=""gg ff/> <span title="" style="margin:37px;" /></span> <a title="u" hghghgh title="j"example> <a title=""gg ff>';
var tagRegex = /<\w+[^>]*\/?>/g;
var validRegex = /<\w+(\s+[\w-]+(=(['"]?)[^"']*\3)?)*\s*\/?>/g;

html.match(tagRegex).forEach(function(m) {
  if(!m.match(validRegex)) {
    console.log('Incorrect', m);
  }
});

This will output:

Incorrect <span title=''style="margin:37px;"/>
Incorrect <a title=""gg ff/>
Incorrect <a title="u" hghghgh title="j"example>
Incorrect <a title=""gg ff>

Update for the comments

<\w+(\s+[\w-]+(="[^"]*"|='[^']*'|=[\w-]+)?)*\s*/?>
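The comments that prompted this update are not reproduced here. One observable difference is that the updated pattern accepts a value containing the other kind of quote, such as title="it's ok", which the earlier pattern rejects. The sketch below checks that and reuses the updated pattern to list incorrect tags. The names findInvalidTags, originalValid and updatedValid are hypothetical, and the ^ and $ anchors are my addition so that each extracted tag must match as a whole.

```javascript
// Earlier pattern from this answer (backreference-based quoting).
const originalValid = /<\w+(\s+[\w-]+(=(['"]?)[^"']*\3)?)*\s*\/?>/;

// Updated pattern: double-quoted, single-quoted, or unquoted values.
// The ^...$ anchors are an assumption added for this sketch.
const updatedValid = /^<\w+(\s+[\w-]+(="[^"]*"|='[^']*'|=[\w-]+)?)*\s*\/?>$/;

// Hypothetical helper: returns every opening tag that fails the updated pattern.
function findInvalidTags(html) {
  const tags = html.match(/<\w+[^>]*\/?>/g) || []; // same tag extractor as above
  return tags.filter((tag) => !updatedValid.test(tag));
}

const mixedQuotes = '<a title="it\'s ok" href=x>';
console.log(originalValid.test(mixedQuotes)); // false
console.log(updatedValid.test(mixedQuotes));  // true

const sample = "<span title=''style=\"margin:37px;\"/> " + mixedQuotes + ' <a title=""gg ff>';
console.log(findInvalidTags(sample));
// [ `<span title=''style="margin:37px;"/>`, '<a title=""gg ff>' ]
```

The tag extractor still stops at the first >, so an attribute value that itself contains > would be cut short; that limitation is inherited from the code above.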

Try this regex; I think it will work:

<\w*[^=]*=["'][\w;:]*["'][\s/]+[^>]*>

< - opening bracket

\w* - zero or more word characters (the tag name)

[^=]*= - any characters up to and including the first '='

["'][\w;:]*["'] - the attribute value. This matches two cases: 1. a single-quoted value, possibly empty 2. a double-quoted value, possibly empty

[\s/]+ - at least one whitespace character or '/'

[^>]* - any characters up to the '>' closing bracket
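As I read it, this pattern matches tags whose first attribute value is followed by whitespace or a slash, so the incorrect tags are the ones that do not match. A small sketch against the five strings from the question (the names firstAttrSpaced, samples and flagged are hypothetical):

```javascript
// Pattern from the answer above, used without the g flag so test() is stateless.
const firstAttrSpaced = /<\w*[^=]*=["'][\w;:]*["'][\s/]+[^>]*>/;

const samples = [
  '<div style="margin:37px;"/></div>',
  "<span title=''style=\"margin:37px;\" /></span>",
  '<span title="" style="margin:37px;" /></span>',
  '<a title="u" hghghgh  title="j" >',
  '<a title=""gg  ff>',
];

// A string is flagged as incorrect when the pattern does NOT match it.
const flagged = samples.filter((s) => !firstAttrSpaced.test(s));
console.log(flagged);
// [ `<span title=''style="margin:37px;" /></span>`, '<a title=""gg  ff>' ]
```

Two limitations seem worth keeping in mind. Only the first attribute is inspected, so <a title="" b=""c=""> still matches. And a quote followed directly by >, as in <a title="">, does not match, so that tag would be flagged even though it is fine.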

I got this pattern to work. It finds the incorrect lines 2 and 5, as you requested:

>>> import re
>>> p = r'<[a-z]+\s[a-z]+=[\'\"][\w;:]*[\"\'][\w]+.*'

>>> html = """
 <div style="margin:37px;"/></div>
 <span title=''style="margin:37px;" /></span>
 <span title="" style="margin:37px;" /></span>
 <a title="u" hghghgh  title="j" >

 <a title=""gg  ff>
"""

>>> bad = re.findall(p, html)
>>> print('\n'.join(bad))
<span title=''style="margin:37px;" /></span>
<a title=""gg  ff>

Regex broken down:

p = r'<[a-z]+\s[a-z]+=[\'\"][\w;:]*[\"\'][\w]+.*'

< - opening bracket

[a-z]+\s - one or more lowercase letters followed by a whitespace character

[a-z]+= - one or more lowercase letters followed by an equals sign

[\'\"] - match a single or double quote one time

[\w;:]* - match a word character (a-zA-Z0-9_), a colon, or a semicolon zero or more times

[\"\'] - again match a single or double quote one time

[\w]+ - match a word character one or more times (this catches the lack of a space you wanted to detect) ***

.* - match anything zero or more times (gets the rest of the line)
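For comparison with the JavaScript answers in this thread, here is a rough port of the same pattern. It is a sketch, not the answerer's code: the names missingSpace, html and bad are hypothetical, and it assumes the same five input lines as the question.

```javascript
// Same pattern as the Python version; g collects every match in the input.
// In JavaScript, as in Python, "." does not match a newline by default,
// so each match stops at the end of its line.
const missingSpace = /<[a-z]+\s[a-z]+=['"][\w;:]*["']\w+.*/g;

const html = `
 <div style="margin:37px;"/></div>
 <span title=''style="margin:37px;" /></span>
 <span title="" style="margin:37px;" /></span>
 <a title="u" hghghgh  title="j" >

 <a title=""gg  ff>
`;

const bad = html.match(missingSpace) || [];
console.log(bad.join("\n"));
// <span title=''style="margin:37px;" /></span>
// <a title=""gg  ff>
```

Like the Python original, this only looks at the first attribute after the tag name and expects exactly one whitespace character before it, so it is best treated as a quick check for inputs shaped like the examples.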

I'm not sure about this, as I'm not very experienced with regex, but it looks like it works well:

JS Fiddle

<([a-z]+)(\s+[a-z\-]+(="[^"]*")?)*\s*\/?>([^<]+(<\/\1>))?

Currently <([a-z]+) will mostly work, but with web components and <ng-*> tags it would be better to use \w+.

---------------

Output:

 <div style="margin:37px;">div</div> correct
 <span title=" style="margin:37px;" />span1</span> incorrect
 <span title="" style="margin:37px;" />span2</span> correct
 <a title="u" title="j">link</a> correct
 <a title=""href="" alt="" required>test</a> incorrect
 <img src="" data-abc="" required> correct
 <input type=""style="" /> incorrect
