
Parse CSS3 selectors with regex (JavaScript)

If someone has seen this question before, please link to it; perhaps I am searching for the wrong things, because I get nothing but results for parsing CSS files. Basically, I have an array of selectors, something like

[".thislink", "#myid"].

I'm looking to pass any string selector formatted like a CSS3 selector, e.g.:

a.thislink:not(.ignore)[href^=http://]

into a .match and split it out into an array of selectors, ideally:

[a, .thislink, :not(.ignore), [href^=http://]]

that I can loop through. I would then use that same breakdown on any :not() selectors to get a second array of "not", which I can match against my original array of individual selectors.

Tag, class, ID, attr, and :not selectors should be all I need. I can figure out how to break down the [attr=val] and :not(selectorshere) myself, I think.

PS: I know it would be easy to match my original array values against the string selector; however, I don't actually have an array of selectors. It would take several paragraphs to explain exactly why I'm doing it this way, so just trust me, I can't do it that way =)

In case you don't succeed in finding a sufficient regex, may I suggest a JavaScript parser generator like PEG.js [*]. There's an online version of PEG.js that lets you tinker with the grammar and then download the parser once you are satisfied with the result.

[*] PEG: Parsing Expression Grammar
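For the reduced set in the question (tag, class, ID, attribute, and :not), a single global regex may already be enough. The following is a minimal sketch, not a full CSS3 tokenizer: `SIMPLE_SELECTOR` and `splitSelector` are names made up for illustration, and the pattern assumes no combinators, no escapes, and no `]` or `)` inside attribute values or `:not()` arguments.

```javascript
// Hypothetical pattern: one alternative per kind of simple selector.
//   \*                      universal selector
//   [#.]?ident              tag, #id or .class
//   \[ ... \]               attribute selector, taken verbatim
//   :name( ... )            functional pseudo-class such as :not(...)
const SIMPLE_SELECTOR =
  /\*|[#.]?[a-z_\u0080-\uffff][\w\u0080-\uffff-]*|\[[^\]]+\]|:[a-z-]+\([^)]+\)/gi;

// Splits one compound selector (no spaces or combinators) into its parts.
function splitSelector(selector) {
  const parts = selector.match(SIMPLE_SELECTOR) || [];
  // A global match silently skips text it cannot match, so verify that
  // the parts cover the whole input before trusting them.
  if (parts.join('') !== selector) {
    throw new SyntaxError('Unsupported selector: ' + selector);
  }
  return parts;
}

console.log(splitSelector('a.thislink:not(.ignore)[href^=http://]'));
```

The coverage check is the important design choice here: without it, input such as `a > b` would quietly come back as two parts instead of being rejected. Anything beyond this reduced set (nested parentheses, quoted brackets, combinators) is where a real grammar starts to pay off.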

For help with the grammar, you should consult the W3C's recommendation on Selectors Level 3.

I took the time to play around and came up with a reduced grammar for a single selector (element/id/attr/class/pseudo). You will probably want to go over it and refine it here and there.

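/*
 * PEG.js grammar for a single compound selector
 * (element / id / attr / class / functional pseudo-class)
 */
start      = element? hash? (class / attr / pseudo)*
element    = '*' / ident

// Each rule joins its pieces back into one string, e.g. '.thislink'.
ident      = i:(nmstart) j:(nmchar*) {return i + j.join('');}
hash       = h:('#' ident) {return h.join('');}
class      = c:('.' ident) {return c.join('');}
attr       = a:('[' (b:[^\]]+ {return b.join('');}) ']') {return a.join('');}
// Only functional pseudo-classes such as :not(...) are covered.
pseudo     = p:(':' function) {return p.join('');}

// Names follow the CSS ident rules: case-insensitive, underscore allowed.
nmstart    = [_a-z]i / nonascii
nmchar     = [_a-z0-9-]i / nonascii
function   = f:(ident '(' body ')') {return f.join('');}
body       = b:[^\)]+ {return b.join('');}

nonascii   = [\x80-\xff]
// Whitespace rule; not referenced by the rules above.
_          = [ \t\n\r]+ {return '';}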
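To get from the parser's output to the flat array the question asks for, and then to the contents of any :not() parts, something like the following might work. It is a sketch under assumptions: the result shape `[element, hash, [parts...]]` is what a sequence rule like `start` is expected to return (with `null`, or `''` in older PEG.js releases, for an optional rule that did not match), and `flattenResult` and `notArguments` are hypothetical helper names, not part of PEG.js.

```javascript
// Assumed shape of the grammar's result for one compound selector:
// [element, hash, [class / attr / pseudo parts...]].
// Unmatched optional rules show up as null or '' and are dropped.
function flattenResult(result) {
  const [element, hash, rest] = result;
  return [element, hash, ...rest].filter(Boolean);
}

// Hypothetical helper: returns the argument of every :not(...) part,
// e.g. ':not(.ignore)' yields '.ignore', ready to be parsed again.
function notArguments(parts) {
  return parts
    .filter((part) => part.startsWith(':not(') && part.endsWith(')'))
    .map((part) => part.slice(':not('.length, -1));
}

// Stand-in for what parsing 'a.thislink:not(.ignore)[href^=http://]'
// is assumed to return; with a generated parser this would come from
// its parse() call instead.
const parts = flattenResult(['a', null, ['.thislink', ':not(.ignore)', '[href^=http://]']]);
const excluded = notArguments(parts);
```

Each string in `excluded` can be fed through the same parser again to obtain the second, "not" array described in the question.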
