Parse tag with regex in JavaScript

I have this string:

   s='data-id="a1429883480588" class="privateMessage" @zaza
    data-id="a1429883480589" class="privateMessage" @zaza2
    data-id="a1429883480598" class="privateMessage" @zaza3'

My goal is to capture what's between data-id=" and " to get the result: [a1429883480588, a1429883480589, a1429883480598]

I tried with

var splitted = s.match(/data-id="(\w)+(?=")/g)

But this also captures data-id=" and "

Any idea how to write this regex?

It must be done in JS since it is a Node.js function!

If you're happy that the string will always be well formed and not mangled, here's one that'll do it:

var s = '<span data-id="a1429883480588" class="privateMessage">@zaza</span>&nbsp;';
s += '<span data-id="a1429883480589" class="privateMessage">@zaza2</span>&nbsp;';
s += '<span data-id="a1429883480598" class="privateMessage">@zaza3</span>';

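// match() returns null when nothing matches, so fall back to an empty array
var ids = (s.match(/data-id="\w+"/g) || []).map(function(attributeAndValue) {
    // 'data-id="a1429883480588"'.split('"') gives ['data-id=', 'a1429883480588', '']
    return attributeAndValue.split('"')[1];
});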

The concerns raised above about using RegEx to parse HTML are valid but more for HTML in the wild.
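
The attempt in the question returns the whole match because, with the g flag, match() ignores capture groups and returns only the full matched substrings. As a minimal sketch of an alternative that skips the split step, a capture group can be read with exec() in a loop. The helper name extractDataIds is hypothetical, not something from the question or the answer above, and the same well-formed-input assumption applies:

```javascript
// Sketch: collect only the attribute values using a capture group.
// `extractDataIds` is a hypothetical helper name chosen for illustration.
function extractDataIds(html) {
    var re = /data-id="(\w+)"/g;
    var ids = [];
    var m;
    // With the g flag, each exec() call resumes from re.lastIndex
    // and returns null once there are no more matches.
    while ((m = re.exec(html)) !== null) {
        ids.push(m[1]); // m[1] holds the first capture group
    }
    return ids;
}

var s = '<span data-id="a1429883480588" class="privateMessage">@zaza</span>&nbsp;' +
        '<span data-id="a1429883480589" class="privateMessage">@zaza2</span>&nbsp;' +
        '<span data-id="a1429883480598" class="privateMessage">@zaza3</span>';

console.log(extractDataIds(s));
// [ 'a1429883480588', 'a1429883480589', 'a1429883480598' ]
```

On Node 12 or later, Array.from(s.matchAll(/data-id="(\w+)"/g), function(m) { return m[1]; }) should give the same result in one expression.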

Here's the cheerio equivalent, just for reference or whatever

var cheerio = require('cheerio');

var markup = '<span data-id="a1429883480588" class="privateMessage">@zaza</span>&nbsp;<span data-id="a1429883480589" class="privateMessage">@zaza2</span>&nbsp;<span data-id="a1429883480598" class="privateMessage">@zaza3</span>';
var $ = cheerio.load('<div>'+markup+'</div>');
var ids = Array.prototype.map.call($('[data-id]'), function(e) {
    return $(e).attr('data-id');
});

console.log(ids);
// [ 'a1429883480588', 'a1429883480589', 'a1429883480598' ]
