
Regex pattern matching with various end points

I'd like to use JavaScript to extract a substring that follows a specific pattern from each string in the list below.

But I'm having trouble writing the regex pattern.

List of Input Strings

  1. search?w=tot&DA=YZR&t__nil_searchbox=btn&sug=&o=&q=%EB%B9%84%EC%BD%98

  2. search?q=%EB%B9%84%EC%BD%98&go=%EC%A0…4%EB%B9%84%EC%BD%98&sc=8-2&sp=-1&sk=&cvid=f05407c5bcb9496990d2874135aee8e9

  3. where=nexearch&query=%EB%B9%84%EC%BD%98&sm=top_hty&fbm=0&ie=utf8

Expected Pattern Matching Result

%EB%B9%84%EC%BD%98 for all of the above cases.

Regex

/(query|q)=.* + ADDITIONAL REGEX HERE + /

The match should end at $ (end of string) or at the first & that appears.

Question

What should I write for ADDITIONAL REGEX?

You can test it here. Thanks.

Turn the first capturing group into a non-capturing group, then use a negated character class instead of .* so that the match stops at the first & (or at the end of the string):

\b(?:query|q)=([^&\n]*)

Demo:

> var s = "where=nexearch& query=%EB%B9%84%EC%BD%98&sm=top_hty&fbm=0&ie=utf8"
undefined
> var pat = /\b(?:query|q)=([^&\n]*)/;
> pat.exec(s)[1]
'%EB%B9%84%EC%BD%98'
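To check the pattern against all three inputs from the question rather than just one, a minimal sketch along these lines may help. The helper name extractTerm is an assumption made for illustration and is not part of the answer; the input strings are copied from the question as-is, including the elided part of the second one.

```javascript
// The pattern from the answer: a word boundary, the key "query" or "q",
// an equals sign, then everything up to the next "&" or newline.
var pattern = /\b(?:query|q)=([^&\n]*)/;

// extractTerm is a hypothetical helper name, not part of the answer.
function extractTerm(input) {
  var match = pattern.exec(input);
  return match ? match[1] : null; // null when neither key is present
}

var inputs = [
  'search?w=tot&DA=YZR&t__nil_searchbox=btn&sug=&o=&q=%EB%B9%84%EC%BD%98',
  'search?q=%EB%B9%84%EC%BD%98&go=%EC%A0…4%EB%B9%84%EC%BD%98&sc=8-2&sp=-1&sk=&cvid=f05407c5bcb9496990d2874135aee8e9',
  'where=nexearch&query=%EB%B9%84%EC%BD%98&sm=top_hty&fbm=0&ie=utf8'
];

var results = inputs.map(extractTerm);
console.log(results);
```

Each of the three inputs should yield %EB%B9%84%EC%BD%98. One caveat: \b also matches after characters such as - or ., so a key like my-q would match too. If that matters, replacing \b with (?:^|[?&]) is a stricter alternative.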

I'd personally suggest an alternate approach, using a more procedural function to match the required parameter values instead of a 'simple' regular expression. While it may look more complex at first, it does allow for easy extension should you need to find different, or additional, parameter values in future.

That said:

/* haystack:
     String, the string in which you're looking for the
     parameter-values,
   needles:
     Array, the parameters whose values you're looking for
*/
function queryGrab(haystack, needles) {
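  // creating a regular expression from the array of needles;
  // given an array of ['q','query'], this will result in:
  // /^(?:q|query)=/i
  // (the alternatives are grouped so that both '^' and '=' apply
  // to each of them, and the 'g' flag is omitted because it makes
  // RegExp.prototype.test() stateful between calls):
  var reg = new RegExp('^(?:' + needles.join('|') + ')=', 'i'),

    // finding the index of the '?' character in the haystack
    // (-1 if there is none, in which case the whole string is used):
    queryIndex = haystack.indexOf('?'),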

    // getting the substring from the haystack, starting
    // after the '?' character:
    keyValues = haystack.substring(queryIndex + 1)
      // splitting that string on the '&' characters,
      // to form an array:
      .split('&')
      // filtering that array (with Array.prototype.filter()),
      // the 'keyValue' argument is the current array-element
      // from the array over which we're iterating:
      .filter(function(keyValue) {
        // if RegExp.prototype.test() returns true,
        // meaning the supplied string ('keyValue')
        // is matched by the created regular expression,
        // the current element is retained in the filtered
        // array:
        return reg.test(keyValue);
    // converting that filtered-array to a string
    // on the naive assumption each searched-string
    // should return only one match:
    }).toString();

  // returning a substring of the keyValue, from after
  // the position of the '=' character:
  return keyValues.substring(keyValues.indexOf('=') + 1);
}

// essentially irrelevant, just for the purposes of
// providing a demonstration; here we get all the
// elements of class="haystack":
var haystacks = document.querySelectorAll('.haystack'),

  // the parameters we're looking for:
  needles = ['q', 'query'],

  // an 'empty' variable for later use:
  retrieved;

// using Array.prototype.forEach() to iterate over, and
// perform a function on, each of the .haystack elements
// (using Function.prototype.call() to use the array-like
// NodeList instead of an array):
Array.prototype.forEach.call(haystacks, function(stack) {
  // like filter(), the variable is the current array-element

  // retrieved caches the found parameter-value (using
  // a variable because we're using it twice):
  retrieved = queryGrab(stack.textContent, needles);

  // setting the next-sibling's text:
  stack.nextSibling.nodeValue = '(found: ' + retrieved + ')';

  // updating the HTML of the current node, to allow for
  // highlighting:
  stack.innerHTML = stack.textContent.replace(retrieved, '<span class="found">$&</span>');
});

ul {
  margin: 0;
  padding: 0;
}
li {
  margin: 0 0 0.5em 0;
  padding-bottom: 0.5em;
  border-bottom: 1px solid #ccc;
  list-style-type: none;
  width: 100%;
}
.haystack {
  display: block;
  color: #999;
}
.found {
  color: #f90;
}

<ul>
  <li><span class="haystack">search?w=tot&amp;DA=YZR&amp;t__nil_searchbox=btn&amp;sug=&amp;o=&amp;q=%EB%B9%84%EC%BD%98</span> </li>
  <li><span class="haystack">search?q=%EB%B9%84%EC%BD%98&amp;go=%EC%A0…4%EB%B9%84%EC%BD%98&amp;sc=8-2&amp;sp=-1&amp;sk=&amp;cvid=f05407c5bcb9496990d2874135aee8e9</span> </li>
  <li><span class="haystack">where=nexearch&amp;query=%EB%B9%84%EC%BD%98&amp;sm=top_hty&amp;fbm=0&amp;ie=utf8</span> </li>
</ul>

JS Fiddle (for easier off-site experimentation).
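As an illustration of the "easy extension" this answer mentions, here is a hedged sketch of a variant that returns every requested parameter at once instead of a single value. The name queryGrabAll and its return shape are assumptions made for this example; it is not part of the original answer and does not touch the DOM.

```javascript
// Hypothetical variant of queryGrab: returns an object mapping each
// requested parameter name to its raw (still percent-encoded) value.
function queryGrabAll(haystack, needles) {
  var result = {};
  haystack
    // keep only the part after the '?', or the whole string if there is none
    .substring(haystack.indexOf('?') + 1)
    .split('&')
    .forEach(function (keyValue) {
      var eq = keyValue.indexOf('=');
      if (eq === -1) return; // skip parts that have no '='
      var key = keyValue.substring(0, eq);
      var wanted = needles.indexOf(key) !== -1;
      var seen = Object.prototype.hasOwnProperty.call(result, key);
      // keep the first occurrence of each requested key
      if (wanted && !seen) {
        result[key] = keyValue.substring(eq + 1);
      }
    });
  return result;
}

var found = queryGrabAll(
  'search?q=%EB%B9%84%EC%BD%98&sc=8-2&sp=-1&sk=',
  ['q', 'query', 'sc']
);
console.log(found);
```

Comparing keys exactly, rather than with a regular expression, means a key such as sq or query2 is never mistaken for q or query.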


Regexps are not the best way to parse these query strings. There are libraries and tools, but if you want to do it yourself:

function parseQueryString(url) {
    return _.object(url .              // build an object from pairs
        split('?').pop() .             // take the part after the ? (or everything, if there is no ?)
        split('&')      .              // split it by &
        map(function(str) {            // turn parts into 2-elt array
            return str.split('=');     // broken at =
        })
    );
}

This uses Underscore's _.object, which creates an object from an array of key/value pairs (each pair itself a two-element array). If you don't want to use that, you can write your own equivalent in a couple of lines.
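A hand-written equivalent of that pairs-to-object step might look like the sketch below. The name pairsToObject is an assumption made for illustration; it is not an Underscore function.

```javascript
// Hypothetical stand-in for _.object when given an array of
// [key, value] pairs: folds the pairs into a plain object.
function pairsToObject(pairs) {
  return pairs.reduce(function (obj, pair) {
    obj[pair[0]] = pair[1];
    return obj;
  }, {});
}

// Example with the third input string from the question (shortened).
var pairs = 'where=nexearch&query=%EB%B9%84%EC%BD%98&sm=top_hty'
  .split('&')
  .map(function (str) { return str.split('='); });

var parsed = pairsToObject(pairs);
console.log(parsed.q || parsed.query);
```

In engines that support ES2019, the built-in Object.fromEntries(pairs) does the same job. Note that splitting on every = truncates values that themselves contain an = character.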

Now the value you are looking for is just:

params = parseQueryString(url);
return params.q || params.query;
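As one example of the "libraries and tools" mentioned above, the standard URLSearchParams class (available in modern browsers and in Node.js) can do the parsing. This is only a sketch: the helper name getSearchTerm is an assumption, and unlike the approaches above, URLSearchParams returns the decoded value, so encodeURIComponent is needed to get the percent-encoded form back.

```javascript
// Hypothetical helper: returns the decoded value of the first name in
// `names` that is present in the query string, or null if none is.
function getSearchTerm(input, names) {
  // Keep only the part after the first '?'; when there is no '?',
  // indexOf returns -1 and the whole string is used.
  var query = input.slice(input.indexOf('?') + 1);
  var params = new URLSearchParams(query);
  for (var i = 0; i < names.length; i++) {
    if (params.has(names[i])) {
      return params.get(names[i]);
    }
  }
  return null;
}

var term = getSearchTerm(
  'search?w=tot&DA=YZR&t__nil_searchbox=btn&sug=&o=&q=%EB%B9%84%EC%BD%98',
  ['q', 'query']
);
console.log(term);                     // decoded text
console.log(encodeURIComponent(term)); // back to the percent-encoded form
```

Whether the decoded or the encoded form is wanted depends on what the value is used for afterwards; the question asks for the encoded one.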
