Regex to match the URL pattern in Node.js

I am working on a Node application and need a regex to match the URL pattern and extract information from the URL. Please suggest possible solutions.

These are the URL patterns:
1) www.mysite.com/Paper/cat_CG10
2) www.mysite.com/White-Copy-Printer-Paper/cat_DP5027
3) www.mysite.com/pen/directory_pen?
4) www.mysite.com/Paper-Mate-Profile-Retractable-Ballpoint-Pens-Bold-Point-Black-Dozen/product_612884
5) www.mysite.com/22222/directory_22222?categoryId=12328

This is what I want from each of the URLs above:
1) name="cat" value="CG10"
2) name="cat" value="DP5027"
3) name="directory" value="pen"
4) name="product" value="612884"
5) name="directory" value="22222" params = {categoryId : 12328}

I want a regex that can match the URL pattern and extract values such as name, value and params from the URLs.

This function does the trick for the URLs and desired matches you've provided. It will also parse out any number of query parameters.

Fiddle: http://jsfiddle.net/8a9nK/

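function parseUrl(url)
{
    // name = cat|directory|product, value = text after "_" up to "?",
    // rest = raw query string (empty when there is none)
    var split = /^.*\/(cat|directory|product)_([^?]*)\??(.*)$/i.exec(url);
    if (!split) {
        return null; // URL does not contain a cat_/directory_/product_ segment
    }
    var final_params = {};
    split[3].split('&').forEach(function(pair){
       if (!pair) return; // skip empty pieces, e.g. when there is no query string
       var ps = pair.split('=');
       final_params[ps[0]] = ps[1];
    });
    return {
        name: split[1],
        value: split[2],
        params: final_params
    };
}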

Explanation

^ Start from the beginning of the string
.* Match any number of any characters (the beginning of the URL we don't care about)
\/ Match a single forward slash (the last one before the parts we care about)
(cat|directory|product) Match and capture the word cat, directory or product (this is our name)
_ Match an underscore (the character separating our name and value)
([^?]*) Match and capture any number of characters except a question mark (this is our value)
\?? Match a question mark if there is one; otherwise skip it (the start of a potential query string)
(.*) Match and capture any number of any characters (the entire query string, which we split into params later)
$ Match the end of the string
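To check the same regex against all five URLs from the question, here is a minimal sketch. The helper name `parseProductUrl` is hypothetical, and instead of splitting on `&` and `=` by hand it hands the query string to Node's built-in `URLSearchParams`, which also percent-decodes the pieces. It assumes Node.js 12 or later, where `URLSearchParams` is a global and `Object.fromEntries` exists.

```javascript
// Same pattern as parseUrl above; the /g flag is left out because a single
// exec() call does not need it.
const PATTERN = /^.*\/(cat|directory|product)_([^?]*)\??(.*)$/i;

// Hypothetical helper: returns { name, value, params }, or null if the URL
// does not contain a cat_/directory_/product_ segment.
function parseProductUrl(url) {
  const match = PATTERN.exec(url);
  if (match === null) {
    return null;
  }
  const [, name, value, query] = match;
  // URLSearchParams splits on "&" and "=" and percent-decodes keys and values.
  const params = Object.fromEntries(new URLSearchParams(query));
  return { name, value, params };
}

const urls = [
  'www.mysite.com/Paper/cat_CG10',
  'www.mysite.com/White-Copy-Printer-Paper/cat_DP5027',
  'www.mysite.com/pen/directory_pen?',
  'www.mysite.com/Paper-Mate-Profile-Retractable-Ballpoint-Pens-Bold-Point-Black-Dozen/product_612884',
  'www.mysite.com/22222/directory_22222?categoryId=12328',
];

for (const url of urls) {
  console.log(JSON.stringify(parseProductUrl(url)));
}
```

Query values come back as strings (`"12328"`), so wrap them in `Number()` if you need the numeric `categoryId` shown in the question.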

The regex below captures the desired values in match groups 1 and 2:

/^\/[^\/]+\/([^_]+)_([^\/_?]+).*$/

Explained piece by piece on the string /HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop/product_8000:

  • ^ : from the beginning
  • \/ : match a /
  • [^\/]+ : match everything up to the next / (HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop)
  • \/ : match a /
  • ([^_]+) : match and capture the value before the _ (product)
  • _ : match a _
  • ([^\/_?]+) : match and capture the value after the _, stopping at a ?, _ or / (8000)
  • .* : match anything else up to the end, if there is anything
  • $ : end

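Example (this version replaces the leading \/ with [^\/]+\/, so it also accepts the host part of the URL, such as www.mysite.com/):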

var re = /^[^\/]+\/[^\/]+\/([^_]+)_([^\/_?]+).*$/;
var matches = re.exec('www.mysite.com/22222/directory_22222?categoryId=12328');
// slice(1) keeps capture groups 1 and 2 without mutating the match array
console.log(matches.slice(1));

Output:

["directory", "22222"]
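A quick way to confirm that the host-tolerant variant covers every URL from the question is to loop over them. This is a minimal sketch; the helper name `nameAndValue` is hypothetical, and it only returns name and value, since this regex does not capture the query string.

```javascript
// Host-tolerant variant from the example: host, one path segment, then
// name_value; anything after the value (e.g. "?categoryId=12328") is ignored.
const re = /^[^\/]+\/[^\/]+\/([^_]+)_([^\/_?]+).*$/;

// Hypothetical helper: returns [name, value], or null when the URL does not match.
function nameAndValue(url) {
  const matches = re.exec(url);
  // slice(1) drops the full match and keeps capture groups 1 and 2.
  return matches ? matches.slice(1) : null;
}

const urls = [
  'www.mysite.com/Paper/cat_CG10',
  'www.mysite.com/White-Copy-Printer-Paper/cat_DP5027',
  'www.mysite.com/pen/directory_pen?',
  'www.mysite.com/Paper-Mate-Profile-Retractable-Ballpoint-Pens-Bold-Point-Black-Dozen/product_612884',
  'www.mysite.com/22222/directory_22222?categoryId=12328',
];

for (const url of urls) {
  console.log(nameAndValue(url));
}
```

Each of the five URLs yields the name/value pair from the question. Because ([^_]+) can also match slashes, a URL with an extra path segment such as www.mysite.com/a/b/cat_X captures b/cat as the name; the (cat|directory|product) alternation in the first answer is stricter in that respect.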

Use the url module to help you; not everything needs to be done with a regex :)

var uri = require( 'url' ).parse( 'www.mysite.com/22222/directory_22222?categoryId=12328', true );

which yields (among other properties):

{ 
  query: { categoryId: '12328' },
  pathname: 'www.mysite.com/22222/directory_22222'
}

Now, to get the last part of the path:

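uri.pathParams = {};
// Take the last path segment ("directory_22222") and split it on "_";
// every odd-indexed piece is a value keyed by the piece before it.
uri.pathname.split('/').pop().split('_').forEach(function (val, ix, all) {
    if (ix % 2 === 1) {
        uri.pathParams[all[ix - 1]] = val;
    }
});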

which yields:

{ 
  query: { categoryId: '12328' },
  pathParams: { directory: '22222' },

  ... a bunch of other stuff you don't seem to care about
}
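url.parse() is the legacy API; the Node.js documentation recommends the WHATWG URL class instead. Below is a minimal sketch of the same approach built on it. It assumes the URLs arrive without a scheme, as in the question, so http:// is prepended before parsing (new URL() throws on "www.mysite.com/..." otherwise). The helper name `parseWithUrlClass` is hypothetical, and `Object.fromEntries` needs Node.js 12 or later.

```javascript
// Hypothetical helper built on the WHATWG URL API (global URL class in Node.js).
function parseWithUrlClass(raw) {
  // Assumption: inputs may lack a scheme, so add one before parsing.
  const url = new URL(raw.includes('://') ? raw : 'http://' + raw);

  const pathParams = {};
  // Last path segment, e.g. "directory_22222", split into key/value pairs:
  // pieces[0] is a key, pieces[1] its value, and so on.
  const pieces = url.pathname.split('/').pop().split('_');
  for (let i = 1; i < pieces.length; i += 2) {
    pathParams[pieces[i - 1]] = pieces[i];
  }

  return {
    query: Object.fromEntries(url.searchParams),
    pathParams,
  };
}

console.log(parseWithUrlClass('www.mysite.com/22222/directory_22222?categoryId=12328'));
```

The loop does the same job as the odd-index check above, stepping straight to each value and pairing it with the key before it, and the result contains only the two objects of interest rather than the full parsed URL.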
