Regex to match URL patterns in Node.js

Question

I am working on a Node application and need a regex to match a URL pattern and extract information from the URL. Please suggest possible solutions.

These are the URL patterns:
1) www.mysite.com/Paper/cat_CG10
2) www.mysite.com/White-Copy-Printer-Paper/cat_DP5027
3) www.mysite.com/pen/directory_pen?
4) www.mysite.com/Paper-Mate-Profile-Retractable-Ballpoint-Pens-Bold-Point-Black-Dozen/product_612884
5) www.mysite.com/22222/directory_22222?categoryId=12328

This is what I want from the above URLs:
1) name="cat" value="CG10"
2) name="cat" value="DP5027"
3) name="directory" value="pen"
4) name="product" value="612884"
5) name="directory" value="22222" params={categoryId: 12328}

I want a regex which can match the url pattern and get the values like name, value and params out of the urls.

Answer 1

This function does the trick for the URLs and desired matches you've provided. It will also parse out any number of query parameters.

Fiddle: http://jsfiddle.net/8a9nK/

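function parseUrl(url)
{
    // Groups: 1 = name, 2 = value, 3 = raw query string (may be empty)
    var split = /^.*\/(cat|directory|product)_([^?]*)\??(.*)$/i.exec(url);
    if (!split) return null; // the URL matches none of the expected patterns
    var final_params = {};
    split[3].split('&').forEach(function(pair){
       if (!pair) return; // skip empty segments, e.g. when there is no query string
       var ps = pair.split('=');
       final_params[ps[0]] = ps[1];
    });
    return {
        name: split[1],
        value: split[2],
        params: final_params
    };
}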

Explanation

^ Start from the beginning of the string
.* Match any number of any characters (the beginning of the URL, which we don't care about)
\/ Match a single forward slash (the last one before the part we care about)
(cat|directory|product) Match and capture the word cat OR directory OR product (this is our name)
_ Match an underscore (the character separating our name and value)
([^?]*) Match and capture any number of characters EXCEPT a question mark (this is our value)
\?? Match a question mark if it exists, otherwise carry on without it (the start of a potential query string)
(.*) Match and capture any number of any characters (this is the entire query string, which we split into params later)
$ Match the end of the string
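
The breakdown above can be tried against the five URLs from the question. What follows is only a sketch under a few assumptions: it uses the same pattern rewritten with named capture groups, it swaps the manual '&' / '=' splitting for the built-in URLSearchParams, and the names CATALOG_URL and parseCatalogUrl are made up for illustration.

```javascript
// Hypothetical helper: the same pattern as in the answer, with named groups.
const CATALOG_URL = /^.*\/(?<name>cat|directory|product)_(?<value>[^?]*)\??(?<query>.*)$/i;

function parseCatalogUrl(url) {
  const match = CATALOG_URL.exec(url);
  if (!match) return null; // not one of the expected URL shapes
  const { name, value, query } = match.groups;
  // URLSearchParams copes with an empty query string and decodes values.
  const params = Object.fromEntries(new URLSearchParams(query));
  return { name, value, params };
}

// The five sample URLs from the question.
const samples = [
  'www.mysite.com/Paper/cat_CG10',
  'www.mysite.com/White-Copy-Printer-Paper/cat_DP5027',
  'www.mysite.com/pen/directory_pen?',
  'www.mysite.com/Paper-Mate-Profile-Retractable-Ballpoint-Pens-Bold-Point-Black-Dozen/product_612884',
  'www.mysite.com/22222/directory_22222?categoryId=12328',
];

for (const url of samples) {
  console.log(parseCatalogUrl(url));
}
```

Note that query values come back as strings, so categoryId is '12328' rather than the number 12328.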

Answer 2

The regex below will have the desired values in its match groups 1 and 2:

/^\/[^\/]+\/([^_]+)_([^\/_?]+).*$/

Explained piece by piece on the string /HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop/product_8000 :

^ : from the beginning
\/ : match a /
[^\/]+ : match everything until a / ( HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop )
\/ : match a /
([^_]+) : match and capture the value before the _ ( product )
_ : match a _
([^\/_?]+) : match and capture the value after the _, stopping at a ?, _ or / ( 8000 )
.* : match until the end, if there is anything left
$ : end
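
To check the walkthrough, the sketch below runs the path-only pattern against the sample string. The variable names are illustrative. It also shows that this form expects the input to start with a slash, so a string that begins with the host name does not match it.

```javascript
// Path-only pattern from this answer: expects the input to start with "/".
const pathPattern = /^\/[^\/]+\/([^_]+)_([^\/_?]+).*$/;

const samplePath =
  '/HP-ENVY-TouchSmart-m7-j010dx-173-Touch-Screen-Refurbished-Laptop/product_8000';

const pathMatch = pathPattern.exec(samplePath);
// Index 0 is the whole match; groups 1 and 2 hold the name and the value.
const [, name, value] = pathMatch;
console.log(name, value); // product 8000

// A host-prefixed string has no leading "/", so exec returns null here.
const hostPrefixed = pathPattern.exec('www.mysite.com/Paper/cat_CG10');
console.log(hostPrefixed); // null
```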

Example:

var re = /^[^\/]+\/[^\/]+\/([^_]+)_([^\/_?]+).*$/;
var matches = re.exec('www.mysite.com/22222/directory_22222?categoryId=12328');
// slice(1) drops the full match and keeps only the two captured groups
console.log(matches.slice(1));

Output:

["directory", "22222"]

Answer 3

Use the url module to help you; not everything needs to be done with a regex :)

var uri = require( 'url' ).parse( 'www.mysite.com/22222/directory_22222?categoryId=12328', true );

which yields (along with other fields):

{ 
  query: { categoryId: '12328' },
  pathname: 'www.mysite.com/22222/directory_22222'
}

Now, to get the last part of the path:

uri.pathParams = {};
uri.pathname.split('/').pop().split('_').forEach( function( val, ix, all ){
    (ix&1) && ( uri.pathParams[ all[ix-1] ] = val );
} );

which yields:

{ 
  query: { categoryId: '12328' },
  pathParams: { directory: '22222' },

  ... a bunch of other stuff you don't seem to care about
}
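
The same idea can also be sketched with the WHATWG URL class that Node.js exposes as a global, instead of url.parse. This is only a sketch under assumptions: the URLs in the question have no scheme, so the code prepends 'http://' before parsing, and the function name parseWithUrlApi is hypothetical.

```javascript
// Hypothetical helper built on the global URL class.
function parseWithUrlApi(rawUrl) {
  // Assumption: inputs such as "www.mysite.com/..." carry no scheme,
  // and new URL() throws on those, so prepend one when it is missing.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(rawUrl);
  const url = new URL(hasScheme ? rawUrl : 'http://' + rawUrl);

  // The last path segment looks like "directory_22222".
  const lastSegment = url.pathname.split('/').pop();
  const sep = lastSegment.indexOf('_');
  if (sep === -1) return null; // no "name_value" segment found

  return {
    name: lastSegment.slice(0, sep),
    value: lastSegment.slice(sep + 1),
    params: Object.fromEntries(url.searchParams),
  };
}

console.log(parseWithUrlApi('www.mysite.com/22222/directory_22222?categoryId=12328'));
```

Unlike the regex in Answer 1, this version does not restrict the name to cat, directory or product; any last segment containing an underscore is accepted.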

3 answers

Answer 1
1 2014-01-22 14:22:24

Answer 2
0 2014-01-22 13:55:12

Answer 3
0 2014-01-22 14:33:00
