Regular expression match doesn't include space

I have this regular expression:

(?'box_id'\d{1,19})","box_name":"(?'box_name'[\w\d\.\s]{1,19})

This works well, except when the box name contains spaces. For example, when run on 'my box' it returns 'mybox', without the space.

How can I make the box_name group include spaces?

Code:

using System.Text.RegularExpressions;

// 'result' holds the JSON text returned by the server.
Regex reg = new Regex(@"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)""");
// Regex.Matches never returns null; it returns an empty collection when nothing matches.
MatchCollection matches = reg.Matches(result);
if ( matches.Count > 0 )
{
  FileArchive.FilesDataTable filesdataTable = new FileArchive.FilesDataTable();
  foreach ( Match match in matches )
  {
    // NewFilesRow() creates a detached row with the table's schema.
    FileArchive.FilesRow row = filesdataTable.NewFilesRow();
    row.ID = match.Groups["object_id"].Value;
    row.Name = match.Groups["file_name"].Value;
    // The row is only stored once it is added to the table.
    filesdataTable.Rows.Add(row);
  }
}

Input:

{"objects":[{"object_id":"135248","file_name":"some space here.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135257","file_name":"jup 13.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135260","file_name":"my pic.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135262","file_name":"EveningWav)es,Hon(olulu,Hawaii.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135280","file_name":"test with spaces.jpg","video_status":"0","thumbnail_status":"1"}],"status":"ok"}

It appears to me that your data is consistently double-quote delimited, no? That fact should be the basis of the regex:

(?<box_id>\d{1,19})","file_name":"(?<box_name>[^"]{1,19})  //1 to 19 non " chars.
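A minimal C# sketch of the quote-delimited idea, assuming the JSON shape from the question; the class and method names are hypothetical. Because [^"] accepts anything except a double quote, the space in a name such as my pic.jpg is kept.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for the [^"] pattern above.
public static class QuoteDelimitedDemo
{
    // The answer's pattern: the name is 1 to 19 characters that are not a double quote,
    // so spaces and punctuation are accepted.
    static readonly Regex Pattern = new Regex(
        @"(?<box_id>\d{1,19})"",""file_name"":""(?<box_name>[^""]{1,19})");

    // Returns the captured name, or null when the pattern does not match.
    public static string CaptureName(string json)
    {
        Match m = Pattern.Match(json);
        return m.Success ? m.Groups["box_name"].Value : null;
    }

    public static void Main()
    {
        // Prints: my pic.jpg
        Console.WriteLine(CaptureName(@"{""object_id"":""135260"",""file_name"":""my pic.jpg""}"));
    }
}
```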

As for the missing spaces: this token, (?'box_name'[\\w\\d.\\s]{1,19}), cannot match 'mybox' in a string containing 'my box', so that issue must be downstream.
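To check that claim, here is a small C# sketch that runs the question's pattern unchanged against a hypothetical input using the box_name key (the sample string and class name are illustrative assumptions). The captured group keeps the space.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: the question's original pattern, unchanged.
public static class OriginalTokenDemo
{
    // Single-quote named groups (?'name'...) are valid .NET syntax.
    static readonly Regex Pattern = new Regex(
        @"(?'box_id'\d{1,19})"",""box_name"":""(?'box_name'[\w\d\.\s]{1,19})");

    // Returns the box_name capture, or null when the pattern does not match.
    public static string CaptureName(string json)
    {
        Match m = Pattern.Match(json);
        return m.Success ? m.Groups["box_name"].Value : null;
    }

    public static void Main()
    {
        // Hypothetical input shaped like the question's data, using the box_name key.
        string json = @"{""box_id"":""12"",""box_name"":""my box""}";
        // Prints: 'my box' (the space is inside the capture)
        Console.WriteLine("'" + CaptureName(json) + "'");
    }
}
```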

Typos and style: your pattern has the literal 'box_name', but the key in your data is 'file_name'. Also, why in the world would you switch to single quotes as the named-group delimiter when <> brackets, the default, are MORE readable (since quotes already appear in the regex)?
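For reference, .NET accepts both (?'name'...) and (?<name>...) as named-group syntax, so the choice of delimiter only affects readability. A minimal sketch (hypothetical class and constant names) showing that both forms capture the same value:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the two named-group syntaxes.
public static class GroupSyntaxDemo
{
    public const string Input = @"{""object_id"":""135257"",""file_name"":""jup 13.jpg""}";

    // Same pattern twice; only the named-group delimiters differ.
    public const string QuotedGroup = @"""file_name"":""(?'file_name'[^""]*)""";
    public const string AngledGroup = @"""file_name"":""(?<file_name>[^""]*)""";

    // Returns the file_name capture (empty string if the pattern does not match).
    public static string Capture(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups["file_name"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Capture(QuotedGroup, Input)); // jup 13.jpg
        Console.WriteLine(Capture(AngledGroup, Input)); // jup 13.jpg
    }
}
```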

In addition to what @sweaver2112 said, I think you need to widen the framing by including the surrounding quotes, and get rid of the {1,19} range.
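A sketch of why the framing matters, assuming the 20-character name test with spaces.jpg from the question's input. The character class is kept as [^"] (not the class used in the patterns below) so that only the framing changes; the class and constant names are hypothetical. With a {1,19} cap and no closing quote, the capture silently stops one character short; requiring the closing quote with no cap returns the whole name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class contrasting capped and quote-framed captures.
public static class FramingDemo
{
    public const string Input = @"{""object_id"":""135280"",""file_name"":""test with spaces.jpg""}";

    // Capped at 19 characters with no closing quote: a longer name is cut short.
    public const string Capped = @"""file_name"":""(?<name>[^""]{1,19})";

    // Framed by the closing quote with no length cap: the whole value is captured.
    public const string Framed = @"""file_name"":""(?<name>[^""]+)""";

    // Returns the name capture, or null when the pattern does not match.
    public static string Capture(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["name"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Capture(Capped, Input)); // test with spaces.jp
        Console.WriteLine(Capture(Framed, Input)); // test with spaces.jpg
    }
}
```

If the cap were kept together with the closing quote, a 20-character name would not match at all instead of being truncated.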

These regexes work in Perl; I didn't want to crank up C# to test them.

"(?<box_id>\\d+)","(?:${type})":"(?<box_name>[\\w.]+(?:\\s[\\w.]+)*)"
or,
"\\s*(?<box_id>\\d+)\\s*","\\s*(?:${type})\\s*":"\\s*(?<box_name>[\\w.]+(?:\\s[\\w.]+)*)\\s*"
where $type = 'file_name';
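A possible C# port of the first pattern, assuming the key name is passed in as a string (the Build helper and class name are hypothetical). Regex.Escape is used so that a key containing regex metacharacters would be matched literally; for file_name it changes nothing.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: C# version of "(?<box_id>\d+)","(?:${type})":"(?<box_name>...)"
public static class TypedPatternDemo
{
    // Builds the pattern for the given key, escaping the key for literal matching.
    public static Regex Build(string type)
    {
        return new Regex(
            @"""(?<box_id>\d+)"",""(?:" + Regex.Escape(type) +
            @")"":""(?<box_name>[\w.]+(?:\s[\w.]+)*)""");
    }

    public static void Main()
    {
        string input = @"{""object_id"":""135248"",""file_name"":""some space here.jpg""}";
        Match m = Build("file_name").Match(input);
        // Prints: 135248 -> some space here.jpg
        Console.WriteLine(m.Groups["box_id"].Value + " -> " + m.Groups["box_name"].Value);
    }
}
```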

Realistically though, this should work too (with the type substituted in). Its validation is more relaxed:
"(?<box_id>\\d+)","file_name":"(?<box_name>[^"]*)"
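A sketch comparing the strict and relaxed patterns on two objects taken from the question's input (the class and constant names are hypothetical). The strict class [\w.] has no room for ')', ',' or '(', so the EveningWav)es,Hon(olulu,Hawaii.jpg entry is skipped entirely, while the relaxed [^"]* captures it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing strict and relaxed name patterns.
public static class RelaxedVsStrictDemo
{
    // Two objects from the question's input; the second name contains ')', ',' and '('.
    public const string Input =
        @"{""objects"":[{""object_id"":""135260"",""file_name"":""my pic.jpg""}," +
        @"{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg""}]}";

    // Strict: runs of word characters and dots separated by single whitespace characters.
    public const string Strict =
        @"""(?<box_id>\d+)"",""file_name"":""(?<box_name>[\w.]+(?:\s[\w.]+)*)""";

    // Relaxed: anything up to the next double quote.
    public const string Relaxed =
        @"""(?<box_id>\d+)"",""file_name"":""(?<box_name>[^""]*)""";

    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        foreach (Match m in Regex.Matches(Input, Relaxed))
            Console.WriteLine(m.Groups["box_id"].Value + ": " + m.Groups["box_name"].Value);
        Console.WriteLine("strict matches: " + CountMatches(Strict, Input)); // 1
    }
}
```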

Edit:

"Not sure, what did my regex return for you? – sln yesterday
It returned correct results; for the input in my question I got 'somespacehere.jpg', 'jup13.jpg' and so on for the file_name group. – .NET Developer yesterday"

I took your code and input and just printed the groups; it works perfectly. The spaces are there,
so the problem must be in how the values are assigned to your ROW data.

See it here: http://www.ideone.com/HsTMF

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = @"{""objects"":[{""object_id"":""135248"",""file_name"":""some space here.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135257"",""file_name"":""jup 13.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135260"",""file_name"":""my pic.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135280"",""file_name"":""test with spaces.jpg"",""video_status"":""0"",""thumbnail_status"":""1""}],""status"":""ok""}";
      Regex reg = new Regex(
                   @"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)"""
      );
      foreach ( Match match in reg.Matches(input) )
         Console.WriteLine(
                 "Id = '{0}',  File name = '{1}'", 
                 match.Groups["object_id"].Value,
                 match.Groups["file_name"].Value  );
   }
}

Output:

Id = '135248',  File name = 'some space here.jpg'
Id = '135257',  File name = 'jup 13.jpg'
Id = '135260',  File name = 'my pic.jpg'
Id = '135280',  File name = 'test with spaces.jpg'
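If the regex output is correct, the remaining suspect is how the values reach the table. A minimal sketch of that step, using a plain System.Data.DataTable as a stand-in for the question's FileArchive.FilesDataTable (the class name and the ID and Name columns are assumptions mirroring row.ID and row.Name). NewRow only creates a detached row; it is stored in the table once Rows.Add is called.

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

// Hypothetical demo class: regex captures copied into DataTable rows.
public static class RowAssignmentDemo
{
    public static DataTable Load(string json)
    {
        // Plain DataTable standing in for FileArchive.FilesDataTable (assumed columns).
        DataTable table = new DataTable("Files");
        table.Columns.Add("ID", typeof(string));
        table.Columns.Add("Name", typeof(string));

        Regex reg = new Regex(
            @"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)""");
        foreach (Match match in reg.Matches(json))
        {
            DataRow row = table.NewRow();       // creates a detached row
            row["ID"] = match.Groups["object_id"].Value;
            row["Name"] = match.Groups["file_name"].Value;
            table.Rows.Add(row);                // the row is stored only after this call
        }
        return table;
    }

    public static void Main()
    {
        string json = @"{""objects"":[{""object_id"":""135257"",""file_name"":""jup 13.jpg""},{""object_id"":""135260"",""file_name"":""my pic.jpg""}]}";
        foreach (DataRow row in Load(json).Rows)
            Console.WriteLine(row["ID"] + ": '" + row["Name"] + "'");
    }
}
```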

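Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish this page, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM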