Regular expression match doesn't include space
I have this regular expression:
(?'box_id'\d{1,19})","box_name":"(?'box_name'[\w\d\.\s]{1,19})
This works well, except when the box name contains spaces. For example, when executing it on 'my box' it returns 'mybox', without the space.
How can I make it include spaces in the box_name group?
Code:
Regex reg = new Regex(@"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)""");
MatchCollection matches = reg.Matches(result);
if ( matches == null) throw new Exception("There was an error while parsing data.");
if ( matches.Count > 0 )
{
    FileArchive.FilesDataTable filesdataTable = new FileArchive.FilesDataTable();
    foreach ( Match match in matches )
    {
        FileArchive.FilesRow row = filesdataTable.NewFilesRow();
        row.ID = match.Groups["object_id"].Value;
        row.Name = match.Groups["file_name"].Value;
    }
}
Input:
{"objects":[{"object_id":"135248","file_name":"some space here.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135257","file_name":"jup 13.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135260","file_name":"my pic.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135262","file_name":"EveningWav)es,Hon(olulu,Hawaii.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135280","file_name":"test with spaces.jpg","video_status":"0","thumbnail_status":"1"}],"status":"ok"}
It appears to me that your data is consistently delimited by double quotes, no? That fact should be the basis of the regex:
(?<box_id>\d{1,19})","file_name":"(?<box_name>[^"]{1,19}) //1 to 19 non " chars.
As for the missing spaces, this token, (?'box_name'[\w\d\.\s]{1,19}), cannot match 'mybox' in a string containing 'my box', so that issue must be downstream.
Typos and style: your regex has the literal 'box_name', but the keys in your input are 'file_name'. Also, why in the world would you switch to single quotes as the named group delimiter when <> brackets, the default, are MORE readable (since quotes are already in the regex)?
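A quick way to check that the character class itself keeps the space is to run the question's pattern in isolation. The sketch below is only an illustration: the input fragment (keys box_id/box_name, id 42) and the names SpaceCheck and ExtractBoxName are made up for this check, and the groups use the <> delimiters.

```csharp
using System;
using System.Text.RegularExpressions;

public class SpaceCheck
{
    // The pattern from the question, rewritten with <> group delimiters.
    public static string ExtractBoxName(string input)
    {
        Regex reg = new Regex(
            @"(?<box_id>\d{1,19})"",""box_name"":""(?<box_name>[\w\d\.\s]{1,19})");
        Match m = reg.Match(input);
        // Return null when nothing matches so the caller can tell the cases apart.
        return m.Success ? m.Groups["box_name"].Value : null;
    }

    public static void Main()
    {
        // Hypothetical input fragment, invented for this check.
        string input = @"{""box_id"":""42"",""box_name"":""my box""}";
        Console.WriteLine("'{0}'", ExtractBoxName(input)); // prints 'my box'
    }
}
```

If the space is present here but missing in your table, the regex is not where it gets lost.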
In addition to what @sweaver2112 said, I think you need to expand the framing by adding quotes and get rid of the {1,19} range.
These regexes work in Perl; I didn't want to fire up C# to test them.
"(?<box_id>\\d+)","(?:${type})":"(?<box_name>[\\w.]+(?:\\s[\\w.]+)*)"
or,
"\\s*(?<box_id>\\d+)\\s*","\\s*(?:${type})\\s*":"\\s*(?<box_name>[\\w.]+(?:\\s[\\w.]+)*)\\s*"
where $type = 'file_name';
Realistically though, this should work too (with the type substituted in). Its validation is more relaxed.
"(?<box_id>\\d+)","file_name":"(?<box_name>[^"]*)"
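To see why the closing quote and the dropped {1,19} matter, here is a rough C# check (not the Perl used above) that runs the bounded pattern from the earlier answer and the relaxed one against a single record from the question's input. Without a closing quote, [^"]{1,19} silently cuts a 20-character name to 19 characters; [^"]* followed by a closing quote captures the whole value. The names RangeCheck, Capture, Bounded and Relaxed are made up for this sketch, and the record is trimmed to two fields.

```csharp
using System;
using System.Text.RegularExpressions;

public class RangeCheck
{
    // Bounded name, no closing quote required after it.
    public const string Bounded =
        @"(?<box_id>\d{1,19})"",""file_name"":""(?<box_name>[^""]{1,19})";

    // Unbounded name, framed by quotes on both sides.
    public const string Relaxed =
        @"""(?<box_id>\d+)"",""file_name"":""(?<box_name>[^""]*)""";

    // Returns the box_name group of the first match, or null if nothing matches.
    public static string Capture(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["box_name"].Value : null;
    }

    public static void Main()
    {
        // One record from the question's input, trimmed to two fields.
        string record = @"{""object_id"":""135280"",""file_name"":""test with spaces.jpg""}";
        Console.WriteLine("'{0}'", Capture(Bounded, record)); // prints 'test with spaces.jp'
        Console.WriteLine("'{0}'", Capture(Relaxed, record)); // prints 'test with spaces.jpg'
    }
}
```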
edit
"Not sure, what did my regex return to you? – sln yesterday
It returned correct results, in the input in my question i got 'somespacehere.jpg' 'jup13.jpg' and so on for file_name group. – NET Developer yesterday"
I took your code and input and just printed the groups; it works perfectly. The spaces are there, so the problem must be in how the values are assigned to your row data.
See it here: http://www.ideone.com/HsTMF
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string input = @"{""objects"":[{""object_id"":""135248"",""file_name"":""some space here.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135257"",""file_name"":""jup 13.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135260"",""file_name"":""my pic.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135280"",""file_name"":""test with spaces.jpg"",""video_status"":""0"",""thumbnail_status"":""1""}],""status"":""ok""}";
        Regex reg = new Regex(
            @"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)"""
        );
        foreach ( Match match in reg.Matches(input) )
            Console.WriteLine(
                "Id = '{0}', File name = '{1}'",
                match.Groups["object_id"].Value,
                match.Groups["file_name"].Value );
    }
}
Output:
Id = '135248', File name = 'some space here.jpg'
Id = '135257', File name = 'jup 13.jpg'
Id = '135260', File name = 'my pic.jpg'
Id = '135280', File name = 'test with spaces.jpg'
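Only four of the five records show up in this output because the file name for object 135262 contains ')', ',' and '(', which [\w.] does not allow, so that record never matches. As a rough check, the sketch below counts matches for the pattern used above and for a variant that swaps the name class for [^"]*, as suggested in the other answers. The two-record input and the names MatchCount, Strict and Loose are assumptions made to keep the example short.

```csharp
using System;
using System.Text.RegularExpressions;

public class MatchCount
{
    // Pattern used in the code above: word characters, dots and single whitespace separators.
    public const string Strict =
        @"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)""";

    // Variant: accept anything up to the closing double quote.
    public const string Loose =
        @"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[^""]*)""";

    public static int Count(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // Two records taken from the question's input, trimmed to two fields each.
        string input = @"[{""object_id"":""135260"",""file_name"":""my pic.jpg""},"
                     + @"{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg""}]";
        // prints: strict: 1, loose: 2
        Console.WriteLine("strict: {0}, loose: {1}", Count(Strict, input), Count(Loose, input));
    }
}
```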