Extracting specific JSON out of large string using C# Regex

Here's a sample string:

Lorem ipsum dolor sit amet, ad eam option suscipit invidunt, ius propriae detracto cu. Nec te wisi lo{"firstName":"John", "lastName":"Doe"}rem, in quo vocent erroribus {"firstName":"Anna", "lastName":"Smith"}dissentias. At omittam pertinax senserit est, pri nihil alterum omittam ad, vix aperiam sententiae an. Ferri accusam an eos, an facete tractatos moderatius sea{"firstName":"Peter", "lastName":"Jones"}. Mel ad sale utamur, qui ut oportere omittantur, eos in facer ludus dicant.

Assume the following data model exists:

public class Person
{
   public string firstName;
   public string lastName;
}

How could I use regex to extract the JSON out of this text and create a List<Person> containing:

{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}

Objects can be buried anywhere in the string, so their position relative to words, letters, punctuation, whitespace, etc. does not matter. If an object deviates from the JSON notation above, simply ignore it. The following would be invalid:

{"firstName":"John", "middleName":"", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith", "age":""},
{"firstName":"Peter", "lastName":"Jones" some text}

In other words, the pattern search must match strictly the following:

{"firstName":"[val]", "lastName":"[val]"}

Here's a regex that you could use to extract the values:

({\s*"firstName":\s*"[^"]+",\s*"lastName":\s*"[^"]+"\s*})

After this, I'd suggest just using Json.NET to deserialize the objects.
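The two steps (match strictly, then deserialize) can be sketched as follows. This is a minimal sketch, not the answerer's code: the class name PersonExtractor and its Extract method are hypothetical, the braces in the pattern are escaped for clarity, and it uses the built-in System.Text.Json (.NET 5 or later) in place of Json.NET so that it runs without extra packages. With Json.NET, the deserialization call would be JsonConvert.DeserializeObject<Person>(match.Value) instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

public class Person
{
    public string firstName;
    public string lastName;
}

// Hypothetical helper: finds strictly formatted person objects and deserializes them.
public static class PersonExtractor
{
    // The pattern from the answer as a C# verbatim string ("" is an escaped quote).
    private static readonly Regex PersonJson = new Regex(
        @"\{\s*""firstName"":\s*""[^""]+"",\s*""lastName"":\s*""[^""]+""\s*\}");

    // Person exposes public fields, which System.Text.Json skips unless IncludeFields is set.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { IncludeFields = true };

    public static List<Person> Extract(string text)
    {
        var people = new List<Person>();
        foreach (Match match in PersonJson.Matches(text))
        {
            // match.Value is the complete {...} fragment that satisfied the strict pattern.
            people.Add(JsonSerializer.Deserialize<Person>(match.Value, Options));
        }
        return people;
    }
}
```

Calling PersonExtractor.Extract(text) on the sample string should give a list with one Person per well-formed object, while fragments with extra properties or trailing text never match the pattern. A value that contains an invalid JSON escape sequence would still pass the regex and then fail during deserialization, so a caller may want to catch JsonException.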

Use this code snippet:

    // Requires: using System.Text.RegularExpressions;

    // Capture every firstName value (group 1)
    string strRegex1 = @"firstName"":""([^""]*)"",";
    // Capture every lastName value (group 1)
    string strRegex2 = @"lastName"":""([^""]*)""";
    Regex myRegex1 = new Regex(strRegex1, RegexOptions.None);
    Regex myRegex2 = new Regex(strRegex2, RegexOptions.None);
    string strTargetString = @"{""firstName"":""John"", ""middleName"":"""", ""lastName"":""Doe""}," + "\n" + @"{""firstName"":""Anna"", ""lastName"":""Smith"", ""age"":""""}," + "\n" + @"{""firstName"":""Peter"", ""lastName"":""Jones"" some text}";

    // Matches() only yields successful matches, so no Success check is needed
    foreach (Match myMatch in myRegex1.Matches(strTargetString))
    {
        // myMatch.Groups[1].Value is a first name; add your code here
    }

    foreach (Match myMatch in myRegex2.Matches(strTargetString))
    {
        // myMatch.Groups[1].Value is a last name; add your code here
    }
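Because the two loops collect first and last names independently, the pairing between them is lost, and the patterns also match inside the objects the question calls invalid. One way to keep each pair together without a JSON library is a single strict pattern with two named groups. This is a minimal sketch under those assumptions; the names PersonMatcher, first, and last are hypothetical and not from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Person
{
    public string firstName;
    public string lastName;
}

// Hypothetical helper: builds Person objects directly from regex capture groups.
public static class PersonMatcher
{
    // Strict shape: exactly firstName then lastName, with nothing else between the braces.
    private static readonly Regex PersonPattern = new Regex(
        @"\{\s*""firstName"":\s*""(?<first>[^""]+)"",\s*""lastName"":\s*""(?<last>[^""]+)""\s*\}");

    public static List<Person> Extract(string text)
    {
        var people = new List<Person>();
        foreach (Match match in PersonPattern.Matches(text))
        {
            // Both groups come from the same match, so the names stay paired.
            people.Add(new Person
            {
                firstName = match.Groups["first"].Value,
                lastName = match.Groups["last"].Value
            });
        }
        return people;
    }
}
```

The captured values are the raw text between the quotes, so JSON escape sequences such as \u00e9 are not decoded; if the names can contain escapes, deserializing the matched fragment is the safer route.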
