
Extract the values between the double quotes using regex

string emailBody = "sample text for NewFinancial History:\"xyz\"  text NewFinancial History:\"abc\"  NewEBTDI$:\"abc\" ds \"NewFinancial History:pqr\" test";

private Dictionary<string, List<string>> ExtractFieldValuesForDynamicListObject(string emailBody)
{
    Dictionary<string, List<string>> paramValueList = new Dictionary<string, List<string>>();
    try
    {
        emailBody = ReplaceIncompatableQuotes(emailBody);
        emailBody = string.Join(" ", Regex.Split(emailBody.Trim(), @"(?:\r\n|\n|\r)"));
        var keys = Regex.Matches(emailBody, @"\bNew\B(.+?):", RegexOptions.Singleline).OfType<Match>().Select(m => m.Groups[0].Value.Replace(":", "")).Distinct().ToArray();
        foreach (string key in keys)
        {
            List<string> valueList = new List<string>();
            string regex = "" + Regex.Escape(key) + ":" + "\"(?<" + Regex.Escape(GetCleanKey(key)) + ">[^\"]*)\"";

            var matches = Regex.Matches(emailBody, regex, RegexOptions.Singleline);
            foreach (Match match in matches)
            {
                if (match.Success)
                {
                    string value = match.Groups[Regex.Escape(GetCleanKey(key))].Value;
                    if (!valueList.Contains(value.Trim()))
                    {
                        valueList.Add(value.Trim());
                    }
                }
            }
            valueList = valueList.Distinct().ToList();
            string listName = key.Replace("New", "");                    
            paramValueList.Add(listName.Trim(), valueList);
        }
    }
    catch (Exception ex)
    {
        DCULSLogger.LogError(ex);
    }
    return paramValueList;
}

My goal is to scan through the email body and identify strings that follow the NewListName:"Value" nomenclature, and the regex and method above work fine for that. Now my client has changed the nomenclature from NewListName:"Value" to "NewListName:Value". I want to grab the text between the double quotes, including the New keyword, so I need to look for an opening quote followed by New and read up to the closing quote. Can anyone help me modify the above regex to scan through the email body and get the list of all values between double quotes? In the example above, I want "NewFinancial History:pqr" in my results. Any help would be appreciated.

You may use a regex that matches a quote, New, one or more characters other than " and :, then a :, and then any characters other than " up to the closing ":

var keys = Regex.Matches(emailBody, @"""New[^"":]+:[^""]+""", RegexOptions.Singleline)
       .OfType<Match>()
       .Select(m => m.Value)
       .Distinct()
       .ToArray();

See the regex demo.


Pattern details:

  • " - a literal double quote
  • New - a literal substring
  • [^":]+ - 1 or more characters other than " and : (the [^...] is a negated character class)
  • : - a literal colon
  • [^"]+ - 1 or more characters other than "
  • " - a literal double quote
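
The snippet above returns each whole match, quotes included. If the list name and the value are needed separately, as in the dictionary built by the method in the question, one option is to add named groups to the same pattern. The following is only a sketch under that assumption: the class name QuotedFieldExtractor, the method name Extract, and the group names key and value are hypothetical and not part of the original code. It also drops the New prefix from the key, mirroring the key.Replace("New", "") step in the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedFieldExtractor
{
    // Same pattern as above, with two named groups added (an assumption of this sketch):
    //   key   = the text between "New" and the first colon
    //   value = the text between that colon and the closing quote
    private static readonly Regex FieldPattern =
        new Regex(@"""New(?<key>[^"":]+):(?<value>[^""]+)""");

    public static Dictionary<string, List<string>> Extract(string emailBody)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (Match m in FieldPattern.Matches(emailBody))
        {
            string key = m.Groups["key"].Value.Trim();
            string value = m.Groups["value"].Value.Trim();

            // Create the list for this key the first time the key is seen.
            if (!result.TryGetValue(key, out var values))
            {
                values = new List<string>();
                result[key] = values;
            }

            // Keep each value only once per key.
            if (!values.Contains(value))
            {
                values.Add(value);
            }
        }
        return result;
    }
}
```

With the sample emailBody from the question, only "NewFinancial History:pqr" matches, so Extract returns a single entry with the key Financial History and the value pqr. The entries written in the old NewListName:"Value" style are skipped because no quote immediately precedes New there.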
