Best way to parse a string into Dictionary of terms
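Input string: "TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb"
Expected result: the pairs TAG1 = {xxx, ttt, bbb}, TAG2 = {yyy}, TAG3 = {zzz}.
I did this with regular expressions, but calling Regex.Replace and discarding its return value really bothers me. How can I improve this code?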
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace TermsTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] tags = { "TAG1", "TAG2", "TAG3", "TAG4", "TAG5", "TAG6", "TAG7", "TAG8" };
            string file = "TAG2jjfjfjndbfdjTAG1qqqqqqqTAG3uytygh fhdjdfTAG5hgjdhfghTAG6trgfmxc hdfhdTAG2jfksksdhjskTAG3kdjbjvbsjTAG2jskjdjdvjvbxjkvbjdTAG2jkxcndjcjbkjn";
            string tag = "(" + string.Join("|", tags) + ")";
            var dictionary = new Dictionary<string, List<string>>(tags.Length);
            // Replace is used only for its per-match callback; the returned string is discarded.
            Regex.Replace(file, string.Format(@"({0})(.+?)(?={0}|$)", tag), match =>
            {
                // "tag" carries its own parentheses, so groups 1 and 2 both hold the tag
                // and group 3 holds the value.
                string key = match.Groups[1].Value, value = match.Groups[3].Value;
                if (dictionary.ContainsKey(key))
                    dictionary[key].Add(value);
                else
                    dictionary[key] = new List<string> {value};
                return "";
            });
            foreach (var pair in dictionary)
            {
                Console.Write(pair.Key + " =\t");
                foreach (var entry in pair.Value)
                {
                    Console.Write(entry + " ");
                }
                Console.WriteLine();
                Console.WriteLine();
            }
        }
    }
}
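// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
string input = "TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb";
// Group 1 is the tag, group 2 the text up to the next tag (or the end of the input).
var lookup = Regex.Matches(input, @"(TAG\d)(.+?)(?=TAG|$)")
    .Cast<Match>()
    .ToLookup(m => m.Groups[1].Value, m => m.Groups[2].Value);
foreach (var kv in lookup)
{
    Console.WriteLine(kv.Key + " => " + String.Join(", ", kv));
}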
Output:
TAG1 => xxx, ttt, bbb
TAG2 => yyy
TAG3 => zzz
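If a Dictionary<string, List<string>> is needed, as in the question, rather than a lookup, the same idea can be adapted. The following is only a sketch: the TagParser class and its Parse method are hypothetical names, not part of the original posts, and it assumes the tag list is known up front. It also sorts longer tags first, on the assumption that one tag may be a prefix of another (for example TAG1 and TAG10).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class TagParser
{
    // Hypothetical helper (not from the original posts): groups the text that
    // follows each known tag, keyed by the tag name.
    public static Dictionary<string, List<string>> Parse(string input, IEnumerable<string> tags)
    {
        // Escape each tag so regex metacharacters in a tag name are matched literally.
        // Longer tags go first so "TAG10" is not read as "TAG1" followed by "0".
        string alternation = string.Join("|",
            tags.OrderByDescending(t => t.Length).Select(t => Regex.Escape(t)));

        // Group 1 = tag, group 2 = shortest non-empty text up to the next tag or end of input.
        string pattern = string.Format(@"({0})(.+?)(?={0}|$)", alternation);

        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .GroupBy(m => m.Groups[1].Value, m => m.Groups[2].Value)
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    static void Main()
    {
        string[] tags = { "TAG1", "TAG2", "TAG3" };
        var result = Parse("TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb", tags);
        foreach (var pair in result)
            Console.WriteLine(pair.Key + " => " + string.Join(", ", pair.Value));
    }
}
```

Because Regex.Matches only reads the input, nothing is discarded the way the return value of Regex.Replace was. Tags that never occur in the input simply have no entry in the dictionary.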
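This is a perfect job for the .NET CaptureCollection object, a .NET-specific feature that lets you reuse the same capturing group multiple times.
Use this regex and create a MatchCollection with Matches: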
(?:TAG1(.*?(?=TAG|$)))?(?:TAG2(.*?(?=TAG|$)))?(?:TAG3(.*?(?=TAG|$)))?
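Then inspect the captures of each match:
Groups[1].Captures will contain all the TAG1 values
Groups[2].Captures will contain all the TAG2 values
Groups[3].Captures will contain all the TAG3 values
Note that every Match in the collection carries its own capture collections, so the values have to be combined across matches. From there it is only a small step to your final data structure.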
To reduce the chance of backtracking, you can make the tokens atomic:
(?>(?:TAG1(.*?(?=TAG|$)))?)(?>(?:TAG2(.*?(?=TAG|$)))?)(?>(?:TAG3(.*?(?=TAG|$)))?)
For details on how this works, see "Capture groups that can be quantified".
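A capture collection only accumulates more than one value when its group is visited repeatedly inside a single match. The sketch below shows one way to get that: it is a variation on the regex above, not the pattern from the answer, that puts the three tag branches into one repeated alternation and calls Regex.Match once. The CaptureDemo class and its members are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CaptureDemo
{
    // Same value sub-pattern as in the answer: lazy text up to the next "TAG" or end of input.
    const string ValueGroup = @"(.*?(?=TAG|$))";

    // Variation (an assumption, not the answer's regex): one repeated alternation,
    // so groups 1..3 are revisited on every iteration within a single match.
    public static readonly string Pattern =
        "(?:TAG1" + ValueGroup + "|TAG2" + ValueGroup + "|TAG3" + ValueGroup + ")+";

    public static Match MatchAll(string input)
    {
        return Regex.Match(input, Pattern);
    }

    // Returns every capture made by the given group within one match, in order.
    public static string[] CapturesOf(Match match, int groupNumber)
    {
        return match.Groups[groupNumber].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
    }

    static void Main()
    {
        Match m = MatchAll("TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb");
        for (int i = 1; i <= 3; i++)
            Console.WriteLine("TAG" + i + " => " + string.Join(", ", CapturesOf(m, i)));
    }
}
```

For the sample input this prints xxx, ttt, bbb for TAG1, yyy for TAG2 and zzz for TAG3. The group number doubles as the tag index here, which only works while the set of tags is small and fixed.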
All you want to do is group the values that belong to the same tag, so the GroupBy method should make this easier:
string input = "TAG1xxxTAG2yyyTAG3zzzTAG1tttTAG1bbb";
var list = Regex.Matches(input, @"(TAG\d+)(.+?)(?=TAG\d+|$)")
    .Cast<Match>()
    .GroupBy(m => m.Groups[1].Value,
             (key, values) => string.Format("{0} = {{{1}}}",
                 key,
                 string.Join(", ",
                     values.Select(v => v.Groups[2]))));
var output = string.Join(", ", list);
This produces the following string as output:
"TAG1 = {xxx, ttt, bbb}, TAG2 = {yyy}, TAG3 = {zzz}"
I'm not sure I know all the assumptions and conventions behind this problem, but this gave me a similar result:
// "tags" and "file" are the variables from the question; requires using System.Linq.
var tagColl = string.Join("|", tags);
var tagGroup = string.Format("(?<tag>{0})(?<val>[a-z]*)", tagColl);
var result = from x in Regex.Matches(file, tagGroup).Cast<Match>()
             where x.Success
             let pair = new { fst = x.Groups["tag"].Value, snd = x.Groups["val"].Value }
             group pair by pair.fst into g
             select g;
A simple test would be:
Console.WriteLine(string.Join("\r\n", from g in result
                                      let coll = string.Join(", ", from item in g select item.snd)
                                      select string.Format("{0}: {{{1}}}", g.Key, coll)));
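One caveat with the [a-z]* value pattern: it stops at the first character that is not a lowercase letter, and the sample file in the question contains values with spaces (for example "uytygh fhdjdf"). The sketch below compares it with a lookahead-based value pattern on a short made-up input. The ValuePatternDemo class, its Values helper and the two-tag input are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class ValuePatternDemo
{
    // Hypothetical helper: returns the "val" group of every match, in order.
    public static List<string> Values(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Groups["val"].Value)
            .ToList();
    }

    static void Main()
    {
        const string tags = "TAG1|TAG2";
        const string input = "TAG1abc defTAG2xyz";

        // Character-class version from the answer: stops at the first non-letter.
        string lettersOnly = string.Format("(?<tag>{0})(?<val>[a-z]*)", tags);
        // Lookahead version: runs up to the next tag or the end of the input.
        string upToNextTag = string.Format("(?<tag>{0})(?<val>.+?)(?={0}|$)", tags);

        Console.WriteLine(string.Join(" | ", Values(input, lettersOnly)));  // abc | xyz
        Console.WriteLine(string.Join(" | ", Values(input, upToNextTag))); // abc def | xyz
    }
}
```

Which behaviour is wanted depends on the data: the character class is stricter and silently drops anything after the first non-letter, while the lookahead keeps everything between two tags.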