Count how many equal strings are in an XML file

I'm wondering if there is a way to count how many equal strings an XML file contains. For example, given this XML file:

<Root>
  <task>
    <sub1>test</sub1>
    <sub2>hello</sub2>
    <sub3>csharp</sub3>
  </task>
  <task>
    <sub1>test2</sub1>
    <sub2>hello2</sub2>
    <sub3>csharp2</sub3>
  </task>
  <task>
    <sub1>test3</sub1>
    <sub2>hello3</sub2>
    <sub3>csharp3</sub3>
  </task>
  <task>
    <sub1>test</sub1>
    <sub2>hello4</sub2>
    <sub3>csharp4</sub3>
  </task>
</Root>

As you can see, the value "test" occurs twice (node.InnerText == "test"). How can I count that? I tried something like:

client["sub1"].InnerText.Count

but this counts the number of characters in the string.

Suggestions appreciated :)

EDIT: I parse the XML file using XmlDocument.

Select the elements you want to check (e.g. all child elements of all tasks) and group them by value:

xdoc.Root.Elements("task").SelectMany(t => t.Elements())
    .GroupBy(e => e.Value)
    .Select(g => new { Text = g.Key, Count = g.Count() })

Query syntax:

var xdoc = XDocument.Load(path_to_xml);
var result = from t in xdoc.Root.Elements("task")
             from e in t.Elements()
             group e by e.Value into g
             select new {
                  Text = g.Key,
                  Count = g.Count()
             };

With XPath (XPathSelectElements is an extension method, so this needs using System.Xml.XPath;):

var result = from e in xdoc.XPathSelectElements("//task/*")
             group e by e.Value into g
             select new {
                 Text = g.Key,
                 Count = g.Count()
             };

For your sample XML, the result will be:

[
  { Text: "test", Count: 2 },
  { Text: "hello", Count: 1 },
  { Text: "csharp", Count: 1 },
  { Text: "test2", Count: 1 },
  { Text: "hello2", Count: 1 },
  { Text: "csharp2", Count: 1 },
  { Text: "test3", Count: 1 },
  { Text: "hello3", Count: 1 },
  { Text: "csharp3", Count: 1 },
  { Text: "hello4", Count: 1 },
  { Text: "csharp4", Count: 1 }
]

If you only want the text values that occur more than once, filter the results by count:

 result.Where(x => x.Count > 1)
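
Putting the grouping and the filter together, here is a minimal sketch of a complete console program. The class name DuplicateCounter, the helper CountValues, and the shortened inline XML are assumptions made for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DuplicateCounter
{
    // Hypothetical helper: maps each distinct text value found in the
    // child elements of <task> to the number of times it occurs.
    public static Dictionary<string, int> CountValues(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("task")
            .SelectMany(t => t.Elements())
            .GroupBy(e => e.Value)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        // Shortened version of the sample document from the question.
        const string xml = @"<Root>
  <task><sub1>test</sub1><sub2>hello</sub2></task>
  <task><sub1>test2</sub1><sub2>hello2</sub2></task>
  <task><sub1>test</sub1><sub2>hello4</sub2></task>
</Root>";

        Dictionary<string, int> counts = CountValues(xml);

        // Count for one specific value.
        Console.WriteLine(counts["test"]); // prints 2

        // Only the values that occur more than once.
        foreach (var pair in counts.Where(p => p.Value > 1))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}"); // prints test: 2
        }
    }
}
```

Materializing the groups into a dictionary is a design choice for this sketch: it makes looking up the count of a single value cheap, whereas the anonymous-type projection above is more convenient when you only enumerate the results.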

The same query for XmlDocument:

var doc = new XmlDocument();
doc.Load(path_to_xml);
var result = from XmlNode n in doc.SelectNodes("//task/*")
             group n by n.InnerText into g
             select new {
                 Text = g.Key,
                 Count = g.Count()
             };
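
If you only need the count for one known value, such as "test" in sub1, grouping is not required. The following is a rough sketch that stays with XmlDocument; the class name Sub1Counter and the helper CountSub1 are hypothetical names introduced here, not something from the original answer.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class Sub1Counter
{
    // Hypothetical helper: counts the <sub1> elements under any <task>
    // whose inner text equals the given value.
    public static int CountSub1(string xml, string value)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // XmlNodeList is a non-generic collection, so Cast<XmlNode>()
        // is needed before LINQ operators can be applied.
        return doc.SelectNodes("//task/sub1")
                  .Cast<XmlNode>()
                  .Count(n => n.InnerText == value);
    }

    public static void Main()
    {
        // Shortened version of the sample document from the question.
        const string xml = @"<Root>
  <task><sub1>test</sub1></task>
  <task><sub1>test2</sub1></task>
  <task><sub1>test</sub1></task>
</Root>";

        Console.WriteLine(CountSub1(xml, "test")); // prints 2
    }
}
```

The comparison is done in C# rather than inside the XPath expression so that values containing quotes do not have to be escaped.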
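
To get only the sub1 values that occur more than once, using LINQ to XML:

var dubs = XDocument.Parse(xml)
            .Descendants("task")
            // sub1 is a child element, not an attribute, so read it with Element()
            .GroupBy(g => (string)g.Element("sub1"))
            .Where(g => g.Count() > 1)
            .Select(g => g.Key);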

