Extract multiple strings via Regex in C#

Based on this question, I want to extract multiple data points from a text file. The file content is basically a C# string with a Key: Value scheme. Each key-value pair is on its own line. I wrote the following code:

var matches = Regex.Matches(
    pageText,
    "^Ort Lieferadresse: (?<deliveryAdress>.*)$|^Referenz: (?<reference>.*)$|^Lademittel: (?<loading>.*)$|^Plombennummer: (?<plombe>.*)$|^Bemerkung: (?<remarks>.*)$",
    RegexOptions.Multiline);

This works, but I have trouble extracting the actual captures because it returns five matches. I also tried the Match method, but then only the very first group is found.

Is there any way to return all captures in one go?

Here's some sample data:

Verladeplan
Erzeugt von: moehlerj
Erzeugt am: 01.03.2022
Ausliefertermin: 03.03.2022-01:00:00
Darstellung Verladeplan
Ladeinformationen
Ausliefertermin: 03.03.2022-01:00:00
Ort Lieferadresse: Foo
Referenz: Bar
Lademittel: 40' Container
Lademeter: 1176.000
Gesamtanzahl Paletten: 24
Gesamtbruttogewicht: 6669.0
Plombennummer: keine, da Abholung im LKW
Bemerkung: Kennzeichen: AB12345 / CD67890
Containernummer:
TARA:
Seite 1 von 2

You can use

using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "Ort Lieferadresse: deliveryAdress\r\nReferenz: reference\r\nLademittel: loading\r\nPlombennummer: plombe\r\nBemerkung: remarks";
var pattern = @"^Ort Lieferadresse: (?<deliveryAdress>[^\r\n]*)\r?$|^Referenz: (?<reference>[^\r\n]*)\r?$|^Lademittel: (?<loading>[^\r\n]*)\r?$|^Plombennummer: (?<plombe>[^\r\n]*)\r?$|^Bemerkung: (?<remarks>[^\r\n]*)\r?$";
// Collect every named group that actually matched, across all matches.
var results = Regex.Matches(text, pattern, RegexOptions.Multiline)
        .Cast<Match>()
        .SelectMany(m => m.Groups.Skip(1)) // skip group 0, the whole match
        .Where(n => n.Success);
foreach (Group grp in results)
    Console.WriteLine("{0}: {1}", grp.Name, grp.Value);

See the C# demo, which yields

deliveryAdress: deliveryAdress
reference: reference
loading: loading
plombe: plombe
remarks: remarks

First of all, to support CRLF line endings, and bearing in mind what . means in a .NET regex (it matches any character except \n, so it also matches \r), I suggest replacing .* with [^\r\n]* and adding an optional CR pattern ( \r? ) before the $ end-of-line anchor.
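The effect of the CRLF issue can be seen in a minimal sketch. The class name CrlfDemo and the helper Capture are made up for illustration, and the input is a shortened two-line sample rather than the real file:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrlfDemo
{
    // Hypothetical helper: returns the "reference" capture of the first match.
    public static string Capture(string text, string pattern) =>
        Regex.Match(text, pattern, RegexOptions.Multiline).Groups["reference"].Value;

    public static void Main()
    {
        // Two lines separated by CRLF.
        var text = "Referenz: Bar\r\nLademittel: Box";

        // .* also consumes the \r, because . only excludes \n.
        var naive = Capture(text, @"^Referenz: (?<reference>.*)$");

        // [^\r\n]* stops before the \r; \r? then consumes it outside the group.
        var robust = Capture(text, @"^Referenz: (?<reference>[^\r\n]*)\r?$");

        Console.WriteLine(naive.Length);  // 4 ("Bar" plus the trailing CR)
        Console.WriteLine(robust.Length); // 3
    }
}
```

The stray carriage return is invisible when printed, which is why it tends to show up only later, for example in string comparisons.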

Then, .Cast<Match>() turns the collection returned by Regex.Matches(text, pattern, RegexOptions.Multiline) into a sequence of Match objects, .SelectMany(m => m.Groups.Skip(1)) flattens the Groups of each match while skipping the zeroth item (the whole match, which we do not need), and .Where(n => n.Success) keeps only the groups that actually participated in a match.
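If the goal is to look the values up by name afterwards, the same pipeline could feed a dictionary. This is only a sketch: the class CaptureDemo and the method ExtractNamedGroups are hypothetical names, it assumes a runtime where Group.Name is available (.NET Core or .NET Framework 4.7 and later), and it keeps the first value when a group name matches more than once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: maps each successful named group to its first value.
    public static Dictionary<string, string> ExtractNamedGroups(string text, string pattern)
    {
        return Regex.Matches(text, pattern, RegexOptions.Multiline)
            .Cast<Match>()
            .SelectMany(m => m.Groups.Cast<Group>().Skip(1)) // skip group 0 (whole match)
            .Where(g => g.Success)                           // only groups that matched
            .GroupBy(g => g.Name)                            // guard against repeated keys
            .ToDictionary(grp => grp.Key, grp => grp.First().Value);
    }

    public static void Main()
    {
        var text = "Ort Lieferadresse: deliveryAdress\r\nReferenz: reference\r\nLademittel: loading\r\nPlombennummer: plombe\r\nBemerkung: remarks";
        var pattern = @"^Ort Lieferadresse: (?<deliveryAdress>[^\r\n]*)\r?$|^Referenz: (?<reference>[^\r\n]*)\r?$|^Lademittel: (?<loading>[^\r\n]*)\r?$|^Plombennummer: (?<plombe>[^\r\n]*)\r?$|^Bemerkung: (?<remarks>[^\r\n]*)\r?$";

        var captures = ExtractNamedGroups(text, pattern);
        Console.WriteLine(captures["reference"]);
        Console.WriteLine(captures["remarks"]);
    }
}
```

Grouping by name before building the dictionary avoids an ArgumentException from ToDictionary if a key such as Referenz appears on several lines.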

If these lines always appear in a fixed order, replace the regex alternation | with a line-break pattern ( \n , or \r\n for CRLF files). You can then also drop the ^ and $ anchors around each line break:

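// Values use [^\r\n]* so a trailing CR is never captured; \r?\n accepts CRLF and LF.
// (?:[^\n]*\n)*? lazily skips any whole lines between Lademittel and Plombennummer.
var match = Regex.Match(
    pageText,
    @"^Ort Lieferadresse: (?<deliveryAdress>[^\r\n]*)\r?\nReferenz: (?<reference>[^\r\n]*)\r?\nLademittel: (?<loading>[^\r\n]*)\r?\n(?:[^\n]*\n)*?Plombennummer: (?<plombe>[^\r\n]*)\r?\nBemerkung: (?<remarks>[^\r\n]*)\r?$",
    RegexOptions.Multiline);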

// Test
if (match.Success) {
    Console.WriteLine(match.Groups["deliveryAdress"].Value);
    Console.WriteLine(match.Groups["reference"].Value);
    Console.WriteLine(match.Groups["loading"].Value);
    Console.WriteLine(match.Groups["plombe"].Value);
    Console.WriteLine(match.Groups["remarks"].Value);
} else {
    Console.WriteLine("no match");
}

This way, Regex.Match finds one single match that contains all five groups.


If the information can appear in any order, I suggest not using a regex at all and instead reading the file line by line:

IEnumerable<string> lines = File.ReadLines(path);

Then, insert the information into a dictionary. This lets you access the desired data easily. The dictionary also automatically contains every key that is present in the file.

var dict = lines
    .Select(l => (text: l, index: l.IndexOf(": ")))
    .Where(t => t.index > 0)
    .Select(t => (key: t.text[0..t.index], value: t.text[(t.index + 2)..]))
    .DistinctBy(kv => kv.key) // Because Ausliefertermin occurs twice
    .ToDictionary(kv => kv.key, kv => kv.value);

This test

Console.WriteLine($"Ort Lieferadresse = {dict["Ort Lieferadresse"]}");
Console.WriteLine($"Referenz = {dict["Referenz"]}");
Console.WriteLine($"Lademittel = {dict["Lademittel"]}");
Console.WriteLine($"Plombennummer = {dict["Plombennummer"]}");
Console.WriteLine($"Bemerkung = {dict["Bemerkung"]}");

yields

Ort Lieferadresse = Foo
Referenz = Bar
Lademittel = 40' Container
Plombennummer = keine, da Abholung im LKW
Bemerkung = Kennzeichen: AB12345 / CD67890
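Note that the dictionary indexer throws a KeyNotFoundException when a key is absent, and in the sample data lines such as "Containernummer:" carry no value and are therefore skipped. A possible variant, sketched here with a plain loop and TryGetValue, handles that case; the names KeyValueDemo and Parse are made up, and the lines are supplied in memory instead of through File.ReadLines:

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueDemo
{
    // Hypothetical helper: splits each line at the first ": " and keeps
    // the first occurrence of every key.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var dict = new Dictionary<string, string>();
        foreach (var line in lines)
        {
            int index = line.IndexOf(": ", StringComparison.Ordinal);
            if (index <= 0) continue;               // no "key: value" shape on this line
            string key = line.Substring(0, index);
            if (!dict.ContainsKey(key))             // e.g. Ausliefertermin occurs twice
                dict[key] = line.Substring(index + 2);
        }
        return dict;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "Ausliefertermin: 03.03.2022-01:00:00",
            "Ort Lieferadresse: Foo",
            "Containernummer:",
            "Ausliefertermin: 03.03.2022-01:00:00",
        };
        var dict = Parse(lines);

        // TryGetValue avoids an exception for keys that have no value in the file.
        Console.WriteLine(dict.TryGetValue("Ort Lieferadresse", out var ort) ? ort : "(missing)");
        Console.WriteLine(dict.TryGetValue("Containernummer", out var nr) ? nr : "(missing)");
    }
}
```

With these four lines the program prints Foo followed by (missing).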
