How to extract the number from a matched string in C#?

I want to extract the emoji ID from the input.

For example, given these inputs:

`<eid=1>  valid get 1`
`<eid = >  invalid `
`<exd = 1>  invalid` 
`< eid = 1000> valid get 1000`

I know how to match those strings, but I have no idea how to extract the IDs from the matched strings.

Use regex:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] inputs = {
                                  "<eid=1>",
                                  "<eid = >",
                                  "<exd = 1>", 
                                  "< eid = 1000>"
                              };
            string pattern = @"\<\s*eid\s*=\s*(?'number'\d+)\s*\>";

            foreach (string input in inputs)
            {
                Match  match = Regex.Match(input, pattern);
                if (match.Success)
                {
                    Console.WriteLine("input : '{0}' Does Match, number = '{1}'", input, match.Groups["number"]);
                }
                else
                {
                    Console.WriteLine("input : '{0}' Does not Match", input);
                }
            }
            Console.ReadLine();
        }
    }
}

You can do something like this. If you don't want to store each item in an array (e.g. you have HTML code), you can store all the values as one string and use the following:

var input = @"`<eid=1>  valid get 1`
              `<eid = >  invalid `
              `<exd = 1>  invalid` 
              `< eid = 1000> valid get 1000`";
var regex = new Regex(@"(?<open>\=).*?(?<final-open>\>)");
var matches = regex.Matches(input).Cast<Match>().Select(m => m.Groups["final"].Value).Distinct().ToList();

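foreach (var match in matches)
{
    // Each entry is the text found between '=' and '>'. It can be blank
    // (as for "<eid = >"), so TryParse is used to avoid a FormatException.
    int id;
    if (int.TryParse(match.Trim(), out id))
    {
        // here you have the parsed id
    }
}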

This pattern sets the opening and closing delimiters of the text you want to capture: the opening delimiter is '\=' and the closing delimiter is '\>':

var regex = new Regex(@"(?<open>\=).*?(?<final-open>\>)");
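
To make it easier to see what this pattern captures, here is a minimal sketch that prints the raw "final" captures for the four sample inputs joined into one line. The class name BalancingGroupDemo and the helper Between are hypothetical names introduced only for this illustration. Note that the pattern checks neither the tag name nor that the captured text is a number, so the text from `<eid = >` and `<exd = 1>` is captured as well and still has to be validated by the caller.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancingGroupDemo
{
    // Same pattern as in the answer: "open" captures '=', and the balancing
    // group "final-open" stores the text between that '=' and the closing '>'
    // in the group named "final".
    private static readonly Regex Pattern =
        new Regex(@"(?<open>\=).*?(?<final-open>\>)");

    // Hypothetical helper: returns the raw, untrimmed "final" capture of every match.
    public static string[] Between(string input)
    {
        return Pattern.Matches(input)
                      .Cast<Match>()
                      .Select(m => m.Groups["final"].Value)
                      .ToArray();
    }

    public static void Main()
    {
        // Brackets make leading whitespace and blank captures visible.
        foreach (string s in Between("<eid=1> <eid = > <exd = 1> < eid = 1000>"))
        {
            Console.WriteLine("[{0}]", s);
        }
    }
}
```

If only `eid` tags with numeric values should be accepted, a stricter pattern such as the one in the first answer is probably the safer choice.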

You need to understand what a match is, what a capture is, and how to capture specific data within a match.


In the realm of regular expressions there is a difference between a match, a capture, and basic grouping.

You want to match the whole value <eid=8>, but you want to get the value 8 into a capture. That is done by adding a grouping ( ) construct to the pattern, which establishes one or more capture groups. A match can hold one or more groups, which are indexed from 1 to N. Group zero is a special group that is created automatically and is explained below.


So for the data <eid=8>, to capture the value in a group, use the regex <\w+=(\d+)\> (instead of the pattern <\w+=\d+\>, which matches but captures nothing). The grouping is what puts the number into capture group 1, with a value of 8.
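
As a rough sketch of how that numbered group could be used on a longer string, the snippet below collects every captured number with Regex.Matches. The class name NumberedGroupDemo, the method ExtractIds, and the sample text are assumptions made for illustration. As written above, the pattern accepts any tag name and no whitespace around `=`, so it is looser on names and stricter on spacing than the question requires.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberedGroupDemo
{
    // Hypothetical helper: returns every number captured by group 1.
    public static List<int> ExtractIds(string text)
    {
        var ids = new List<int>();
        foreach (Match m in Regex.Matches(text, @"<\w+=(\d+)\>"))
        {
            // Groups[1] holds only the digits inside the parentheses.
            // TryParse skips values that do not fit in an int.
            int id;
            if (int.TryParse(m.Groups[1].Value, out id))
            {
                ids.Add(id);
            }
        }
        return ids;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ExtractIds("a <eid=8> b <eid=42>")));
        // prints "8, 42"
    }
}
```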

So what are groups exactly?

  • Groups[0] is always the whole match, such as <eid=8> in our example.
  • Groups[1-N] are the individual captures created when a ( ) construct is specified. So for our example, Groups[1].Value is the number 8. Nice, that answers your question.
  • One can do a named capture by writing (?<{name here}>... ) . By that logic we can change our pattern to <\w+=(?<TheNumbers>\d+)\> and then extract the value with Groups["TheNumbers"].Value, or still with Groups[1].Value.
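
The three bullets above can be checked with a short sketch like the following. The class name NamedGroupDemo and the method Describe are hypothetical names used only for this example. Because the pattern contains no unnamed groups, the named group TheNumbers is also reachable as group 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: returns the whole match, the named capture, and
    // the same capture by index, or null when the input does not match.
    public static string[] Describe(string input)
    {
        Match m = Regex.Match(input, @"<\w+=(?<TheNumbers>\d+)\>");
        if (!m.Success)
        {
            return null;
        }
        return new[]
        {
            m.Groups[0].Value,            // the whole match
            m.Groups["TheNumbers"].Value, // the named capture
            m.Groups[1].Value             // the same capture, by index
        };
    }

    public static void Main()
    {
        foreach (string part in Describe("<eid=8>"))
        {
            Console.WriteLine(part);
        }
        // prints "<eid=8>", then "8", then "8"
    }
}
```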
