C# string filtering and split with regex match

I'm trying to filter strings with regex, but I'm not that familiar with regex, so I need a little help. I also need to check whether the string contains a specific part, like the example input below:

Input (string):

"<value1;127.0.0.1:20000;value2;value3>Lorem ipsum dolor sit amet!"

If that part exists, return these values:

string val1 = ????; //can't be null or empty, must be at least 3 chars/ints
string val2 = ????; //can be empty string
string val3 = ????; //can be empty string
string ipaddress = ????; // can't be empty
string text = ????; //can be empty string

Otherwise, if it does not exist, return only the "Lorem ipsum..." text:

string text = ????; //can be empty string

So first I need to check whether that specific part exists in the full string. The string can also come without that part.

Can someone please explain how I can do that?

EDIT: (please don't judge, I'm really bad with regex) Here is what I tried:

private static bool ifContain(string a)
{
    return Regex.IsMatch(a, @"([a-zA-Z0-9]*)\;([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\:[0-9]{5})([a-zA-Z0-9*)\;([a-zA-Z0-9]*)\;([<a-zA-Z0-9]*)");
}
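For comparison, the attempt above has no leading `<` or closing `>`, is missing a `;` between the address and the next group, and leaves one character class without its `]`. A minimal sketch of a single anchored pattern with named groups might look like the following. The class name `HeaderPattern`, the group names, and the choice of a 1 to 5 digit port are assumptions made for illustration, and the pattern does not check that each octet is in the 0-255 range.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
public static class HeaderPattern
{
    // Assumed layout: <val1;ip:port;val2;val3>text
    // val1 needs at least 3 alphanumeric characters; val2 and val3 may be empty.
    private static readonly Regex Header = new Regex(
        @"^<(?<val1>[a-zA-Z0-9]{3,});" +
        @"(?<ip>[0-9]{1,3}(?:\.[0-9]{1,3}){3}:[0-9]{1,5});" +
        @"(?<val2>[a-zA-Z0-9]*);" +
        @"(?<val3>[a-zA-Z0-9]*)>" +
        @"(?<text>.*)$",
        RegexOptions.Singleline);

    // True only when the whole header is present at the start of the string.
    public static bool ContainsHeader(string input)
    {
        return Header.IsMatch(input);
    }

    // Returns the Match so callers can read Groups["val1"], Groups["ip"], and so on.
    public static Match Parse(string input)
    {
        return Header.Match(input);
    }
}
```

When `Parse(...).Success` is false, the caller can treat the whole input as the text.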

The ugly way, without regex:

var str = "<value1;127.0.0.1:20000;value2;value3>Lorem ipsum dolor sit amet!";

var split = str.Split(';'); 
var split2 = split[3].Split('>');

var val1 = split[0].Split('<')[1];
var ip = split[1];
var val2 = split[2];    
var val3 = split2[0];
var text = split2[1];

If any of these values turns out to be empty for some reason, you can check afterwards using the string.IsNullOrWhiteSpace() function.

So, for instance:

var str = "<;;;>";

var split = str.Split(';');

var val1 = split[0].Split('<')[1];

Console.WriteLine(string.IsNullOrWhiteSpace(val1)); //true
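One caveat with the split approach is that indexing into the arrays throws an IndexOutOfRangeException when the input has no header at all, for example a bare "Lorem ipsum" string. A guarded variant could be sketched as below. The class and method names are hypothetical, and returning null for "no header" is just one possible convention.

```csharp
using System;

// Hypothetical helper, shown only to illustrate the bounds checks.
public static class SplitParser
{
    // Returns { val1, ip, val2, val3, text }, or null when the input
    // does not start with a "<a;b;c;d>" header.
    public static string[] TrySplitHeader(string input)
    {
        if (string.IsNullOrEmpty(input) || input[0] != '<') return null;

        int close = input.IndexOf('>');
        if (close < 0) return null;

        // Everything between '<' and the first '>'.
        string[] parts = input.Substring(1, close - 1).Split(';');
        if (parts.Length != 4) return null;

        string text = input.Substring(close + 1);
        return new[] { parts[0], parts[1], parts[2], parts[3], text };
    }
}
```

The individual values can still be checked afterwards with string.IsNullOrWhiteSpace(), as described above.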

Here's a way combining regex and split. I did not do any null/empty/whitespace validation, nor did I validate that the split result contains at least 4 elements. This uses capture groups to select text from within the match, and it just blindly grabs all the text it can in the groups using .* , where the groups are defined with parentheses.

        string txt = "<value1;127.0.0.1:20000;value2;value3>Lorem ipsum dolor sit amet!";
        var rgx = new Regex(@"<(.*)>(.*)");
        var match = rgx.Match(txt);
        // Should check if (match.Success) here and only continue if true
        var entireMatch = match.Groups[0]; // unused
        var firstCaptureGroup = match.Groups[1].Value; // Everything between < >
        var secondCaptureGroup = match.Groups[2].Value; // Everything after < >
        var split = firstCaptureGroup.Split(';');

        string val1 = split[0]; 
        string val2 = split[2]; 
        string val3 = split[3]; 
        string ipaddress = split[1]; 
        string text = secondCaptureGroup;
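The checks skipped above (match.Success, the element count, and the question's rules that val1 needs at least 3 characters and the address cannot be empty) could be added roughly as follows. This is a sketch under assumptions: the `ParsedMessage` and `MessageParser` names are invented for illustration, and the pattern is tightened to `[^>]*` so that a `>` inside the trailing text is not swallowed by the first group.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical result type; HasHeader is false when only Text is meaningful.
public sealed class ParsedMessage
{
    public bool HasHeader;
    public string Val1 = "", IpAddress = "", Val2 = "", Val3 = "", Text = "";
}

public static class MessageParser
{
    // Group 1: everything between '<' and the first '>'. Group 2: the rest.
    private static readonly Regex Rgx =
        new Regex(@"^<([^>]*)>(.*)$", RegexOptions.Singleline);

    public static ParsedMessage Parse(string input)
    {
        // Fallback: treat the whole input as plain text.
        var result = new ParsedMessage { Text = input };

        var match = Rgx.Match(input);
        if (!match.Success) return result;

        var split = match.Groups[1].Value.Split(';');
        // Rules taken from the question: 4 fields, val1 >= 3 chars, ip not empty.
        if (split.Length != 4 || split[0].Length < 3 || split[1].Length == 0)
            return result;

        result.HasHeader = true;
        result.Val1 = split[0];
        result.IpAddress = split[1];
        result.Val2 = split[2];
        result.Val3 = split[3];
        result.Text = match.Groups[2].Value;
        return result;
    }
}
```

Whether an invalid header should fall back to plain text or be reported as an error is a design choice; the fallback here simply mirrors the "otherwise return only the text" requirement in the question.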
