
Using Regex.Split in C#

I have a string that contains Snowflake columns and I want to split it into one entry per column. I'm trying to use Regex for this because String.Split won't work in this situation: the nvl(...) expressions contain commas of their own. The pattern I have tried is

string pattern = @"([^\s]*\s[^\s]*),";

but this pattern splits after the second consecutive space. I'm not sure how to split just after the alias. I am using .NET Core 3.1. Any help would be appreciated.

Current Snowflake data column string:

string columns = "nvl(u.\"Country\",'#N/A') \"Country\",u.\"CreatedDate\" \"CreatedDate\",nvl(u.\"Email\",'#N/A') \"Email\",u.\"LastModifiedDate\" \"LastModifiedDate\",nvl(u.\"Name\",'#N/A') \"Name\"";

Expected output:
nvl(u."Country",'#N/A') "Country"
u."CreatedDate" "CreatedDate"
nvl(u."Email",'#N/A') "Email"
u."LastModifiedDate" "LastModifiedDate"
nvl(u."Name",'#N/A') "Name"
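
For context, here is a minimal sketch of why a plain comma split is not enough for this input. The class and method names are made up for illustration, and the input is a shortened version of the string above.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Hypothetical helper: splits on every comma, including those inside nvl(...).
    public static string[] NaiveSplit(string text) => text.Split(',');

    public static void Main()
    {
        // Two columns, but the nvl(...) call carries a comma of its own.
        string columns = "nvl(u.\"Country\",'#N/A') \"Country\",u.\"CreatedDate\" \"CreatedDate\"";

        foreach (string piece in NaiveSplit(columns))
        {
            Console.WriteLine(piece);
        }
    }
}
```

The two columns come back as three pieces, because the first nvl(...) expression is cut in half at its inner comma.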

You can use

string[] result = Regex.Split(text, @"(?<=\s""\w+""),");

See the .NET regex demo. Details:

  • (?<=\s"\w+") - a positive lookbehind that matches a location immediately preceded by a whitespace character, a ", one or more word characters, and a closing "
  • , - a comma.
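
As a complete program, the split above might look like the sketch below. The class and method names are illustrative only, and the input is shortened to three columns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitColumnsDemo
{
    // Splits on a comma only when it directly follows a whitespace character
    // and a quoted alias such as "Country".
    public static string[] SplitColumns(string text) =>
        Regex.Split(text, @"(?<=\s""\w+""),");

    public static void Main()
    {
        string columns = "nvl(u.\"Country\",'#N/A') \"Country\",u.\"CreatedDate\" \"CreatedDate\",nvl(u.\"Email\",'#N/A') \"Email\"";

        foreach (string column in SplitColumns(columns))
        {
            Console.WriteLine(column);
        }
    }
}
```

The comma inside nvl(u."Country",'#N/A') is left alone because the quoted name before it is preceded by a dot rather than whitespace, so the lookbehind does not match there.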


Another idea is to extract the matches with

var result = Regex.Matches(text, @"\b(?:nvl\([^()]*\)|u\.""[^""]*"")\s+""[^""]*""")
    .Cast<Match>()
    .Select(x => x.Value);

See this regex demo.

Details:

  • \b - word boundary
  • (?:nvl\([^()]*\)|u\."[^"]*") - nvl(...) or u."..."
  • \s+ - one or more whitespace characters
  • "[^"]*" - a ", zero or more characters other than ", and a closing ".
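
A self-contained version of the extraction approach could look like this sketch. The class and method names are illustrative, and the System.Linq import is needed for Cast and Select.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExtractColumnsDemo
{
    // Either nvl(...) or u."...", then whitespace, then a quoted alias.
    private const string Pattern = @"\b(?:nvl\([^()]*\)|u\.""[^""]*"")\s+""[^""]*""";

    public static List<string> ExtractColumns(string text) =>
        Regex.Matches(text, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

    public static void Main()
    {
        string columns = "nvl(u.\"Country\",'#N/A') \"Country\",u.\"CreatedDate\" \"CreatedDate\",nvl(u.\"Name\",'#N/A') \"Name\"";

        foreach (string column in ExtractColumns(columns))
        {
            Console.WriteLine(column);
        }
    }
}
```

Note that this pattern is tied to the shape of the sample data: it expects the table alias u, and [^()]* will not match an nvl(...) call that contains nested parentheses.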

You can use a capture group (group 1) and exclude the comma in the second part after matching the space. To match all parts, you can match either a comma or the end of the string at the end of the pattern.

This part [^\s]* can be written as \S*

(\S*\s[^\s,]*)(?:,|$)
  • ( Capture group 1
    • \S*\s[^\s,]* Match optional non-whitespace chars, then a whitespace char, then optional non-whitespace chars other than a comma
  • ) Close group 1
  • (?:,|$) Match either a comma or assert the end of the string

.NET regex demo


For example

string pattern = @"(\S*\s[^\s,]*)(?:,|$)";
string input = @"nvl(u.""Country"",'#N/A') ""Country"",u.""CreatedDate"" ""CreatedDate"",nvl(u.""Email"",'#N/A') ""Email"",u.""LastModifiedDate"" ""LastModifiedDate"",nvl(u.""Name"",'#N/A') ""Name""";

foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine(m.Groups[1].Value);
}

Output

nvl(u."Country",'#N/A') "Country"
u."CreatedDate" "CreatedDate"
nvl(u."Email",'#N/A') "Email"
u."LastModifiedDate" "LastModifiedDate"
nvl(u."Name",'#N/A') "Name"

A slightly more specific pattern uses + to match one or more characters and matches only word characters between the double quotes:

 (\S+\s"\w+")(?:,|$)

Regex demo
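
Used the same way as the earlier example, the more specific pattern might be wired up as in this sketch. The class and method names are illustrative, and the input is shortened.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpecificPatternDemo
{
    // Group 1 holds the column; the trailing comma (or end of string) stays outside it.
    private const string Pattern = @"(\S+\s""\w+"")(?:,|$)";

    public static List<string> GetColumns(string input)
    {
        var columns = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            columns.Add(m.Groups[1].Value);
        }
        return columns;
    }

    public static void Main()
    {
        string input = @"nvl(u.""Country"",'#N/A') ""Country"",u.""CreatedDate"" ""CreatedDate""";

        foreach (string column in GetColumns(input))
        {
            Console.WriteLine(column);
        }
    }
}
```

Because \w+ only matches word characters, an alias that contains spaces or punctuation would not be matched by this variant.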
