Regular Expressions, C#

I have a large X12 EDI file containing thousands of description strings. These description strings can be found before, after, and between other strings that use the same * delimiter.

All description strings start with the tag REF*TC** and end with the character ~.

I need to find and replace (in practice, remove) every * that occurs between these two tags without touching the other strings, such as the DTM string in this example.

I am including an example of description strings as they would be found in the file. As you can see, the first description string contains the * characters I need to replace; the second doesn't contain any that need replacing.

~REF*TC**BLAH*BLAH*~REF*TC**BLAHBLAH~REF*TC***BLAH~DTM*010*20110329~

Desired output:

~REF*TC**BLAHBLAH~REF*TC**BLAHBLAH~REF*TC**BLAH~DTM*010*20110329~

I am using C#.

This is what I have so far.

find expression: REF*TC**(.{0,}?)(*+)(.{0,}?)(**)(.{0,}?)(**)~
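A note on why this expression probably fails: in a regex, * is a quantifier rather than a literal character, so REF*TC** does not match the tag text, and as far as I know .NET rejects a doubled ** as a nested quantifier. A minimal sketch of building a pattern with the tag escaped is below. The class and method names (TagPattern, Build) are hypothetical, and [^~]* is an assumption that a description never contains ~ before its terminator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagPattern
{
    // Hypothetical helper: builds a pattern that matches one description string,
    // from the literal start tag up to (but not including) the closing '~'.
    public static string Build(string startTag)
    {
        // Regex.Escape turns "REF*TC**" into @"REF\*TC\*\*", so each '*' is literal.
        return Regex.Escape(startTag) + "[^~]*";
    }
}
```

Calling TagPattern.Build("REF*TC**") gives a pattern that can be passed to Regex.Match or Regex.Replace.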

Here's what I've come up with:

var str = "~REF*TC**BLAH*BLAH*~REF*TC**BLAHBLAH~REF*TC***BLAH~DTM*010*20110329~";
var result = (new Regex(@"(?<pre>REF\*TC\*\*)(?<text>.*?)(?<post>~)")).Replace(str,(m) =>
{
    return String.Join(String.Empty,new String[]{
        m.Groups["pre"].Value,
        m.Groups["text"].Value.Replace("*",String.Empty),
        m.Groups["post"].Value
    });
});

That's just based on what you've provided; to be honest, I'm not 100% sure what you're going for.
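If a single Replace call without a callback is preferred, one alternative is a lookbehind that matches only the * characters sitting inside a description. This is a sketch, not a tested production pattern: it relies on .NET's support for variable-length lookbehind, assumes a description never contains ~, and the names DescriptionCleaner and RemoveStars are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class DescriptionCleaner
{
    // Matches a '*' only when the text right before it is the start tag
    // followed by non-'~' characters, i.e. the '*' is inside a description.
    private static readonly Regex StarInDescription =
        new Regex(@"(?<=REF\*TC\*\*[^~]*)\*");

    public static string RemoveStars(string input)
    {
        // Delete every matched '*'; the tag's own asterisks never match.
        return StarInDescription.Replace(input, string.Empty);
    }
}
```

The asterisks in the tag itself and in other segments such as DTM are left alone because the lookbehind cannot cross a ~.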

Regex is awesome, but as the famous quote goes: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. Skip the regex and just use string methods on it. You could go as simple as splitting it on the REF*TC** start tags and then removing the * characters from each description, or you could try for something more sophisticated. Don't go all the way to regex when simple string methods will do.

EDIT:

Here's a real simple example:

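// Split on the start tag; every chunk after the first begins with a description.
string[] parts = file.Split(new[] { "REF*TC**" }, StringSplitOptions.None);
for (int i = 1; i < parts.Length; i++)
{
    // Strip '*' only up to the closing '~' so later segments (e.g. DTM) stay intact.
    int end = parts[i].IndexOf('~');
    if (end < 0) end = parts[i].Length;
    parts[i] = parts[i].Substring(0, end).Replace("*", "") + parts[i].Substring(end);
}
string output = string.Join("REF*TC**", parts);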

There's no extra "REF*TC**" to clean up at the end: Join() only puts the separator between elements. Anyways, that should do it.
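For the "something more sophisticated" route, here is a rough single-pass sketch that walks the input with IndexOf and a StringBuilder instead of splitting it into an array, which may matter for a very large file. The names DescriptionScanner and RemoveStars are hypothetical, and the code assumes every description ends at the first ~ after its tag.

```csharp
using System;
using System.Text;

public static class DescriptionScanner
{
    private const string StartTag = "REF*TC**"; // start tag from the question
    private const char Terminator = '~';        // end-of-segment character

    public static string RemoveStars(string input)
    {
        var sb = new StringBuilder(input.Length);
        int pos = 0;
        while (pos < input.Length)
        {
            int tag = input.IndexOf(StartTag, pos, StringComparison.Ordinal);
            if (tag < 0) break; // no more descriptions

            int bodyStart = tag + StartTag.Length;
            // Copy everything up to and including the tag unchanged.
            sb.Append(input, pos, bodyStart - pos);

            int end = input.IndexOf(Terminator, bodyStart);
            if (end < 0) end = input.Length; // unterminated description

            // Copy the description body, skipping its '*' characters.
            for (int i = bodyStart; i < end; i++)
            {
                if (input[i] != '*') sb.Append(input[i]);
            }
            pos = end; // resume at the terminator
        }
        // Copy whatever follows the last description unchanged.
        sb.Append(input, pos, input.Length - pos);
        return sb.ToString();
    }
}
```

Text outside the descriptions, including the DTM segment, is copied through untouched.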
