
How can I use a regular expression to convert a string to a different substring?

I have a text file containing lines similar to these:

000001 , Line 1 of text , customer 1 name
000002 , Line 2 of text , customer 2 name
000003 , Line 3 of text , customer 3 name
  =               =             =
  =               =             =
  =               =             =
000087 , Line 87 of text, customer 87 name
  =               =             =
  =               =             =
001327 , Line 1327 of text, customer 1327 name
  =               =             =
  =               =             =
  =               =             =

I can write a program that reads each line of the file above and converts it to the following format:

000001 , 1st Line , 1st Customer name
000002 , 2nd Line , 2nd Customer name
000003 , 3rd Line , 3rd Customer name
  =               =        =
  =               =        =
  =               =        =
000087 , 87th Line, 87th Customer name
  =               =        =
  =               =        =
001327 , 1327th Line, 1327th Customer name
  =               =        =
  =               =        =
  =               =        =

My question: is there a direct way to achieve the same output with a regular expression?

I tried the following:

Dim pattern As String = "(\d{6}) , (Line \d+ of text) , (customer \d name)" 
Dim replacement As String = " $1 , $2 Line , $3 Customer name " 
Dim rgx As New Regex(pattern)
Dim result As String = rgx.Replace(my_input_file, replacement)

But the result is far from the desired output.

Please help.
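For illustration, here is the same attempt translated to Python's `re` module (a sketch assuming `$1` maps to `\1` and so on; the original code is VB.NET). It shows two problems: the groups copy whole phrases, so the replacement only adds words around them, and `customer \d name` allows a single digit, so a line such as number 87 is not matched at all.

```python
import re

# The attempt's pattern and replacement in Python syntax ($1 -> \1, etc.).
PATTERN = r"(\d{6}) , (Line \d+ of text) , (customer \d name)"
REPLACEMENT = r" \1 , \2 Line , \3 Customer name "


def try_attempt(line):
    """Apply the attempted regex replacement to one line."""
    return re.sub(PATTERN, REPLACEMENT, line)


print(repr(try_attempt("000001 , Line 1 of text , customer 1 name")))
# ' 000001 , Line 1 of text Line , customer 1 name Customer name '
print(repr(try_attempt("000087 , Line 87 of text , customer 87 name")))
# '000087 , Line 87 of text , customer 87 name'  (no match: \d is one digit)
```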

Your regex captures too much. The groups should capture only the numbers:

Dim pattern As String = "(\d{6}) , Line (\d+) of text , customer (\d+) name"
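A quick check of what the narrower groups capture, sketched with Python's `re` module (used here only for illustration). Because the pattern matches ` , ` literally, sample lines such as `000087 , Line 87 of text, customer 87 name`, which have no space before the second comma, do not match; whether that reflects the real file or is just a typo in the question is unknown.

```python
import re

# Same pattern as the answer: each group captures only a number.
PATTERN = re.compile(r"(\d{6}) , Line (\d+) of text , customer (\d+) name")


def capture_numbers(line):
    """Return the three captured groups, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None


print(capture_numbers("000001 , Line 1 of text , customer 1 name"))
# ('000001', '1', '1')
print(capture_numbers("000087 , Line 87 of text, customer 87 name"))
# None: this sample line has no space before the second comma
```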

Also, since you want to replace the numbers with ordinals, you should build each output line with String.Format (one line at a time):

' rgx is built from the pattern above; match one input line at a time
Dim m As Match = rgx.Match(my_input_file_line)
Dim outputLine As String = String.Format(" {0} , {1} Line , {2} Customer name", _
    m.Groups(1).Value, GetOrdinal(m.Groups(2).Value), GetOrdinal(m.Groups(3).Value))

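where GetOrdinal is a helper method you write yourself that turns a numeric string into its ordinal (for example "1" into "1st" and "87" into "87th").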
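A minimal Python sketch of the same per-line approach follows. `get_ordinal` is a hypothetical stand-in for the answer's GetOrdinal, and the pattern is loosened to `\s*,\s*` (an assumption, not part of the answer) so that sample lines without a space before the comma also match. The ordinal rule treats numbers ending in 11, 12 or 13 as `th`.

```python
import re

# Loosened to \s*,\s* (an assumption) so "... of text, customer ..." matches too.
PATTERN = re.compile(r"(\d{6})\s*,\s*Line (\d+) of text\s*,\s*customer (\d+) name")


def get_ordinal(number_text):
    """Hypothetical stand-in for GetOrdinal: "1" -> "1st", "87" -> "87th"."""
    n = int(number_text)
    if n % 100 in (11, 12, 13):
        suffix = "th"  # 11th, 12th, 13th, 111th, ... are exceptions
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def convert_line(line):
    """Build one output line, like the String.Format call above."""
    m = PATTERN.match(line)
    if m is None:
        return line  # leave lines that do not match unchanged
    return "{0} , {1} Line , {2} Customer name".format(
        m.group(1), get_ordinal(m.group(2)), get_ordinal(m.group(3)))


print(convert_line("000087 , Line 87 of text, customer 87 name"))
# 000087 , 87th Line , 87th Customer name
```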

Your match groups are too large. What you want to match are just the numbers.

Replace

(\d{6}) , Line (\d+) of text , customer (\d+) name

with

$1 , $2th Line , $3th Customer name


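Then replace 1th with 1st.

Then replace 2th with 2nd.

Then replace 3th with 3rd.

Note that plain text replacements would also turn 11th, 12th and 13th into 11st, 12nd and 13rd, so numbers ending in 11, 12 or 13 must be left alone.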
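The two-step idea (append `th` to every number, then repair the suffixes) can be sketched in Python as below. The negative lookbehind `(?<!1)` is an addition to the answer's steps so that 11th, 12th and 13th (and 111th, 112th, ...) are left alone; Python's `re` is used purely for illustration.

```python
import re

PATTERN = r"(\d{6}) , Line (\d+) of text , customer (\d+) name"


def convert_two_step(text):
    # Step 1: put "th" after every captured number ($2th in .NET becomes \2th here).
    text = re.sub(PATTERN, r"\1 , \2th Line , \3th Customer name", text)
    # Step 2: fix numbers ending in 1, 2 or 3; (?<!1) skips 11th, 12th, 13th.
    text = re.sub(r"(?<!1)1th", "1st", text)
    text = re.sub(r"(?<!1)2th", "2nd", text)
    text = re.sub(r"(?<!1)3th", "3rd", text)
    return text


sample = "\n".join([
    "000001 , Line 1 of text , customer 1 name",
    "000011 , Line 11 of text , customer 11 name",
    "000022 , Line 22 of text , customer 22 name",
])
print(convert_two_step(sample))
# 000001 , 1st Line , 1st Customer name
# 000011 , 11th Line , 11th Customer name
# 000022 , 22nd Line , 22nd Customer name
```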


I'm not sure whether you intend to match the actual customer name and rearrange it in the replacement... do you?


If so, you can use (with the global and multiline flags)

^(\d{6}) , Line (\d+) of text , ([^ ]+) (\d+) ([^ ]+)$

and replace with $1 , $2th Line , $4th $3 $5
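In Python the same idea looks roughly like the sketch below, with `re.MULTILINE` standing in for the multiline flag (Python's `re.sub` already replaces every match, so no global flag is needed). The customer-number group is written as `(\d+)` so that multi-digit numbers match; the resulting `th` suffixes still need the 1st/2nd/3rd fix-up from the previous steps.

```python
import re

# ^ and $ anchor at every line with re.MULTILINE; re.sub replaces all matches,
# which covers the "global" flag. The customer-number group is (\d+).
PATTERN = re.compile(
    r"^(\d{6}) , Line (\d+) of text , ([^ ]+) (\d+) ([^ ]+)$", re.MULTILINE)


def reorder(text):
    # .NET/JavaScript "$1 , $2th Line , $4th $3 $5" written in Python syntax
    return PATTERN.sub(r"\1 , \2th Line , \4th \3 \5", text)


print(reorder("000087 , Line 87 of text , customer 87 name\n"
              "001327 , Line 1327 of text , customer 1327 name"))
# 000087 , 87th Line , 87th customer name
# 001327 , 1327th Line , 1327th customer name
```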


Tip: I always use http://www.gskinner.com/RegExr/ to test and experiment with my patterns!

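Is there a reason to use a regex at all? Maybe I misunderstood the requirement, but the format looks fixed and only the first field matters, so you could use this simple query: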

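// Needs: using System.Collections.Generic; using System.IO; using System.Linq;
IEnumerable<string> lines = File.ReadLines(@"folder\input_text.txt");
IEnumerable<string> result = lines
    .Where(l => l.Trim().Length > 0)
    .Select(l => int.Parse(l.Split(',').First().Trim()))
    .Select(num => string.Format("{0} , {1} Line , {1} Customer name",
        num.ToString("D6"),
        // 11-13 always take "th"; otherwise the last digit picks the suffix
        num + (num % 100 >= 11 && num % 100 <= 13 ? "th"
             : num % 10 == 1 ? "st"
             : num % 10 == 2 ? "nd"
             : num % 10 == 3 ? "rd" : "th")));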

You can then write the result to the output file with File.WriteAllLines:

File.WriteAllLines(@"folder\desired_output.txt", result);
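For comparison, the same no-regex idea sketched in Python (an illustration, not the answer's code): only the leading ID is parsed, blank lines are skipped, and the ordinal rule treats numbers ending in 11, 12 or 13 as `th`. The file names in the usage comment are placeholders.

```python
from pathlib import Path


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> "1st", 12 -> "12th"."""
    if n % 100 in (11, 12, 13):
        return f"{n}th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def convert_lines(lines):
    """Mirror of the LINQ query: only the leading ID field is used."""
    result = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, like .Where(l => l.Trim().Length > 0)
        num = int(line.split(",")[0].strip())
        suffixed = ordinal(num)
        result.append(f"{num:06d} , {suffixed} Line , {suffixed} Customer name")
    return result


def convert_file(src, dst):
    """Counterpart of File.ReadLines / File.WriteAllLines."""
    lines = Path(src).read_text().splitlines()
    Path(dst).write_text("\n".join(convert_lines(lines)) + "\n")


# Usage (placeholder paths): convert_file("input_text.txt", "desired_output.txt")
```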
