Use a regular expression to replace every occurrence of "line 2" with line_2

Question

I'm parsing some text from an XML file that has sentences like "Subtract line 4 from line 1." and "Enter the amount from line 5". I want to replace all occurrences of "line " with "line_", e.g. Subtract line 4 from line 1 --> Subtract line_4 from line_1

Also, there are sentences like "Are the amounts on lines 4 and 8 the same?" and "Skip lines 9 through 12; go to line 13." I want to process these sentences to become "Are the amounts on line_4 and line_8 the same?" and "Skip line_9 through line_12; go to line_13."

Answer 1

Here's a working implementation with an rspec test. You call it like this: output = LineIdentifier[input]. To test, run spec file.rb after installing the rspec gem.

require 'spec'

class LineIdentifier
  def self.[](input)
    # Single mentions: "line 4" -> "line_4"
    output = input.gsub(/line (\d+)/, 'line_\1')
    # Pairs: "lines 4 and 8" -> "line_4 and line_8"
    output.gsub(/lines (\d+) (and|from|through) (line )?(\d+)/, 'line_\1 \2 line_\4')
  end
end

describe "LineIdentifier" do
  it "should identify line mentions" do
    examples = { 
      #Input                                         Output
     'Subtract line 4 from line 1.'               => 'Subtract line_4 from line_1.',
     'Enter the amount from line 5'               => 'Enter the amount from line_5',
     'Subtract line 4 from line 1'                => 'Subtract line_4 from line_1',
    }
    examples.each do |input, output|
      LineIdentifier[input].should == output
    end
  end
  it "should identify line ranges" do
    examples = { 
      #Input                                         Output
     'Are the amounts on lines 4 and 8 the same?' => 'Are the amounts on line_4 and line_8 the same?',
     'Skip lines 9 through 12; go to line 13.'    => 'Skip line_9 through line_12; go to line_13.',
    }
    examples.each do |input, output|
      LineIdentifier[input].should == output
    end
  end
end
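
The two substitutions above cover single mentions and pairs, but not comma-separated lists such as "lines 7, 8a, and 13". One way that could be handled, as a minimal sketch only: match the whole phrase first, then tag each number inside it. LineTagger, PHRASE and NUMBER are hypothetical names, and the set of joining words is an assumption taken from the examples in the question.

```ruby
# Hypothetical helper, not part of the answer above.
module LineTagger
  # One number with an optional lowercase suffix, e.g. "7" or "8a".
  NUMBER = /\d+[a-z]?/
  # "line"/"lines", a first number, then any further numbers joined by
  # ", and ", ", ", " and " or " through " (assumed set of joiners).
  PHRASE = /\blines? (#{NUMBER}(?:(?:, and |, | and | through )#{NUMBER})*)/

  def self.tag(text)
    text.gsub(PHRASE) do
      list = Regexp.last_match(1)            # e.g. "7, 8a, and 13"
      list.gsub(NUMBER) { |n| "line_#{n}" }  # prefix each number in the list
    end
  end
end

puts LineTagger.tag('Add lines 2, 3, and 4')
# prints: Add line_2, line_3, and line_4
```

Because only numbers inside a "line"/"lines" phrase are touched, something like "Form 1040A" is left alone.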

Answer 2

This works for the specific examples, including the ones in the OP's comments. As is often the case when using regex to do parsing, it becomes a hodge-podge of additional cases and tests to handle an ever-increasing set of known inputs. This handles the lists of line numbers using a while loop that repeats the substitution until nothing more matches. As written, it simply processes the input line by line. To handle a series of line numbers that spans a line boundary, it would need to be changed to process the input as one chunk, with matching across lines.

open( ARGV[0], "r" ) do |file|
  while ( line = file.gets )
    # replace both "line ddd" and "lines ddd" with line_ddd 
    line.gsub!( /(lines?\s)(\d+)/, 'line_\2' )
    # Now extend the tag through the known list sequences, one number per pass;
    # gsub! returns nil once nothing is left to replace, which ends the loop
    while line.gsub!( /(line_\d+[a-z]?,?)(\sand\s|\sthrough\s|,\s)(\d+)/, '\1\2line_\3' )
    end
    puts line
  end
end

Sample data: for this input:

Subtract line 4 from line 1.
Enter the amount from line 5
on lines 4 and 8 the same?
Skip lines 9 through 12; go to line 13.
... on line 10 Form 1040A, lines 7, 8a, 9a, 10, 11b, 12b, and 13
Add lines 2, 3, and 4

It produces this output:

Subtract line_4 from line_1.
Enter the amount from line_5
on line_4 and line_8 the same?
Skip line_9 through line_12; go to line_13.
... on line_10 Form 1040A, line_7, line_8a, line_9a, line_10, line_11b, line_12b, and line_13
Add line_2, line_3, and line_4
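
The answer notes that a series spanning a line boundary would need the input processed as one chunk. A rough sketch of that change, assuming the whole text fits in memory as a single string: tag_lines_across_breaks is a hypothetical name, and the only change to the patterns is \s+ in place of \s so that a newline can sit between list items.

```ruby
# Sketch only: the same two-step idea as the script above, applied to the
# whole text at once so that a list may continue on the next line.
def tag_lines_across_breaks(text)
  # Step 1: "line 4" / "lines 4" -> "line_4". \s+ also matches a newline,
  # so "lines\n4" is joined into "line_4".
  out = text.gsub(/lines?\s+(\d+)/, 'line_\1')
  # Step 2: tag the next number after an already tagged one, repeating
  # until gsub! returns nil (no substitution was made).
  loop do
    break unless out.gsub!(/(line_\d+[a-z]?,?)(\s+and\s+|\s+through\s+|,\s+)(\d+)/, '\1\2line_\3')
  end
  out
end

puts tag_lines_across_breaks("Add lines 2, 3,\nand 4")
# prints:
# Add line_2, line_3,
# and line_4
```

To run it on a file, the text could be read in one go with File.read(ARGV[0]) instead of looping over file.gets.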

Answer 3

sed is your friend:

lines.sed:

#!/bin/sed -rf
s/lines? ([0-9]+)/line_\1/g
s/\b([0-9]+[a-z]?)\b/line_\1/g

lines.txt:

Subtract line 4 from line 1.
Enter the amount from line 5
Are the amounts on lines 4 and 8 the same?
Skip lines 9 through 12; go to line 13.
Enter the total of the amounts from Form 1040A, lines 7, 8a, 9a, 10, 11b, 12b, and 13
Add lines 2, 3, and 4

demo:

$ cat lines.txt | ./lines.sed
Subtract line_4 from line_1.
Enter the amount from line_5
Are the amounts on line_4 and line_8 the same?
Skip line_9 through line_12; go to line_13.
Enter the total of the amounts from Form 1040A, line_7, line_8a, line_9a, line_10, line_11b, line_12b, and line_13
Add line_2, line_3, and line_4

You can also make this into a sed one-liner if you prefer, although the file is more maintainable.
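
For anyone who wants to experiment with the two sed rules from Ruby, a rough port might look like the sketch below (sed_style_tag is a hypothetical name). It also makes one property of the second rule easy to see: it tags every remaining standalone number, whether or not it follows "line". "1040A" is skipped only because the suffix is uppercase.

```ruby
# Hypothetical Ruby port of the two sed rules above, for experimentation only.
def sed_style_tag(text)
  text
    .gsub(/lines? ([0-9]+)/, 'line_\1')      # rule 1: "line 4" / "lines 4"
    .gsub(/\b([0-9]+[a-z]?)\b/, 'line_\1')   # rule 2: any remaining bare number
end

puts sed_style_tag('Add lines 2, 3, and 4')
# prints: Add line_2, line_3, and line_4
puts sed_style_tag('Pay 20 dollars on line 3')
# prints: Pay line_20 dollars on line_3
```

That is fine for input like lines.txt, where every bare number is a line reference, but it would need tightening for text that contains other numbers.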

3 answers

Answer 1
2 2010-08-24 23:22:55

Answer 2
0 2010-08-24 21:26:57

Answer 3
0 2010-08-31 16:25:48
