
RegEx code works in theory but not when code is run

I'm trying to use this RegEx search in Ruby: <div class="ms3">(\n.*?)+< However, as soon as I add the last character, "<", it stops working altogether. I've tested it in Rubular and the RegEx works perfectly fine there. I'm using RubyMine to write my code, but I also ran it from PowerShell and got the same result. There is no error message. When I run <div class="ms3">(\n.*?)+ it prints <div class="ms3">, which is exactly what I'm looking for, but as soon as I add the "<" it comes out with nothing.

My code:

#!/usr/bin/ruby
# encoding: utf-8

File.open('ms3.txt', 'w') do |fo|
  fo.puts File.foreach('input.txt').grep(/<div class="ms3">(\n.*?)+/)
end

Some of what I'm searching through:

  <div class="ms3">
    <span xml:lang="zxx"><span xml:lang="zxx">Still the tone of the remainder of the chapter is bleak. The</span> <span class="See_In_Glossary" xml:lang="zxx">DAY OF THE <span class="Name_Of_God" xml:lang="zxx">LORD</span></span> <span xml:lang="zxx">holds no hope for deliverance (5.16–18); the futility of offering sacrifices unmatched by common justice is once more underlined, and exile seems certain (5.21–27).</span></span>
  </div>

  <div class="Paragraph">
    <span class="Verse_Number" id="idAMO_5_1" xml:lang="zxx">1</span><span class="scrText">Listen, people of Israel, to this funeral song which I sing over you:</span>
  </div>

  <div class="Stanza_Break"></div>

The full RegEx I need is <div class="ms3">(\n.*?)+<\/div>. It picks up the first section and nothing else.

Your problem starts with File.foreach('input.txt'), which splits the input into lines. The pattern is therefore matched against each line separately, and no line can match it: by definition a line never has \n in its middle, only at its very end, so nothing is left in the line for the "<" that the pattern expects after the \n.
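To see the difference concretely, here is a minimal sketch that runs both versions of the pattern line by line and against the whole text. The sample string is a made-up stand-in for input.txt, and the variable names are only illustrative.

```ruby
# Hypothetical sample standing in for input.txt (not the asker's real file).
sample = <<~HTML
  <div class="ms3">
    <span>Still the tone ...</span>
  </div>
HTML

with_lt    = /<div class="ms3">(\n.*?)+</
without_lt = /<div class="ms3">(\n.*?)+/

# String#lines splits the text the same way File.foreach splits a file:
# every element keeps its trailing "\n".
lines = sample.lines

per_line_without = lines.grep(without_lt) # opening line ends in \n, so it matches
per_line_with    = lines.grep(with_lt)    # nothing follows that \n inside the line
whole_with       = sample.match(with_lt)  # the whole text does have a "<" after the \n

p per_line_without
p per_line_with
p whole_with && whole_with[0]
```

The line-by-line grep only ever returns the opening line, and only for the pattern without the trailing "<". The same pattern with "<" succeeds once it is given the whole text.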

You should have better luck reading the whole text as one block and using match on it:

File.read('input.txt').match(/<div class="ms3">(\n.*?)+<\/div>/)
# => #<MatchData "<div class=\"ms3\">\n    <span xml:lang=\"zxx\">
# => <span xml:lang=\"zxx\">Still the tone of the remainder of the chapter is bleak. The</span> 
# => <span class=\"See_In_Glossary\" xml:lang=\"zxx\">DAY OF THE 
# => <span class=\"Name_Of_God\" xml:lang=\"zxx\">LORD</span></span> 
# => <span xml:lang=\"zxx\">holds no hope for deliverance (5.16–18); 
# => the futility of offering sacrifices unmatched by common justice is once more 
# => underlined, and exile seems certain (5.21–27).</span></span>\n  </div>" 1:"\n  ">
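
Note that match returns only the first occurrence. If the input contains several ms3 blocks (an assumption, since the excerpt shows only one), String#scan with a non-capturing group should collect all of them. A rough sketch with made-up sample data:

```ruby
# Hypothetical sample with two ms3 blocks; in the real script this string
# would come from File.read('input.txt').
html = <<~HTML
  <div class="ms3">
    <span>first note</span>
  </div>

  <div class="Paragraph">
    <span>verse text</span>
  </div>

  <div class="ms3">
    <span>second note</span>
  </div>
HTML

# (?: ... ) is a non-capturing group. With a capturing group, scan would
# return the group's contents instead of the full matched text.
blocks = html.scan(/<div class="ms3">(?:\n.*?)+<\/div>/)

puts blocks
```

Writing the result out is then the same as in the question: File.open('ms3.txt', 'w') { |fo| fo.puts blocks }. For anything beyond simple, regular markup like this, an HTML parser is usually more robust than a regex.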


 