Regex parsing using Nokogiri

Question

Using Nokogiri, I need to parse the following block:

<div class="some_class">
  12 AB / 4+ CD
  <br/>
  2,600 Dollars
  <br/> 
</div>

I need to get the ab, cd and dollars values if they exist.

ab = p.css(".some_class").text[....some regex....]
cd = p.css(".some_class").text[....some regex....]
dollars = p.css(".some_class").text[....some regex....]

Is that correct? If so, can someone help me with a regex to parse the ab, cd and dollars values?

Answer 1

To get a better answer you would have to clarify exactly what format the AB, CD and Dollar values take, but here is a solution based on the example given. It uses a regexp grouping () to capture the information we're interested in (see the bottom of the answer for more details).

text = p.css(".some_class").text

# one or more digits followed by a space followed by AB, capture the digits
ab = text.match(/(\d+) AB/).captures[0] # => "12"

# one or more digits followed by a literal +, then a space and CD; capture the digits and the +
cd = text.match(/(\d+\+) CD/).captures[0] # => "4+"

# digits or commas followed by "Dollars"
dollars = text.match(/([\d,]+) Dollars/).captures[0] # => "2,600"

Note that if there is no match, String#match returns nil, so if the values might not exist you would need a check, e.g.

if match = text.match(/([\d,]+) Dollars/)
  dollars = match.captures[0]
end
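The bracket form from the question also works and handles the missing case without an explicit check: String#[] accepts a regexp plus a capture index and returns nil when nothing matches. A minimal sketch, assuming the div text has already been extracted; the helper name extract_values and the hard-coded sample string are illustrative and not part of the original answer:

```ruby
# Hypothetical helper (not from the original answer): pulls the three values
# out of the div text. Any value that is absent comes back as nil.
def extract_values(text)
  {
    ab:      text[/(\d+) AB/, 1],          # digits before " AB"
    cd:      text[/(\d+\+) CD/, 1],        # digits plus a literal "+" before " CD"
    dollars: text[/([\d,]+) Dollars/, 1]   # digits and commas before " Dollars"
  }
end

# Stand-in for p.css(".some_class").text, typed by hand to mirror the question's HTML.
sample = "\n  12 AB / 4+ CD\n  \n  2,600 Dollars\n  \n"

p extract_values(sample)   # all three values present
p extract_values("7 AB")   # only :ab is set, the others are nil
```

The values are still strings at this point, so something like dollars.delete(",").to_i would be needed before doing arithmetic on them.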

Additional explanation of captures

To match the amount of AB we need a pattern /\d+ AB/ to identify the right part of the text. However, we're really only interested in the numeric part, so we surround that with parentheses so that we can extract it, e.g.

irb(main):027:0> match = text.match(/(\d+) AB/)
=> #<MatchData:0x2ca3440>           # the match method returns MatchData if there is a match, nil if not
irb(main):028:0> match.to_s         # match.to_s gives us the entire text that matched the pattern
=> "12 AB"
irb(main):029:0> match.captures     
=> ["12"]
# match.captures gives us an array of the parts of the pattern that were enclosed in ()
# in our example there is just 1 but there could be multiple
irb(main):030:0> match.captures[0]
=> "12"                             # the first capture - the bit we want

Take a look at the documentation for MatchData, in particular the captures method, for more details.
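When a pattern has more than one group, named groups can be easier to read than positional indexes into captures. A minimal sketch, assuming the AB and CD values always appear together in the "12 AB / 4+ CD" layout from the question; the constant AB_CD_PATTERN and the helper ab_and_cd are hypothetical names introduced here:

```ruby
# Hypothetical pattern: each (?<name>...) group labels one capture.
# %r{...} is used so the literal "/" does not need escaping.
AB_CD_PATTERN = %r{(?<ab>\d+) AB / (?<cd>\d+\+?) CD}

# Hypothetical helper: returns a hash of group name => captured text,
# or nil when the text does not match at all.
def ab_and_cd(text)
  match = AB_CD_PATTERN.match(text)
  match && match.named_captures
end

result = ab_and_cd("  12 AB / 4+ CD")
puts result["ab"]   # the digits before AB
puts result["cd"]   # the digits (and optional +) before CD
```

MatchData#named_captures returns string keys, and a single group can also be read directly with match[:ab].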

Answer 2

This is an older thread, but I just stumbled upon it. Here's how I'd locate the values, along with a usable way of storing them:

require "ap"
require "nokogiri"

xml = <<EOT
<div class="some_class">
  12 AB / 4+ CD
  <br/>
  2,600 Dollars
  <br/> 
</div>
EOT

doc = Nokogiri::XML(xml)

some_class = doc.at('.some_class').text

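values = some_class
  .scan(/([\d,+]+) ([a-z]+)/i) # commas belong to the number, so "2,600" is captured whole
  .each_with_object({}) { |(v, c), h| h[c] = v.delete(",").to_i } # strip commas before to_i

values # => {"AB"=>12, "CD"=>4, "Dollars"=>2600}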

2 answers

Answer 1: score 6, accepted, 2010-07-17 18:17:34

Answer 2: score 0, 2011-12-20 03:16:09
