How to split values out of a string in Ruby

Question

My sample string is shown below. I want to split each value out into an array or a hash so that I can process the value of each element.

<div id="test">
           accno:          123232323 <br>
           id:            5443534534534 <br>
           name:            test_name <br>
           url:                  www.google.com <br>

 </div>

How can I get each value into a hash or an array?

Answer 1

This is easy with a regular expression:

s = '<div id="test">
           accno:          123232323 <br>
           id:            5443534534534 <br>
           name:            test_name <br>
           url:                  www.google.com <br>

 </div>'

 p s.scan(/\s+(.*?)\:\s+(.*?)<br>/).map.with_object({}) { |i, h| h[i[0].to_sym] = i[1].strip }

Or, if your keys (accno, id, name, url) contain only lowercase letters, you can make the key pattern more precise with ([a-z]+):

 p s.scan(/\s+([a-z]+)\:\s+(.*?)<br>/).map.with_object({}) { |i, h| h[i[0].to_sym] = i[1].strip }

Result:

 {:accno=>"123232323", :id=>"5443534534534", :name=>"test_name", :url=>"www.google.com"}
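The same scan can also be written a little more compactly. The following is only a minimal sketch, assuming Ruby 2.6 or newer (needed for the block form of Array#to_h); the variable name fields is chosen here for illustration and is not part of the original answer.

```ruby
s = <<~HTML
  <div id="test">
             accno:          123232323 <br>
             id:            5443534534534 <br>
             name:            test_name <br>
             url:                  www.google.com <br>

   </div>
HTML

# scan returns one [key, value] pair per "key: value <br>" line;
# to_h with a block turns those pairs straight into a hash.
fields = s.scan(/\s+([a-z]+):\s+(.*?)<br>/).to_h { |key, value| [key.to_sym, value.strip] }

p fields
```

Individual values can then be read as fields[:accno], fields[:url], and so on.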

Update

For a single-line input such as:

<div id="test"> accno: 123232323 id: 5443534534534 name: test_name url: www.google.com </div>

the regex would be:

 /([a-z]+)\:\s*(.*?)\s+/

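([a-z]+) is the hash key. If keys may also contain - or _, add them to the character class: ([a-z\-_]+). This pattern assumes that the key is followed by : (possibly with spaces after it) and then by some text that runs up to the next whitespace. If the last value is not followed by whitespace, as in url: www.google.com</div>, end the pattern with (\s+|<) instead of \s+.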
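As a rough illustration of the single-line case, the sketch below combines the extended key class with a terminator that accepts either whitespace or <. It makes two assumptions that are not in the answer itself: the terminator is written as a non-capturing group, (?:\s+|<), so that scan still yields exactly two captures per match, and Ruby 2.6 or newer is available for the block form of to_h.

```ruby
# Note: the last value is followed directly by "</div>", with no space.
s = '<div id="test"> accno: 123232323 id: 5443534534534 name: test_name url: www.google.com</div>'

# Key: lowercase letters, "-" or "_", followed by ":".
# Value: as little text as possible, up to the next whitespace or "<".
re = /([a-z\-_]+):\s*(.*?)(?:\s+|<)/

fields = s.scan(re).to_h { |key, value| [key.to_sym, value] }

p fields
```

Because the value is matched lazily up to the first whitespace, this approach only works when the values themselves contain no spaces.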

Answer 2

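If you are dealing with HTML, use an HTML/XML parser such as nokogiri and extract the text content of the <div> tag you need with a CSS selector. Then parse that text into fields.

To install nokogiri: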

gem install nokogiri

Then process the page and the text:

require "nokogiri"
require "open-uri"

# re matches: spaces (word) colon spaces (anything) space
re_fields  = /\s+(?<field>\w+):\s+(?<data>.*?)\s/

# Somewhere to store the results
record = {}

# Use URI.open: since Ruby 3.0, open-uri no longer lets Kernel#open fetch URLs
page      = Nokogiri::HTML( URI.open("http://example.com/divtest.html") )

# Select the text from <div id=test> and scan into fields with the regex 
page.css( "div#test" ).text.scan( re_fields ){ |field, data|
    record[ field ] = data
}
p record

The result is:

{"accno"=>"123232323", "id"=>"5443534534534", "name"=>"test_name", "url"=>"www.google.com"}

If you are dealing with multiple elements, the result of a page.css( "blah" ) selector can also be iterated like an array with .each:

# Somewhere to store the results
records    = []

# For each matching <div>, scan its text into fields with the regex
page.css( "div#test" ).each{ |div| 
    record = {}
    div.text.scan( re_fields ){ |field, data|
        record[field] = data
    }
    records.push record
}
p records
