How do I merge multiple hashes into one valid JSON file?

Question

I'm using the following code to generate a JSON file containing all category information for a particular website.

require 'mechanize'
require 'json'

@categories_hash = {}
@categories_hash['category'] ||= {}
@categories_hash['category']['id'] ||= {}
@categories_hash['category']['name'] ||= {}
@categories_hash['category']['group'] ||= {}
@categories_hash['category']['search_attributes'] ||= {}

# Initialize Mechanize object
a = Mechanize.new

# Open file and begin
File.open("json/booyah/#{Time.now.strftime '%Y%m%d%H%M%S'}_booyah_categories.json", 'w') do |f|
  puts '# Writing category data to JSON file'

  # Begin scraping
  a.get('http://www.marktplaats.nl/') do |page|
    groups = page.search('//*[(@id = "navigation-categories")]//a')
    groups.each_with_index do |group, index_1|

      a.get(group[:href]) do |page_2|
        categories = page_2.search('//*[(@id = "category-browser")]//a')
        categories.each_with_index do |category, index_2|

          a.get(category[:href]) do |page_3|
            search_attributes = page_3.search('//*[contains(concat( " ", @class, " " ), concat( " ", "heading", " " ))]')

            search_attributes.each_with_index do |attribute, index_3|
              @categories_hash['category']['id'] = "#{index_1}_#{index_2}"
              @categories_hash['category']['name'] = category.text
              @categories_hash['category']['group'] = group.text
              @categories_hash['category']['search_attributes'][index_3] = attribute.text unless attribute.text == 'Outlet '
            end

            # Uncomment if you want to see what's being written
            puts @categories_hash['category'].to_json

            # Write the converted Hash to the JSON file
            f.write(@categories_hash['category'].to_json)
          end
        end
      end
    end
  end

  puts '|-----------> Done.'

end

puts '# Finished.'

This code produces an invalid JSON file. It looks like this:

{
  "id": "0_0",
  "name": "Boeken en Bijbels",
  "group": "Antiek en Kunst",
  "search_attributes": {
    "0": "Prijs van/tot",
    "1": "Groep en Rubriek",
    "2": "Aangeboden sinds"
  }
}{
  "id": "0_1",
  "name": "Emaille",
  "group": "Antiek en Kunst",
  "search_attributes": {
    "0": "Prijs van/tot",
    "1": "Groep en Rubriek",
    "2": "Aangeboden sinds"
  }
}{
  "id": "0_2",
  "name": "Gereedschap en Instrumenten",
  "group": "Antiek en Kunst",
  "search_attributes": {
    "0": "Prijs van/tot",
    "1": "Groep en Rubriek",
    "2": "Aangeboden sinds"
  }
}{...}
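
The reason this output is rejected is that a JSON document must contain exactly one top-level value, so objects written back to back do not parse. A minimal sketch of the difference, using Ruby's standard `json` library; the helper name `valid_json?` and the shortened sample strings are illustrative assumptions, not part of the original code:

```ruby
require 'json'

# Two objects written back to back, as in the file above (shortened sample data)
concatenated = '{"id":"0_0"}{"id":"0_1"}'

# The same objects wrapped in an array and separated by a comma
wrapped = '[{"id":"0_0"},{"id":"0_1"}]'

# Hypothetical helper: true when the text parses as a single JSON document
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts valid_json?(concatenated)
puts valid_json?(wrapped)
```

The first check should report that the concatenated form is not valid, while the wrapped form parses.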

I want the output to be valid JSON and look like this:

[
  {
    "id": "0_0",
    "name": "Boeken en Bijbels",
    "group": "Antiek en Kunst",
    "search_attributes": {
      "0": "Prijs van/tot",
      "1": "Groep en Rubriek",
      "2": "Aangeboden sinds"
    }
  },
  {
    "id": "0_1",
    "name": "Emaille",
    "group": "Antiek en Kunst",
    "search_attributes": {
      "0": "Prijs van/tot",
      "1": "Groep en Rubriek",
      "2": "Aangeboden sinds"
    }
  },
  {
    "id": "0_2",
    "name": "Gereedschap en Instrumenten",
    "group": "Antiek en Kunst",
    "search_attributes": {
      "0": "Prijs van/tot",
      "1": "Groep en Rubriek",
      "2": "Aangeboden sinds"
    }
  },
  {...}
]

The question is, how do I accomplish this?

Update

A big thank you to maerics for his answer.

Here's the slightly updated, but working code:

require 'mechanize'
require 'json'

@categories_hash = {}
@categories_hash['category'] ||= {}
@categories_hash['category']['id'] ||= {}
@categories_hash['category']['name'] ||= {}
@categories_hash['category']['group'] ||= {}
@categories_hash['category']['search_attributes'] ||= {}

@hashes = []

# Initialize Mechanize object
a = Mechanize.new

# Begin scraping
a.get('http://www.marktplaats.nl/') do |page|
  groups = page.search('//*[(@id = "navigation-categories")]//a')
  groups.each_with_index do |group, index_1|

    a.get(group[:href]) do |page_2|
      categories = page_2.search('//*[(@id = "category-browser")]//a')
      categories.each_with_index do |category, index_2|

        a.get(category[:href]) do |page_3|
          search_attributes = page_3.search('//*[contains(concat( " ", @class, " " ), concat( " ", "heading", " " ))]')

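          # Collect this category's attributes, keyed by their position
          attributes = {}
          search_attributes.each_with_index do |attribute, index_3|
            attributes[index_3.to_s] = attribute.text unless attribute.text == 'Outlet '
          end

          # Build one hash per category (not one per attribute)
          item = {
            id: "#{index_1}_#{index_2}",
            name: category.text,
            group: group.text,
            search_attributes: attributes
          }

          @hashes << item

          puts item
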
        end
      end
    end
  end
end

# Open file and begin
File.open("json/light/#{Time.now.strftime '%Y%m%d%H%M%S'}_light_categories.json", 'w') do |f|
  puts '# Writing category data to JSON file'
  f.write(@hashes.to_json)
  puts '|-----------> Done.'
end

puts '# Finished.'

Answer 1

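Using the built-in Ruby JSON library:

require 'json'

# all_hashes stands for whatever collection of hashes you have built
hashes = []
all_hashes.each { |h| hashes << h }

# An Array of hashes serializes to a single, valid JSON array
print hashes.to_json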
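
As a self-contained illustration of the same idea, the sketch below serializes an array of hashes in one call. The sample data is borrowed from the question, and the helper name `to_json_array` and its `pretty:` option are assumptions made for this example; `JSON.generate` and `JSON.pretty_generate` are part of Ruby's standard `json` library.

```ruby
require 'json'

# Sample hashes standing in for the scraped categories
all_hashes = [
  { 'id' => '0_0', 'name' => 'Boeken en Bijbels', 'group' => 'Antiek en Kunst' },
  { 'id' => '0_1', 'name' => 'Emaille', 'group' => 'Antiek en Kunst' }
]

# Hypothetical helper: serialize an array of hashes as one JSON document
def to_json_array(hashes, pretty: false)
  pretty ? JSON.pretty_generate(hashes) : JSON.generate(hashes)
end

# Pretty output is indented, similar to the desired layout in the question
puts to_json_array(all_hashes, pretty: true)
```

Because the whole array is serialized at once, the brackets and commas are produced by the library rather than written by hand.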

Or, in the extreme case that your hashes will not fit into the available memory (pseudocode):

print '['
for each JSON hash H
  print H
  print ',' unless H is the last of the set
print ']'
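
The pseudocode above could be sketched in Ruby roughly as follows. The method name `write_json_array` is an assumption for this example, and it writes the comma before every element except the first (rather than after every element except the last), which gives the same output without needing to know which element is last.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper: writes each hash from an enumerable to io as one
# JSON array, without building the full array in memory first.
def write_json_array(io, hashes)
  io.write('[')
  first = true
  hashes.each do |hash|
    # Comma goes before every element except the first
    io.write(',') unless first
    io.write(JSON.generate(hash))
    first = false
  end
  io.write(']')
end

# Usage with an in-memory buffer; a File opened for writing works the same way
buffer = StringIO.new
write_json_array(buffer, [{ 'id' => '0_0' }, { 'id' => '0_1' }])
puts buffer.string
```

In the scraper, the same pattern would mean writing the opening bracket once before the loops, one element per category inside them, and the closing bracket once at the end.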

Answer 1: 2, accepted, 2014-06-18 21:39:25
