
Inquiry on Node.js HTTP requests/responses and URL as primary key for this JSON-formatted document

I have a script that uses Node.js to request headers from a specific site.

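var http = require("http");
var fs = require("fs");

var hostNames = ['www.google.com'];

// Request the headers for every host in the list.
hostNames.forEach(function(hostName) {
    var options = {
        host: hostName,
        path: '/'
    };

    http.get(options, function(res) {
        var obj = {};
        obj.statusCode = res.statusCode;
        obj.headers = res.headers;

        console.log(JSON.stringify(obj, null, 4));
        res.resume(); // discard the body so the socket is released
    }).on('error', function(err) {
        console.warn(err.message);
    });
});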

The output for the URL "www.google.com" is shown below:

{
    "statusCode": 200,
    "headers": {
        "date": "Mon, 04 Mar 2013 16:43:39 GMT",
        "expires": "-1",
        "cache-control": "private, max-age=0",
        "content-type": "text/html; charset=ISO-8859-1",
        "set-cookie": [
            "PREF=ID=cfa31a2cae817ca6:FF=0:TM=1362415419:LM=1362415419:S=m-sNTevwPhFFWVpv; expires=Wed, 04-Mar-2015 16:43:39 GMT; path=/; domain=.google.com",
            "NID=67=AKMqJ9Q94GtcmF0kTOAOLgFLqz9XAnSwVe4jzzXFVhvxuxRJP_l9QEwbjR3F7d506thF9BURyGJUz5DuNTEzXesit50Dm7FlOoVuL2qGRt9XZwRMGjAlxL5heO4vIATp; expires=Tue, 03-Sep-2013 16:43:39 GMT; path=/; domain=.google.com; HttpOnly"
        ],
        "p3p": "CP=\"This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info.\"",
        "server": "gws",
        "x-xss-protection": "1; mode=block",
        "x-frame-options": "SAMEORIGIN",
        "transfer-encoding": "chunked"
    }
}

My question is about JSON. I am trying to store the output in MongoDB, which stores JSON-like documents. From my understanding, SQL-based databases have a primary key, and this is where my confusion comes in. I would like to use the URL, in this case 'www.google.com', as the primary key. How do I achieve this? This is my first time using JSON-like storage structures, and the articles I have read do not really apply to my specific situation.

When I search for "www.google.com" in the database, the plan is to have the headers show up under "www.google.com". I think I am still in the SQL mindset. Can someone share some insight into this?

Here are the official docs on ObjectIds.

You can create your own ObjectId for a record from any string with the appropriate format and length (a 24-character hex string), so this will work:

db.names.insert({"_id": new ObjectId("012345678901234567890123"), "name" : "my name" })

but this doesn't:

db.names.insert({"_id": new ObjectId("my reallllly long string"), "name" : "my name" })

You will need to use a hash of your URL if you want to use it as an ObjectId.
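As a rough illustration of the hashing idea, here is a minimal sketch using Node's built-in crypto module. The helper name urlToObjectIdHex is made up for this example, and truncating a SHA-1 digest to 24 hex characters is just one possible choice, not something MongoDB prescribes.

```javascript
var crypto = require("crypto");

// Hypothetical helper: derive a 24-character hex string from a URL.
// SHA-1 produces 40 hex characters; the first 24 (12 bytes) match
// the length that the ObjectId constructor expects.
function urlToObjectIdHex(url) {
    return crypto.createHash("sha1").update(url).digest("hex").slice(0, 24);
}

var hex = urlToObjectIdHex("www.google.com/");
console.log(hex.length); // 24
```

The resulting string could then be passed as new ObjectId(hex). The same URL always maps to the same id, but the id no longer reads as a URL, which is one reason a separate url field may be easier to work with.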

However, Mongo gives you another option: leave the _id field alone, create a url field for the URL, and then set an index on the url field:

db.scrapedPages.ensureIndex({ 'url': 1})

UPDATE: More specifically to your example: you are not going to set or change the _id property; Mongo does that for you. Instead, you are going to set the url property of the document you save. A reasonable source for it is your options object, since it defines the page you are parsing.

So I think you'll end up with something like this (I assume you are using the Mongo native driver and have a Mongo connection open):

var options = {
    host: hostNames[i], // i is assumed to be set by the surrounding loop
    path: '/'
};

http.get(options, function(res) {
    var obj = {
        url: options.host + options.path, // or whatever else is
        statusCode: res.statusCode,
        headers: res.headers
    };
    save(obj, function(err, objects) {
        if (err) console.warn(err.message);
    });
});

function save(doc, callback) {
    var collection = new mongodb.Collection(client, 'test_collection')
     , cb = callback || function() {}
    collection.insert(doc, {safe:true}, cb);
}
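If the same URL is scraped more than once, a plain insert creates a second document for it. One way around that is an upsert keyed on the url field. The sketch below assumes a collection object that exposes updateOne(filter, update, options), as recent versions of the official Node.js driver do; the helper name savePage is hypothetical and not part of the code above.

```javascript
// Hypothetical helper: insert the page if its URL is new,
// otherwise overwrite the stored status code and headers.
// `collection` is assumed to provide updateOne(filter, update, options).
function savePage(collection, doc) {
    return collection.updateOne(
        { url: doc.url },   // match on the url field
        { $set: doc },      // replace the stored fields with the new ones
        { upsert: true }    // create the document when no match exists
    );
}
```

With a connected collection this would be called as savePage(collection, obj) from inside the http.get callback, and the index on url keeps the lookup fast.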

The primary key in an SQL table is a column that uniquely identifies a particular row. In MongoDB, _id is the field that serves as the primary key. MongoDB adds it automatically if you don't specify it and assigns it an ObjectId (a 12-byte BSON identifier). You can check the details here.
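Since _id is only added automatically when it is missing, you can also supply it yourself, and as far as I know it does not have to be an ObjectId: a unique string such as the URL is accepted too. A minimal sketch of building such a document follows; the helper name toDocument and the stand-in response object are made up for illustration.

```javascript
// Hypothetical helper: build a document whose _id is the URL itself.
function toDocument(url, res) {
    return {
        _id: url,
        statusCode: res.statusCode,
        headers: res.headers
    };
}

// Stand-in for the response object passed to the http.get callback.
var fakeRes = { statusCode: 200, headers: { server: "gws" } };
console.log(JSON.stringify(toDocument("www.google.com/", fakeRes), null, 4));
```

A document built this way could be looked up with a query on _id, for example { _id: "www.google.com/" }, and inserting a second document with the same _id would be rejected as a duplicate key.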


 