
Querying a large text file containing JSON objects

I have a text file of a few gigabytes in the following format: {"user_ip":"xxxx", "action_type":"xxx", "action_data":{"some_key":"some_value"...},...}

Each entry is one line.

First, I would like to easily find the entries for a given IP. This part is easy because I can use grep, for example. However, even for this I would like to find a better solution, because I want to get a response as fast as possible.

The next part is more complicated: I would like to find entries from a selected IP, of a selected type, and with a particular value of some_key in action_data.

I would probably have to convert this file to an SQL database (probably SQLite, because it will be a desktop app), but are there better solutions?

Yes, put it into a database, any database. Then querying it will be straightforward.

You could take a look at MongoDB, a document-based database. With it you essentially store JSON objects that you can then index and query efficiently. You can find out how to query in the docs: Querying.

Just wanted to mention that Oracle Berkeley DB 11gR2 (released on April 1st, 2010) introduces support for a SQL API. In fact, the SQL API is the sqlite3() API. So, as Jason mentioned, if you'd like the ease of use of SQLite combined with the scalability and concurrency of Berkeley DB, you can now get both in a single library.

Regards,

Dave

If you need the relational guarantees of an SQL-based DB, definitely go ahead with SQLite. It will allow for fast queries, joins, aggregations, sorts, and pretty much any sort of search you could dream up. It sounds like this is just a big list of actions performed by users at some IP, so you'll probably want to use some sort of sequence as your primary key, since none of the other attributes look like good candidates.
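As a rough illustration of the SQLite route, here is a minimal sketch using Python's standard `sqlite3` module. The table name `actions`, the index name, the helper names `load_log` and `find`, and the sample entries are all assumptions made for the example; only the field names `user_ip`, `action_type`, `action_data` and `some_key` come from the question. It uses an integer sequence as the primary key, indexes the IP and action type together, and checks the nested key in Python after the index has narrowed the rows down.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def load_log(db, log_path):
    """Load one-JSON-object-per-line entries into an indexed SQLite table."""
    db.execute(
        """CREATE TABLE IF NOT EXISTS actions (
               id INTEGER PRIMARY KEY,    -- sequence key assigned by SQLite
               user_ip TEXT NOT NULL,
               action_type TEXT NOT NULL,
               action_data TEXT NOT NULL  -- nested object kept as JSON text
           )"""
    )
    with open(log_path, encoding="utf-8") as f:
        entries = (json.loads(line) for line in f if line.strip())
        rows = (
            (e["user_ip"], e["action_type"], json.dumps(e.get("action_data", {})))
            for e in entries
        )
        db.executemany(
            "INSERT INTO actions (user_ip, action_type, action_data) VALUES (?, ?, ?)",
            rows,
        )
    # One composite index serves both "by IP" and "by IP and type" lookups.
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_ip_type ON actions (user_ip, action_type)"
    )
    db.commit()


def find(db, ip, action_type, key, value):
    """Return action_data dicts for an IP and type where action_data[key] == value."""
    cur = db.execute(
        "SELECT action_data FROM actions WHERE user_ip = ? AND action_type = ?",
        (ip, action_type),
    )
    return [d for d in (json.loads(row[0]) for row in cur) if d.get(key) == value]


# Hypothetical sample data standing in for the real multi-gigabyte file.
sample = [
    {"user_ip": "10.0.0.1", "action_type": "click", "action_data": {"some_key": "a"}},
    {"user_ip": "10.0.0.1", "action_type": "click", "action_data": {"some_key": "b"}},
    {"user_ip": "10.0.0.2", "action_type": "view", "action_data": {"some_key": "a"}},
]

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "actions.log"
    log.write_text("\n".join(json.dumps(e) for e in sample) + "\n", encoding="utf-8")
    db = sqlite3.connect(":memory:")  # use a file path for a persistent database
    load_log(db, log)
    matches = find(db, "10.0.0.1", "click", "some_key", "a")
    db.close()
```

If the nested key is queried often, it may be worth copying it into its own indexed column at load time instead of filtering in Python.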

On the other hand, if you just need to do very simple queries, e.g. look up entries by IP, look up entries by action type, etc., you might want to look into Oracle Berkeley DB. As long as you don't need any searches that are too fancy, Berkeley DB will let you store terabytes of data and access them at record speed.
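To give a feel for the simple key-value style of lookup, here is a minimal sketch that uses Python's standard `dbm` module as a stand-in. It is not Berkeley DB and does not use Berkeley DB's API; the helper names `build_ip_index` and `lookup`, the file names, and the sample entries are assumptions for illustration. The idea is to store, for each IP, the byte offsets of its lines in the original file, so a lookup seeks straight to the matching lines instead of scanning the whole file.

```python
import dbm
import json
import tempfile
from pathlib import Path


def build_ip_index(log_path, index_path):
    """Map each user_ip to the byte offsets of its lines in the log file."""
    offsets = {}
    with open(log_path, "rb") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            if line.strip():
                ip = json.loads(line)["user_ip"]
                offsets.setdefault(ip, []).append(pos)
    # "n" always creates a fresh, empty key-value store.
    with dbm.open(str(index_path), "n") as index:
        for ip, positions in offsets.items():
            index[ip.encode()] = ",".join(map(str, positions)).encode()


def lookup(log_path, index_path, ip):
    """Return the parsed entries for one IP by seeking to the indexed offsets."""
    with dbm.open(str(index_path), "r") as index:
        raw = index.get(ip.encode())
    if raw is None:
        return []
    entries = []
    with open(log_path, "rb") as f:
        for pos in raw.decode().split(","):
            f.seek(int(pos))
            entries.append(json.loads(f.readline()))
    return entries


# Hypothetical sample data standing in for the real file.
sample = [
    {"user_ip": "10.0.0.1", "action_type": "click", "action_data": {"some_key": "a"}},
    {"user_ip": "10.0.0.2", "action_type": "view", "action_data": {"some_key": "a"}},
    {"user_ip": "10.0.0.1", "action_type": "view", "action_data": {"some_key": "b"}},
]

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "actions.log"
    log.write_text("\n".join(json.dumps(e) for e in sample) + "\n", encoding="utf-8")
    idx = Path(tmp) / "ip_index"
    build_ip_index(log, idx)
    hits = lookup(log, idx, "10.0.0.1")
    missing = lookup(log, idx, "192.168.0.9")
```

This sketch collects all offsets in memory before writing the index, which is fine for a sample but may need batching for a very large file with many distinct IPs.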

So look over both and see what's best for your use case. They're good for different things, which might be why both are available as storage systems on Android, for instance. I think SQLite will probably win out, but when thinking about embedded local DB systems you should always at least consider both of these technologies.
