
Persisting dynamic properties and querying them

I have a requirement to implement a contact database. It is special in that the user should be able to add, dynamically (at runtime), the properties he or she wants to track about a contact. Some of these properties are strings, others are numbers or dates. Some have pre-defined values, others are free fields, etc. The user also wants to be able to query such a structure quickly and easily. The database needs to comfortably handle 500,000 contacts, each having around 10 properties.

This leads to a dynamic property model, with a Contact class that holds dynamic properties.

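import java.util.Collection;
import java.util.Map;

class Contact {

    // each dynamic property maps to the values this contact holds for it
    private Map<DynamicProperty, Collection<DynamicValue>> propertiesAndValues;

    // other useful methods

}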
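The question says properties can be strings, numbers or dates, and that some have pre-defined values. One way to capture that in the model is sketched below. This is only an illustration: the `ValueType` enum, the `allowedValues` field and the `accepts` method are assumed names, not part of the original class.

```java
import java.time.LocalDate;
import java.util.Set;

class DynamicPropertySketch {

    // The three value types named in the question.
    enum ValueType { STRING, NUMBER, DATE }

    // Hypothetical property definition; allowedValues is empty for free fields.
    static final class DynamicProperty {
        final String name;
        final ValueType type;
        final Set<Object> allowedValues;

        DynamicProperty(String name, ValueType type, Set<Object> allowedValues) {
            this.name = name;
            this.type = type;
            this.allowedValues = allowedValues;
        }

        // True when the value has the declared type and, for properties with
        // pre-defined values, is one of those values.
        boolean accepts(Object value) {
            boolean typeOk;
            switch (type) {
                case STRING: typeOk = value instanceof String; break;
                case NUMBER: typeOk = value instanceof Number; break;
                default:     typeOk = value instanceof LocalDate; break;
            }
            return typeOk && (allowedValues.isEmpty() || allowedValues.contains(value));
        }
    }

    public static void main(String[] args) {
        DynamicProperty size = new DynamicProperty("companySize", ValueType.NUMBER, Set.of());
        DynamicProperty status =
                new DynamicProperty("status", ValueType.STRING, Set.of("lead", "customer"));
        System.out.println(size.accepts(5000));
        System.out.println(status.accepts("lead"));
        System.out.println(status.accepts("unknown"));
    }
}
```

Keeping the type on the property definition means a store or index can pick the right comparison (prefix match, numeric range, date range) without inspecting every value.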

The question is how I can store such a structure in "some database" (it does not have to be an RDBMS) so that I can easily express queries such as:

Get all contacts whose name starts with Martin and who are from a Company of size 5000 or less, ordered by the time the contact was inserted into the database, returning only the first 100 results (with pagination), where each of these segments corresponds to a dynamic property.

I need:

  • filtering - equal, partial match, and greater than / less than for integers and dates; maybe aggregation too, but that is not necessary at this point
  • sorting
  • pagination
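To make the three requirements concrete, here is a minimal in-memory sketch of the example query (filter, sort, paginate). It says nothing about which store to use. Contacts are modelled as plain maps, and the property names `name`, `companySize` and `insertedAt` are assumptions made for illustration; every contact is assumed to carry an `insertedAt` value.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ContactQuerySketch {

    // Partial match on a string property.
    static Predicate<Map<String, Object>> startsWith(String property, String prefix) {
        return c -> c.get(property) instanceof String
                && ((String) c.get(property)).startsWith(prefix);
    }

    // Upper-bound filter on a numeric property.
    static Predicate<Map<String, Object>> atMost(String property, long max) {
        return c -> c.get(property) instanceof Number
                && ((Number) c.get(property)).longValue() <= max;
    }

    // Filter, sort by insertion time (oldest first), then cut out one page.
    static List<Map<String, Object>> query(List<Map<String, Object>> contacts,
                                           Predicate<Map<String, Object>> filter,
                                           int page, int pageSize) {
        return contacts.stream()
                .filter(filter)
                .sorted(Comparator.comparing(
                        (Map<String, Object> c) -> (Instant) c.get("insertedAt")))
                .skip((long) page * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> contacts = new ArrayList<>();
        contacts.add(Map.of("name", "Martin Novak", "companySize", 1200,
                "insertedAt", Instant.parse("2024-01-02T00:00:00Z")));
        contacts.add(Map.of("name", "Martina Kral", "companySize", 80000,
                "insertedAt", Instant.parse("2024-01-01T00:00:00Z")));
        List<Map<String, Object>> firstPage = query(contacts,
                startsWith("name", "Martin").and(atMost("companySize", 5000)), 0, 100);
        firstPage.forEach(c -> System.out.println(c.get("name")));
    }
}
```

Whatever store is chosen has to offer the equivalent of these three steps on the server side, with indexes behind the filter and the sort, because scanning 500,000 contacts in application code for every request would not scale.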

I was considering an RDBMS, but that leads more or less to the structure below, which is quite hard to query and tends to be slow for this amount of data:

contact(id serial pk,....);

dynamic_property(dp_id serial pk, ...);

--only one of the values is not empty
dynamic_property_value(dpv_id serial pk, dynamic_property_fk int, value_integer int, date_value timestamp, text_value text);

contact_properties(pav_id serial pk, contact_id_fk int, dynamic_property_fk int);

property_and_its_value(pav_id_fk int, dpv_id int);
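To illustrate why this layout is hard to query, the sketch below only builds the SQL text for a multi-property filter; it does not touch a database. The join conditions are my reading of the foreign keys in the schema above, the `LIMIT`/`OFFSET` syntax assumes a PostgreSQL-style dialect, and ordering by `c.id` is a stand-in for insertion order. Treat all of these as assumptions.

```java
import java.util.List;

class EavQuerySketch {

    // Builds a SELECT over the tables from the question. Each entry in
    // valueConditions filters one dynamic property, e.g. "text_value LIKE ?".
    // Bind order per property: the property id first, then the compared value.
    static String buildQuery(List<String> valueConditions, int limit, int offset) {
        StringBuilder sql = new StringBuilder("SELECT c.id FROM contact c");
        for (int i = 0; i < valueConditions.size(); i++) {
            // Three extra joins for every filtered property.
            sql.append(String.format(
                    "%n  JOIN contact_properties cp%1$d ON cp%1$d.contact_id_fk = c.id"
                  + "%n  JOIN property_and_its_value pv%1$d ON pv%1$d.pav_id_fk = cp%1$d.pav_id"
                  + "%n  JOIN dynamic_property_value v%1$d ON v%1$d.dpv_id = pv%1$d.dpv_id"
                  + "%n    AND v%1$d.dynamic_property_fk = ?"
                  + "%n    AND v%1$d.", i));
            sql.append(valueConditions.get(i));
        }
        // Assumption: the serial id approximates insertion order.
        sql.append(String.format("%n  ORDER BY c.id%n  LIMIT %d OFFSET %d", limit, offset));
        return sql.toString();
    }

    public static void main(String[] args) {
        // "name starts with Martin" and "company size <= 5000", first 100 rows.
        System.out.println(buildQuery(
                List.of("text_value LIKE ?", "value_integer <= ?"), 100, 0));
    }
}
```

Two filtered properties already need six joins, and sorting by a dynamic property would add three more, which is consistent with the observation that this structure is awkward and tends to be slow at this size.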

I am considering the following options:

  • store the contacts in an RDBMS and use Lucene for querying - is there anything that would help with this?
  • store the dynamic properties as XML in an RDBMS and use its XPath support - unfortunately this seems to be pretty slow for 500,000 contacts
  • use another database, such as MongoDB or Jackrabbit, to store this information

Which way would you go and why?

Wikipedia has a great entry on Entity-Attribute-Value modeling, a data modeling technique for representing entities with arbitrary properties. It's typically used for clinical data, but it might apply to your situation as well.

Have you considered using Lucene for your querying needs? You could probably get away with just using Lucene and storing all your data in the index, although I wouldn't recommend using Lucene as your only persistence store.

Alternatively, you could use Lucene along with an RDBMS and take advantage of something like Compass.

  1. You could try another kind of database, such as CouchDB, which is a distributed, document-oriented database.
  2. If you want a dumb solution, you could add some 50 generic columns to your contacts table: STRING_COLUMN1, STRING_COLUMN2, ... up to 10, DATE_COLUMN1..DATE_COLUMN10, and so on. You also have a DESCRIPTION column. So if a row has a name, which is a string, then STRING_COLUMN1 stores the value of the name and the DESCRIPTION column value would be "STRING_COLUMN1-NAME". Querying can be a bit tricky in this case. I know many purists laugh at this, but I have seen a similar requirement solved this way in one of the apps :)
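The generic-column idea in point 2 needs some bookkeeping that maps each dynamic property to the column slot it occupies. A rough sketch of that bookkeeping follows. The class and method names are hypothetical, and the mapping is kept in one place for all contacts rather than in a per-row DESCRIPTION value, which is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class GenericColumnSketch {

    static final int SLOTS_PER_TYPE = 10; // STRING_COLUMN1..10, DATE_COLUMN1..10, ...

    private final Map<String, String> columnByProperty = new LinkedHashMap<>();
    private final Map<String, Integer> usedSlots = new HashMap<>();

    // Assigns the next free column of the given type prefix ("STRING", "DATE", ...)
    // to the property, or returns the column it already has.
    String register(String typePrefix, String property) {
        String existing = columnByProperty.get(property);
        if (existing != null) {
            return existing;
        }
        int slot = usedSlots.merge(typePrefix, 1, Integer::sum);
        if (slot > SLOTS_PER_TYPE) {
            throw new IllegalStateException("no free " + typePrefix + " column left");
        }
        String column = typePrefix + "_COLUMN" + slot;
        columnByProperty.put(property, column);
        return column;
    }

    // Renders the mapping in the "STRING_COLUMN1-NAME" style from the answer.
    String describe() {
        return columnByProperty.entrySet().stream()
                .map(e -> e.getValue() + "-" + e.getKey().toUpperCase())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        GenericColumnSketch mapping = new GenericColumnSketch();
        mapping.register("STRING", "name");
        mapping.register("DATE", "birthday");
        System.out.println(mapping.describe());
    }
}
```

With such a mapping, a filter on `name` becomes an ordinary predicate on `STRING_COLUMN1`, so normal column indexes apply; the price is a hard cap on the number of properties per type.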


 