
Best database type and schema to search by attributes

I know that this question may not have an easy answer, or that there may be many correct ones.

I am developing a weather web app to search cities by summary, temperature, humidity, precipitation, wind speed, visibility, pressure and some other weather indicators. I will also include the weather station setup; to keep things simple, assume there is exactly one station per city. I would also like to include some city data such as population, afforestation index, latitude and longitude.

Continent, country and region will also be needed.

Each weather station will include the model number of every sensor installed in it.

There will be around 5,000 cities.

The most common query will search cities by temperature, humidity, precipitation, wind speed, visibility and pressure ranges, and filter by population, etc. and by weather station sensor model name.

A query would look like:

  • summary = "Clear"

  • and temperature > 6 and temperature < 10

  • and pressure > 900 and pressure < 1000

  • and visibility > 5 and visibility < 7

  • and humidity > 0.60 and humidity < 0.90

  • and population > 20,000

  • and afforestation index > 3

  • and country = "France"

  • and "sensor 1" = "string"

The question is: which database type and schema best fit my search needs in terms of performance? As you can see, I need to search by attributes and not by the city name itself. I am completely free to use a relational or a NoSQL database, although I would prefer an asynchronous system.

I don't know whether a NoSQL database like MongoDB is intended to be used like this. If it is, would this schema be fast enough? I am worried because everything is nested and the indexes could be huge.

"continents": 
[
    {
        "name": "Europe",
        "countries": 
        [
            {
                "name": "France",
                "regions": 
                [
                    {
                        "name": "Île-de-France",
                        "cities": 
                        [
                            {
                                "name": "Paris",
                                "coordinates": {"lat": 48.856614, "lon": 2.352222},
                                "summary":"Clear",
                                "temperature": 9.4,
                                "pressure": 976,
                                "visibility" : 6.8,
                                "humidity" : 0.84,
                                "afforestation": 6,
                                "population": 2249975,
                                ...
                                "weather_station": {
                                    "name": "name",
                                    "sensor 1": "string",
                                    "sensor 2": "string",
                                    "sensor 3": "string",
                                    "sensor 4": "string"
                                }
                            },
                            ...
                        ]
                    },
                    ...
                ]                   
            },
            ...
        ]
    },
    ...
]

I guess this use case has already been solved in many other apps that need to search by element attributes.

Oh! I forgot to say that I am using Python and the Tornado web framework.

Many thanks for your help!

The following schema may be what you are looking for.

Note that in document databases you will need to denormalize your data slightly to match the way it is accessed most often.

This would be one document (one "row") in a City collection:

{
    "City": "Paris",
    "coordinates": {"lat": 48.856614, "lon": 2.352222},
    "summary":"Clear",
    "temperature": 9.4,
    "pressure": 976,
    "visibility" : 6.8,
    "humidity" : 0.84,
    "afforestation": 6,
    "population": 2249975,
    ...
    "weather_station": {
        "name": "name",
        "sensor 1": "string",
        "sensor 2": "string",
        "sensor 3": "string",
        "sensor 4": "string"
    },
    "region": "Île-de-France",
    "country":"France",
    "continent":"Europe"
}
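As a rough illustration of the denormalization described above, the sketch below flattens the nested structure from the question into one document per city and expresses the example search as a MongoDB-style filter. The function names `flatten` and `build_filter` are hypothetical, the numeric bounds are copied from the question, and nothing here talks to a real database.

```python
def flatten(continents):
    """Yield one flat city document per city from the nested structure."""
    for continent in continents:
        for country in continent["countries"]:
            for region in country["regions"]:
                for city in region["cities"]:
                    doc = dict(city)  # shallow copy of the city's own fields
                    doc["City"] = doc.pop("name")  # key used in the flat schema
                    doc["region"] = region["name"]
                    doc["country"] = country["name"]
                    doc["continent"] = continent["name"]
                    yield doc


def build_filter():
    """Express the example search from the question as a filter document."""
    return {
        "summary": "Clear",
        "temperature": {"$gt": 6, "$lt": 10},
        "pressure": {"$gt": 900, "$lt": 1000},
        "visibility": {"$gt": 5, "$lt": 7},
        "humidity": {"$gt": 0.60, "$lt": 0.90},
        "population": {"$gt": 20000},
        "afforestation": {"$gt": 3},
        "country": "France",
        # Dot notation reaches into the embedded weather_station document.
        "weather_station.sensor 1": "string",
    }
```

With PyMongo, a filter like this would typically be passed to `find()` on the city collection. Because every searched attribute sits at the top level of the document (or one level down for the sensors), the query needs no array traversal through continents, countries and regions.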

5000 rows in one table? About 20 metrics? No "history"?

Make a single table with 5000 rows and 20 columns. No INDEXes other than the minimal PRIMARY KEY for UPDATEing a row when a weather station reports in. Build a SELECT from the desired conditions, then let the optimizer do a full table scan.

Everything will stay in RAM, and the SELECTs will be "brute force". It should take only a few milliseconds. (I ran a similar SELECT on a 2.7M-row table; it took 1.3 seconds.)
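A minimal sketch of the "build a SELECT from the desired conditions" idea, using Python's built-in `sqlite3` module as a stand-in for whichever relational database is actually chosen. The table name `city`, the column names, and the helper `build_select` are assumptions made for illustration; only a few of the roughly 20 columns are shown.

```python
import sqlite3

# Column names and operators are whitelisted so user input never reaches the SQL text.
COLUMNS = {"summary", "temperature", "pressure", "visibility", "humidity",
           "population", "afforestation", "country", "sensor1"}
OPERATORS = {"=", ">", "<", ">=", "<="}


def build_select(conditions):
    """Turn [(column, operator, value), ...] into a parameterized SELECT."""
    clauses, params = [], []
    for column, op, value in conditions:
        if column not in COLUMNS or op not in OPERATORS:
            raise ValueError(f"unsupported condition: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1"
    return f"SELECT name FROM city WHERE {where}", params


conn = sqlite3.connect(":memory:")
# Only the primary key is indexed; every search is a full table scan.
conn.execute("""
    CREATE TABLE city (
        id INTEGER PRIMARY KEY,
        name TEXT, country TEXT, summary TEXT,
        temperature REAL, pressure REAL, visibility REAL, humidity REAL,
        population INTEGER, afforestation REAL, sensor1 TEXT
    )
""")
conn.execute(
    "INSERT INTO city (name, country, summary, temperature, pressure, visibility,"
    " humidity, population, afforestation, sensor1)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("Paris", "France", "Clear", 9.4, 976, 6.8, 0.84, 2249975, 6, "string"),
)

sql, params = build_select([
    ("summary", "=", "Clear"),
    ("temperature", ">", 6), ("temperature", "<", 10),
    ("pressure", ">", 900), ("pressure", "<", 1000),
    ("population", ">", 20000),
    ("country", "=", "France"),
])
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Values are passed as bound parameters rather than interpolated into the string, so only the whitelisted column names and operators ever become part of the SQL text.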

If you are keeping history, then we need to talk further.

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please cite this site's URL or the original source.