
Python: Local Database for storing and retrieving historical data

I'm trying to create some weather models and I want to store and retrieve data on my hard drive.

Data is in this format:

{'Date_Time':'2020-07-18 18:16:17','Temp':29.0, 'Humidity':45.3}
{'Date_Time':'2020-07-18 18:18:17','Temp':28.9, 'Humidity':45.4}
{'Date_Time':'2020-07-18 18:20:17','Temp':28.8, 'Humidity':48.3}

I have new data coming in every day, and I have old data going back about 5 years. I would like to periodically merge the data sets and create one large data set to manipulate.

Things I need:

1. Check whether a date-time entry already exists; if not, add the new data
2. Change old data values
3. Add new data values to the database
4. Must be on local storage; I have plenty of space.
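
For illustration, the first three requirements could be covered with Python's built-in sqlite3 module roughly as follows. This is only a sketch under assumptions: the table name `readings`, the column names, and the helper functions `open_db`, `add_if_missing` and `update_values` are made up for the example and do not come from the question.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists.
    ':memory:' keeps the demo off disk; pass a file path for real use."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS readings (
               date_time TEXT PRIMARY KEY,  -- e.g. '2020-07-18 18:16:17'
               temp      REAL,
               humidity  REAL
           )"""
    )
    return conn

def add_if_missing(conn, record):
    """Insert a record unless its timestamp is already stored.
    Returns True if a row was added, False if it was already there."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO readings (date_time, temp, humidity) "
            "VALUES (?, ?, ?)",
            (record["Date_Time"], record["Temp"], record["Humidity"]),
        )
    return cur.rowcount == 1

def update_values(conn, date_time, temp, humidity):
    """Overwrite the values stored for an existing timestamp.
    Returns True if a row was changed."""
    with conn:
        cur = conn.execute(
            "UPDATE readings SET temp = ?, humidity = ? WHERE date_time = ?",
            (temp, humidity, date_time),
        )
    return cur.rowcount == 1

# Demo with one record in the format shown above
conn = open_db()
rec = {"Date_Time": "2020-07-18 18:16:17", "Temp": 29.0, "Humidity": 45.3}
first = add_if_missing(conn, rec)    # timestamp is new, row is added
second = add_if_missing(conn, rec)   # same timestamp, insert is ignored
changed = update_values(conn, "2020-07-18 18:16:17", 29.5, 44.0)
```

Making the timestamp the primary key lets the database itself reject duplicates, so there is no need to load the whole data set just to check whether an entry exists.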

Things I would like but do not need:

1. The fastest read access possible; I'm not so concerned about write time, as that mostly happens in the background.
2. Something that makes it easy to retrieve all data from today, the last 7 days, etc.
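
As a rough sketch of what "today" and "last 7 days" lookups might look like with sqlite3: timestamps stored as zero-padded `YYYY-MM-DD HH:MM:SS` text sort in chronological order, so a plain range comparison is enough. The table layout and the helper names `readings_between`, `last_days` and `today` are assumptions for this example only.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # same layout as the Date_Time strings in the question

def readings_between(conn, start, end):
    """Return rows with start <= date_time <= end, oldest first."""
    query = (
        "SELECT date_time, temp, humidity FROM readings "
        "WHERE date_time BETWEEN ? AND ? ORDER BY date_time"
    )
    return conn.execute(query, (start.strftime(FMT), end.strftime(FMT))).fetchall()

def last_days(conn, days, now=None):
    """Rows from the last `days` days up to `now`."""
    now = now or datetime.now()
    return readings_between(conn, now - timedelta(days=days), now)

def today(conn, now=None):
    """Rows from midnight of the current day up to `now`."""
    now = now or datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return readings_between(conn, midnight, now)

# Demo on an in-memory database with a fixed "now" so the result is repeatable
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (date_time TEXT PRIMARY KEY, temp REAL, humidity REAL)"
)
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("2020-07-01 08:00:00", 25.0, 50.0),
    ("2020-07-15 12:00:00", 27.5, 47.0),
    ("2020-07-18 18:16:17", 29.0, 45.3),
])
now = datetime(2020, 7, 18, 19, 0, 0)
week_rows = last_days(conn, 7, now)   # the 07-15 and 07-18 rows
today_rows = today(conn, now)         # only the 07-18 row
```

Because `date_time` is the primary key, SQLite can use its index for the range condition instead of scanning every row.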

Things I have tried:

  1. Appending to a JSON file

    Works for now, but it is slow because I have to load the entire data set every time I want to append or modify something.

  2. Appending to a text file

    Easy to store, but hard to modify or check values.

  3. SQLite3

    I looked into this and it seemed workable; I just wanted to know whether there is something better before I go ahead with it.

Thank you for your help!

Not sure whether it's "better", but json_database seems to do what you're looking for:

  • save and load from file
  • search recursively by key and key/value pairs
  • fuzzy search
  • supports arbitrary objects

The choice between JSON, TXT, and a SQL or NoSQL DB should be based on your current and future requirements.

  1. From your inputs, you have data for the last 5 years, and the example shows a reading every 2 minutes. Based on this, it seems you will either have a large dataset or need to prune it frequently. For large datasets, a SQL or NoSQL DB would be ideal, so that you do not load all the data into memory for every read/write operation.
  2. With the date-time as your primary key, you would be able to read and write pretty quickly using a database.
  3. SQLite is a good start, but if your data is going to grow, you should plan to move to an external SQL/NoSQL database.
  4. Since your data is mostly time based, it would be good to evaluate a time series database like InfluxDB or Graphite.
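
To make the primary-key point concrete, here is a minimal sketch of the periodic merge using SQLite from Python. It assumes a table named `readings` keyed on the timestamp; the helper `merge_records` and the whitespace stripping are illustrative choices for the example, not something specified in the question.

```python
import sqlite3

def merge_records(conn, records):
    """Add every record whose timestamp is not stored yet.
    Returns the number of rows that were actually inserted."""
    rows = [
        # strip() is an assumed precaution against stray spaces in timestamps
        {"dt": r["Date_Time"].strip(), "temp": r["Temp"], "hum": r["Humidity"]}
        for r in records
    ]
    before = conn.total_changes
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO readings (date_time, temp, humidity) "
            "VALUES (:dt, :temp, :hum)",
            rows,
        )
    return conn.total_changes - before

# Demo: merge an "old" batch, then a "new" batch that overlaps it
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (date_time TEXT PRIMARY KEY, temp REAL, humidity REAL)"
)
old_batch = [
    {"Date_Time": "2020-07-18 18:16:17", "Temp": 29.0, "Humidity": 45.3},
    {"Date_Time": "2020-07-18 18:18:17", "Temp": 28.9, "Humidity": 45.4},
]
new_batch = [
    {"Date_Time": "2020-07-18 18:18:17", "Temp": 28.9, "Humidity": 45.4},  # duplicate
    {"Date_Time": "2020-07-18 18:20:17 ", "Temp": 28.8, "Humidity": 48.3},
]
added_old = merge_records(conn, old_batch)
added_new = merge_records(conn, new_batch)
total = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Running the whole batch inside a single transaction usually matters more for insert speed than anything else, since SQLite otherwise commits to disk after every statement.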
