What's a good way to store raster data?

I have a variety of time-series data stored on a more-or-less georeferenced grid, e.g. one value per 0.2 degrees of latitude and longitude. Currently the data are stored in text files, so at day-of-year 251 you might see:

251
 12.76 12.55 12.55 12.34 [etc., 200 more values...]
 13.02 12.95 12.70 12.40 [etc., 200 more values...]
 [etc., 250 more lines]
252
 [etc., etc.]

I'd like to raise the level of abstraction, improve performance, and reduce fragility (for example, the current code can't insert a day between two existing ones!). We'd messed around with BLOB-y RDBMS hacks and even replicating each line of the text file format as a row in a table (one row per timestamp/latitude pair, one column per longitude increment -- yecch!).

We could go to a "real" geodatabase, but the overhead of tagging each individual value with a lat and long seems prohibitive. The size and resolution of the data haven't changed in ten years and are unlikely to do so.

I've been noodling around with putting everything in NetCDF files, but think we need to get past the file mindset entirely -- I hate that all my software has to figure out filenames from dates, deal with multiple files for multiple years, etc. The alternative, putting all ten years' (and counting) data into a single file, doesn't seem workable either.

Any bright ideas or products?

I've assembled your comments here:

  1. I'd like to do all this "w/o writing my own file I/O code"
  2. I need access from "Java Ruby MATLAB" and "FORTRAN routines"

When you add these up, you definitely don't want a new file format. Stick with the one you've got.

If we can get you to relax your first requirement - i.e., if you'd be willing to write your own file I/O code - then there are some interesting options for you. I'd write C++ classes, and I'd use something like SWIG to make your new classes available to the multiple languages you need. (But I'm not sure you'd be able to use SWIG to give you access from Java, Ruby, MATLAB and FORTRAN. You might need something else. Not really sure how to do it, myself.)

You also said, "Actually, if I have to have files, I prefer text because then I can just go in and hand-edit when necessary."

My belief is that this is a misguided statement. If you'd be willing to make your own file I/O routines then there are very clever things you could do... And as an ultimate fallback, you could give yourself a tool that converts from the new file format to the same old text format you're used to... And another tool that converts back. I'll come back to this at the end of my post...

You said something that I want to address:

"leverage 40 yrs of DB optimization"

Databases are meant for relational data, not raster data. You will not leverage anyone's DB optimizations with this kind of data. You might be able to cram your data into a DB, but that's hardly the same thing.

Here's the most useful thing I can tell you, based on everything you've told us. You said this:

"I am more interested in optimizing my time than the CPU's, though exec speed is good!"

This is frankly going to require TOOLS. Stop thinking of it as a text file. Start thinking of the common tasks you do, and write small tools - in WHATEVER LANGUAGE(S) - to make those things TRIVIAL to do.
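As one illustration of such a small tool, here is a minimal Python sketch that loads the text format into a dictionary keyed by day-of-year, so that inserting a missing day is a plain assignment. It assumes that any line holding a single integer is a day header and that every other non-blank line is a row of values; the real files may differ. The names `parse_days` and `write_days` are made up for this example.

```python
import io


def parse_days(stream):
    """Read the text format into {day_of_year: [[float, ...], ...]}."""
    days = {}
    current = None
    for line in stream:
        fields = line.split()
        if not fields:
            continue
        # Assumption: a line with a single integer starts a new day.
        if len(fields) == 1 and fields[0].isdigit():
            current = int(fields[0])
            days[current] = []
        else:
            if current is None:
                raise ValueError("data row found before any day header")
            days[current].append([float(f) for f in fields])
    return days


def write_days(days, stream):
    """Write the days back out in ascending day order."""
    for day in sorted(days):
        stream.write(f"{day}\n")
        for row in days[day]:
            stream.write(" " + " ".join(f"{v:.2f}" for v in row) + "\n")


# Tiny made-up sample: days 251 and 253 exist, 252 is missing.
text = "251\n 12.76 12.55\n 13.02 12.95\n253\n 11.00 11.10\n 11.20 11.30\n"
days = parse_days(io.StringIO(text))

# Inserting a day between two existing ones is just an assignment.
days[252] = [[12.00, 12.10], [12.20, 12.30]]

out = io.StringIO()
write_days(days, out)
```

Because the writer sorts by day, the order in which days were added no longer matters.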

And if your tools turn out to have lousy performance? Guess what - it's because your flat text file is a cruddy format. But that's just my opinion. :)

I'd definitely change from text to binary but still keep each day in a separate file. You could name them in such a way that insertions in between don't cause any strangeness with indices, such as by including the date and possibly the time in the filename. You could also consider the file structure, for example if you have several fields per location. Is it common to look for a small tile from a large number of timesteps? In that case you might want to store them as tiles containing data from several days. You didn't mention how the data is accessed, which plays a big role in how to organise it efficiently.
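A rough sketch of the one-binary-file-per-day idea, using only the Python standard library. The `grid_YYYYMMDD.bin` naming scheme, the function names, and the choice of 4-byte floats in native byte order are all assumptions made for illustration; nothing in the question dictates them.

```python
import array
import datetime
import os
import tempfile


def day_path(directory, date):
    """Build a filename that carries the date, e.g. grid_20080907.bin."""
    return os.path.join(directory, f"grid_{date:%Y%m%d}.bin")


def write_day(directory, date, rows):
    """Store one day's grid as raw 4-byte floats, row after row."""
    flat = array.array("f", [v for row in rows for v in row])
    with open(day_path(directory, date), "wb") as f:
        flat.tofile(f)


def read_day(directory, date, n_cols):
    """Read one day's grid back; the caller supplies the row width."""
    flat = array.array("f")
    with open(day_path(directory, date), "rb") as f:
        flat.frombytes(f.read())
    return [flat[i:i + n_cols].tolist() for i in range(0, len(flat), n_cols)]


# Round trip with a tiny made-up 2x2 grid.
with tempfile.TemporaryDirectory() as directory:
    day = datetime.date(2008, 9, 7)
    write_day(directory, day, [[12.5, 12.75], [13.0, 12.25]])
    restored = read_day(directory, day, n_cols=2)
    filename = os.path.basename(day_path(directory, day))
```

Since the date lives in the filename, adding a day that was missing is just writing one more file. Note that native byte order is not portable between machines of different endianness, and that 4-byte floats round values such as 12.76 slightly.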

Clarifications:

I'm surprised you added "database" as one of the tags, and considered it as an option. Why did you do this?

Essentially, you have a 2D, single-component floating point image at every time step. Would you agree with this way of viewing your data?

You also mentioned the desire to insert a day between two existing ones - which seems to be a very odd thing to do. Why would you need to do that? Is there a new day between May 4 and May 5 that I don't know about?

Is "compression" one of the things you care about, or are you just sick of flat files?

Would a float or a double be sufficient to store your data, or do you feel you need arbitrary precision?

Also, what programming language(s) do you want to access this data with?

The answer on how to store the data depends entirely on what you're going to do with the data. For example, if you only ever need to retrieve by specifying the date or a date range, then storing in a database as a BLOB makes some sense. But if you need to find records that have certain values, you'll need to do something different.
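For the retrieve-by-date case, a minimal sketch of the BLOB approach using Python's built-in `sqlite3` module might look like this. The table name `grids`, its columns, and the helper functions are invented for the example, and it assumes days are keyed by ISO date strings (which sort correctly as text) and values are packed as 4-byte floats.

```python
import array
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per day."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS grids ("
        "day TEXT PRIMARY KEY, n_cols INTEGER NOT NULL, data BLOB NOT NULL)"
    )
    return conn


def put_day(conn, day, rows):
    """Insert or overwrite one day's grid, packed as 4-byte floats."""
    flat = array.array("f", [v for row in rows for v in row])
    conn.execute(
        "INSERT OR REPLACE INTO grids (day, n_cols, data) VALUES (?, ?, ?)",
        (day, len(rows[0]), flat.tobytes()),
    )
    conn.commit()


def get_range(conn, first_day, last_day):
    """Return {day: rows} for every stored day in the inclusive range."""
    cursor = conn.execute(
        "SELECT day, n_cols, data FROM grids "
        "WHERE day BETWEEN ? AND ? ORDER BY day",
        (first_day, last_day),
    )
    result = {}
    for day, n_cols, blob in cursor:
        flat = array.array("f")
        flat.frombytes(blob)
        result[day] = [
            flat[i:i + n_cols].tolist() for i in range(0, len(flat), n_cols)
        ]
    return result


# Days can be added in any order; the range query sorts them.
conn = open_store()
put_day(conn, "2008-09-07", [[12.5, 12.75]])
put_day(conn, "2008-09-09", [[11.0, 11.25]])
put_day(conn, "2008-09-08", [[12.0, 12.25]])  # the "missing" day, added late
subset = get_range(conn, "2008-09-07", "2008-09-08")
```

This only covers lookups by date. Finding records by value would mean unpacking every BLOB, which is exactly the case where something different is needed.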

Please describe how you need to be able to access the data.

Matt, thanks very much, and likewise longneck and jirv.

This post was partly an experiment, testing the quality of stackoverflow discourse. If you guys/gals/alien lifeforms are representative, I'm sold.

And on point, you've clarified my thinking considerably. Mind, I still might not necessarily implement your advice, but know that I will be thinking about it very seriously. >;-)

I may very well leave the file format the same, add to the extant C and/or Ruby routines to tack on the few low-level features I lack (e.g. inserting missing timesteps), and hang an HTTP front end on the whole thing so that the data can be consumed by whatever box needs it, in whatever language is currently hoopy. While it's mostly unchanging legacy software that constructs these data, we're always coming up with new consumers for them, so the multi-language/multi-computer requirement (gee, did I forget that one?) applies to the reading side, not the writing side. That also obviates a whole slew of security issues.

Thanks again, folks.
