
Replacing flat-file db with proper database with record-level editing

I cannot install SQLite on a remote machine, so I have to find a way to store a large amount of data in some kind of database structure.

Example data

key,values...
key,values....
..

There are currently about a million rows in a 20 MB flat file, and every hour I have to read through each record and value in the file and update or add a record. Since it is a flat file, I have to rewrite the whole file each time.

I am looking at the Storable module, but I think it also writes data sequentially. I want to edit only the records that need to be changed.

Reading and updating random records is a requirement. Additions can be anywhere (order is not important).

Can anyone suggest something? How can I tell whether I can set up a native Berkeley DB file on these systems, which are a mixture of Solaris and Linux?
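
One way to find out is to probe, on each host, which DBM back ends Perl can actually load. A minimal sketch; DB_File is the Berkeley DB binding, and the others are the usual DBM candidates:

#!/usr/bin/perl
use strict;
use warnings;

# Try to load each DBM back end and report whether it is available.
for my $mod (qw(DB_File GDBM_File NDBM_File ODBM_File SDBM_File)) {
    if (eval "require $mod; 1") {
        print "$mod: available\n";
    }
    else {
        print "$mod: NOT available\n";
    }
}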

________________finally__________________

Finally I understood things better (thank you all), and based on your suggestions I used AnyDBM_File. It found NDBM_File (a C library) installed on all of the systems. So far so good.
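
For reference, a tie through AnyDBM_File looks roughly like this (a minimal sketch; the file name is made up, and the BEGIN block overriding the default search order is optional):

#!/usr/bin/perl
use strict;
use warnings;

use Fcntl;

# Optionally set the search order before loading AnyDBM_File; the default
# is NDBM_File, DB_File, GDBM_File, SDBM_File, ODBM_File.
BEGIN { @AnyDBM_File::ISA = qw(NDBM_File DB_File GDBM_File SDBM_File) }
use AnyDBM_File;

tie my %h, 'AnyDBM_File', 'mydb', O_RDWR|O_CREAT, 0666
    or die "Error: $!\n";

# AnyDBM_File collapses @ISA to the back end it actually loaded.
print "Using back end: $AnyDBM_File::ISA[0]\n";

$h{a628234} = '0.178532683639599';   # record-level write, no full rewrite
print "$h{a628234}\n";

untie %h;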

Just to check how it would play out in the real world, I ran a sample script to add 1 million records (the maximum number of records I think I may ever get in a day, normally between 500k and 700k). OMG, it created a 110 GB data file on my disk! And all the records were like:

a628234 = 0.178532683639599

I mean, my real-world records are longer than that. Compare this to a flat file which holds 700k+ real-life records and is only 15 MB on disk.

I am disappointed with the slowness and bloat of this, so for now I think I will pay the price of writing the whole file each time an edit is required.

Thanks again for all your help.

As was said in the comments, you may use the SDBM_File module. For example:

#!/usr/bin/perl 
use strict;
use warnings;
use v5.14;

use Fcntl;
use SDBM_File;

my $filename = "dbdb";

my %h;

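# Tie %h to the on-disk SDBM database ($filename.pag and $filename.dir);
# reads and writes on %h then go straight to those files.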
tie %h, 'SDBM_File', $filename, O_RDWR|O_CREAT, 0666
    or die "Error: $!\n";

# Run this once to fill the dbdb file. On later runs you may
# delete the next line and the output will still be "16,40".
$h{$_} = $_ * 2 . "," . $_ * 5  for 1..100;

say $h{8};

untie %h;

Output: 16,40

It depends on what your program logic needs, but one solution is to partition the database based on keys, so you can deal with many smaller files instead of one big file, as sketched below.
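
For illustration, here is a minimal sketch of key-based partitioning, assuming SDBM_File as the back end; the bucket count and file naming scheme are made up:

#!/usr/bin/perl
use strict;
use warnings;

use Fcntl;
use SDBM_File;

my $buckets = 16;   # arbitrary number of partition files
my @parts;          # one tied hash (held by reference) per partition

for my $i (0 .. $buckets - 1) {
    my %h;
    tie %h, 'SDBM_File', "dbdb_part$i", O_RDWR|O_CREAT, 0666
        or die "Error tying partition $i: $!\n";
    $parts[$i] = \%h;
}

# Pick a partition for a key using unpack's built-in checksum.
sub part_for {
    my ($key) = @_;
    return $parts[ unpack("%32C*", $key) % $buckets ];
}

# A record-level update now touches only one small partition file.
part_for('a628234')->{'a628234'} = '0.178532683639599';
print part_for('a628234')->{'a628234'}, "\n";

untie %{$_} for @parts;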
