
Store Perl hash data in a database

I have written Perl code that parses a text file and uses a hash to tally how many times each US state abbreviation appears in each file/record. I end up with something like this:

File: 521
OH => 4
PA => 1
IN => 2
TX => 3
IL => 7

I am struggling to find a way to store such hash results in an SQL database. I am using MariaDB. The structure of the data itself varies: one file may contain only a few states, and the next may contain a completely different group of states. I am even having trouble conceptualizing the table structure. What would be the best way to store data like this in a database?

There are many possible ways to store the data.

For the sake of simplicity, see if the following approach is an acceptable solution for your case. It uses one table with two indexes, one on the id column and one on the state column.

CREATE TABLE IF NOT EXISTS `state_count` (
    `id`        INT NOT NULL,
    `state`     VARCHAR(2) NOT NULL,
    `count`     INT NOT NULL,
    INDEX `id` (`id`),
    INDEX `state` (`state`)
);

INSERT INTO `state_count`
    (`id`,`state`,`count`)
VALUES
    (251,'OH',4),
    (251,'PA',1),
    (251,'IN',2),
    (251,'TX',3),
    (251,'IL',7);

Sample SQL SELECT output:

MySQL [dbs0897329]> SELECT * FROM state_count;
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
| 251 | PA    |     1 |
| 251 | IN    |     2 |
| 251 | TX    |     3 |
| 251 | IL    |     7 |
+-----+-------+-------+
5 rows in set (0.000 sec)
MySQL [dbs0897329]> SELECT * FROM state_count WHERE state='OH';
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
+-----+-------+-------+
1 row in set (0.000 sec)
MySQL [dbs0897329]> SELECT * FROM state_count WHERE state IN ('OH','TX');
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
| 251 | TX    |     3 |
+-----+-------+-------+
2 rows in set (0.001 sec)
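
One thing the single-table layout above does not guard against is the same file being loaded twice, which would create duplicate (id, state) rows. The following is a minimal sketch, not part of the original answer, of one way to handle that in MariaDB: add a unique key over both columns and use an upsert. The key name `id_state` and the new count of 5 are assumptions chosen for illustration. The last query shows how the per-file rows can be summed per state.

```sql
-- Assumption: each (id, state) pair should appear at most once.
-- The key name `id_state` is made up for this example.
ALTER TABLE `state_count`
    ADD UNIQUE KEY `id_state` (`id`, `state`);

-- Re-loading file 251 now overwrites the stored count instead of
-- adding a second row for the same file and state.
INSERT INTO `state_count`
    (`id`, `state`, `count`)
VALUES
    (251, 'OH', 5)
ON DUPLICATE KEY UPDATE
    `count` = VALUES(`count`);

-- Total occurrences of each state across all stored files.
SELECT `state`, SUM(`count`) AS total
FROM `state_count`
GROUP BY `state`
ORDER BY total DESC;
```

From Perl, the same INSERT could be prepared once with placeholders and executed for every key/value pair of the hash.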

It's a little unclear in what direction your question goes. But if you want a good relational model to store the data in, that would be three tables: one for the files, one for the states, and one for the count of each state in a file. For example:

The tables:

CREATE TABLE file
             (id integer
                 AUTO_INCREMENT,
              path varchar(256)
                   NOT NULL,
              PRIMARY KEY (id),
              UNIQUE (path));

CREATE TABLE state
             (id integer
                 AUTO_INCREMENT,
              abbreviation varchar(2)
                           NOT NULL,
              PRIMARY KEY (id),
              UNIQUE (abbreviation));

CREATE TABLE occurrences
             (file integer,
              state integer,
              count integer
                    NOT NULL,
              PRIMARY KEY (file,
                           state),
              FOREIGN KEY (file)
                          REFERENCES file
                                     (id),
              FOREIGN KEY (state)
                          REFERENCES state
                                     (id),
              CHECK (count >= 0));

The data:

INSERT INTO file
            (path)
            VALUES ('521');

INSERT INTO state
            (abbreviation)
            VALUES ('OH'),
                   ('PA'),
                   ('IN'),
                   ('TX'),
                   ('IL');

INSERT INTO occurrences
            (file,
             state,
             count)
            VALUES (1,
                    1,
                    4),
                   (1,
                    2,
                    1),
                   (1,
                    3,
                    2),
                   (1,
                    4,
                    3),
                   (1,
                    4,
                    7);
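
The literal ids in the statement above only work while the AUTO_INCREMENT values happen to be 1 through 5. As a hedged sketch, a row can instead be inserted by looking the ids up from the file path and the state abbreviation. The second file '522' and its count of 2 are made-up values for illustration, and the state table is assumed to already contain 'OH'.

```sql
-- Hypothetical second file; '522' and the count 2 are illustration values.
INSERT INTO file
            (path)
            VALUES ('522');

-- Resolve both foreign keys by their natural values instead of
-- hard-coding the generated ids.
INSERT INTO occurrences
            (file,
             state,
             count)
SELECT f.id,
       s.id,
       2
       FROM file AS f
            CROSS JOIN state AS s
       WHERE f.path = '522'
             AND s.abbreviation = 'OH';
```

Because path and abbreviation are both declared UNIQUE, the SELECT yields at most one row, so at most one occurrence row is inserted.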

The states would of course be reused. Fill the state table with all 50 once and reference them; they should not be inserted again for every file.

You can fill occurrences explicitly with a count of 0 for files where the respective state didn't appear, if you want to distinguish between "I know it's 0." and "I don't know the count.", which would then be encoded through the absence of a corresponding row. If you don't want to distinguish that and no row means a count of 0, you can handle that in queries by using outer joins and coalesce() to "translate" to 0.
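
A minimal sketch of the outer join and coalesce() approach, assuming that a missing row in occurrences means a count of 0. The column alias `n` is an arbitrary name chosen for this example.

```sql
-- One row per (file, state) combination; states that have no row in
-- occurrences for a file are reported with a count of 0.
SELECT f.path,
       s.abbreviation,
       COALESCE(o.count, 0) AS n
       FROM file AS f
            CROSS JOIN state AS s
            LEFT JOIN occurrences AS o
                      ON o.file = f.id
                         AND o.state = s.id
       ORDER BY f.path,
                s.abbreviation;
```

The cross join produces every file and state pairing, and the left join keeps the pairings that have no stored count so that coalesce() can replace the resulting NULL with 0.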

