简体   繁体   中英

store Perl hash data in a database

I have written Perl code that parses a text file and uses a hash to tally up the number of times a US State abbreviation appears in each file/record. I end up with something like this.

File: 521
OH => 4
PA => 1
IN => 2
TX => 3
IL => 7

I am struggling to find a way to store such hash results in an SQL database. I am using mariadb . Because the structure of the data itself varies, one file will have some states and the next may have others. For example, one file may contain only a few states, the next may contain a group of completely different states. I am even having trouble conceptualizing the table structure. What would be the best way to store data like this in a database?

There are many possible ways to store the data.

For sake of simplicity see if the following approach will be an acceptable solution for your case. The solution is base on use one table with two indexes based upon id and state columns.

CREATE TABLE IF NOT EXISTS `state_count` (
    `id`        INT NOT NULL,
    `state`     VARCHAR(2) NOT NULL,
    `count`     INT NOT NULL,
    INDEX `id` (`id`),
    INDEX `state` (`state`)
);

INSERT INTO `state_count`
    (`id`,`state`,`count`)
VALUES
    ('251','OH',4),
    ('251','PA',1),
    ('251','IN',2),
    ('251','TX',3),
    ('251','IL',7);

Sample SQL SELECT output

MySQL [dbs0897329] > SELECT * FROM state_count;
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
| 251 | PA    |     1 |
| 251 | IN    |     2 |
| 251 | TX    |     3 |
| 251 | IL    |     7 |
+-----+-------+-------+
5 rows in set (0.000 sec)
MySQL [dbs0897329]> SELECT * FROM state_count WHERE state='OH';
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
+-----+-------+-------+
1 row in set (0.000 sec)
MySQL [dbs0897329]> SELECT * FROM state_count WHERE state IN ('OH','TX');
+-----+-------+-------+
| id  | state | count |
+-----+-------+-------+
| 251 | OH    |     4 |
| 251 | TX    |     3 |
+-----+-------+-------+
2 rows in set (0.001 sec)

It's a little unclear in what direction your question goes. But if you want a good relational model to store the data into, that would be three tables. One for the files. One for the states. One for the count of the states in a file. For example:

The tables:

CREATE TABLE file
             (id integer
                 AUTO_INCREMENT,
              path varchar(256)
                   NOT NULL,
              PRIMARY KEY (id),
              UNIQUE (path));

CREATE TABLE state
             (id integer
                 AUTO_INCREMENT,
              abbreviation varchar(2)
                           NOT NULL,
              PRIMARY KEY (id),
              UNIQUE (abbreviation));

CREATE TABLE occurrences
             (file integer,
              state integer,
              count integer
                    NOT NULL,
              PRIMARY KEY (file,
                           state),
              FOREIGN KEY (file)
                          REFERENCES file
                                     (id),
              FOREIGN KEY (state)
                          REFERENCES state
                                     (id),
              CHECK (count >= 0));

The data:

INSERT INTO files
            (path)
            VALUES ('521');

INSERT INTO states
            (abbreviation)
            VALUES ('OH'),
                   ('PA'),
                   ('IN'),
                   ('TX'),
                   ('IL');

INSERT INTO occurrences
            (file,
             state,
             count)
            VALUES (1,
                    1,
                    4),
                   (1,
                    2,
                    1),
                   (1,
                    3,
                    2),
                   (1,
                    4,
                    3),
                   (1,
                    4,
                    7);

The states of course would be reused. Fill the table with all 50 and use them. They should not be inserted for every file again.

You can fill occurrences explicitly with a count of 0 for file where the respective state didn't appear, if you want to distinguish between "I know it's 0." and "I don't know the count.", which would then be encoded through the absence of a corresponding row. If you don't want to distinguish that and no row means a count of 0 you can handle that in queries by using outer joins and coalesce() to "translate" to 0 .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM