
PostgreSQL, EpiDoc, and Dublin Core: databases and XML

I'm about to start a project that records textual information using EpiDoc XML. Here is an example: http://www.stoa.org/epidoc/gl/latest/supp-structure.html I want to store the data in PostgreSQL. I understand XML and I understand the basics of PostgreSQL. What is the correct/best way to put these two things together?

For example, so that I can use SQL to do something like select * from db where xmltag = value.

Very short and simplified mini-primer

Create your tables; they will look something like this:

CREATE TABLE xml_table 
(
    document_id integer /* you'd normally use serial */ PRIMARY KEY,
    xml_data xml
) ;

Check the PostgreSQL documentation about the XML data type.

You will fill your tables with queries like the following:

/* If you use XML as content, you'd insert it this way */
INSERT INTO
    xml_table (document_id, xml_data)
VALUES
    (1, xmlparse(content '<doc><title>Doc title</title></doc>')),
    (2, xmlparse(content '<doc>
         <preface>This is the preface</preface>
         <chapter><title>Hello</title><content>This is a content</content></chapter>
         <chapter><title>Good Bye</title><content>This is the end</content></chapter>
     </doc>')),
    (3, xmlparse(content '<doc>
         <preface>Yet a preface</preface>
         <chapter><title>C1</title><content>Content of C1</content></chapter>
         <chapter><title>C2</title><content>Content of C2</content></chapter>
     </doc>')) ;

For the sake of conciseness, I am not using EpiDoc documents as examples at this point, but the concept is the same.

Note that, normally, you do not want to store your whole database as a single XML document (that would be inefficient for most DBs), but as several documents, each identified by a number (or by whatever is more convenient for identifying them).

If you insert whole documents (and EpiDoc seems to require this approach):

/* If your XML values are whole documents, insert them this way */
INSERT INTO
    xml_table (document_id, xml_data)
VALUES
    (4, xmlparse(document '<?xml version="1.0"?><book><title>Oh my God</title><content>Short book</content></book>')) ;

Note that PostgreSQL will not check that your document complies with your DTD (this would require the database to query the outside world, which is normally outside the scope of a database). If you need to guarantee conformance, check it in your software before inserting the values into the DB.
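What PostgreSQL does check is well-formedness, which is a weaker guarantee than DTD conformance: xmlparse raises an error on malformed input. As a minimal sketch, the built-in xml_is_well_formed_document function lets you test a text value without triggering that error. The sample strings are made up for illustration, and this does not replace schema validation in your application.

```sql
/* Sketch: test well-formedness of text values before inserting them */
SELECT
    /* a complete document with a single root element */
    xml_is_well_formed_document('<book><title>Oh my God</title></book>') AS complete_doc,
    /* the <title> element is never closed, so this is not well-formed */
    xml_is_well_formed_document('<book><title>Oh my God</book>') AS mismatched_tags ;
```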

You'll retrieve a whole document (or content) this way:

SELECT
   xml_data
FROM
   xml_table
WHERE
    document_id = 3 ;

Normally, though, you'll query using xpath and xpath_exists to get specific items. For instance, imagine you want to get the title of the last chapter of each book that has chapters. You'd use:

SELECT
    /* Get the text content of title of the last chapter of every doc */
    xpath('/doc/chapter[last()]/title/text()', xml_data) AS result
FROM
    xml_table 
WHERE
    /* Choose only the docs where they have (at least) a chapter with title */
    xpath_exists('/doc/chapter/title', xml_data) ;
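The "where xmltag = value" case from the question can be handled the same way, by putting the comparison inside the XPath expression. The following is a minimal sketch against the sample rows inserted earlier; the searched value 'Hello' and the choice of the preface element as output are only illustrative.

```sql
/* Sketch: documents in which some chapter is titled 'Hello' */
SELECT
    document_id,
    /* xpath returns an xml[] array; take the first match and cast it to text */
    (xpath('/doc/preface/text()', xml_data))[1]::text AS preface
FROM
    xml_table
WHERE
    /* The predicate in square brackets compares the tag's content to a value */
    xpath_exists('/doc/chapter[title = "Hello"]', xml_data) ;
```

With the sample data above, only document 2 has a chapter with that title.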

Check the PostgreSQL XML functions and an XPath intro.
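One difference to expect with real EpiDoc files: they are TEI documents, so their elements live in the TEI namespace (http://www.tei-c.org/ns/1.0), and an XPath without a namespace prefix will not match them. Both xpath and xpath_exists accept an optional third argument that maps prefixes to namespace URIs. A minimal sketch follows, using an inline, heavily shortened TEI-like document that is made up for illustration and is not a valid EpiDoc file; the prefix name tei is an arbitrary choice.

```sql
/* Sketch: querying namespaced (TEI/EpiDoc-style) XML */
SELECT
    xpath(
        /* every step uses the prefix declared in the third argument */
        '/tei:TEI/tei:teiHeader//tei:title/text()',
        xmlparse(document
            '<TEI xmlns="http://www.tei-c.org/ns/1.0">
               <teiHeader><fileDesc><titleStmt>
                 <title>Sample inscription</title>
               </titleStmt></fileDesc></teiHeader>
             </TEI>'),
        /* array of (prefix, namespace URI) pairs */
        ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]
    ) AS titles ;
```

Against the table, the same third argument would be passed with xml_data in place of the inline document.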

