
How to access members of an rdf list with rdflib (or plain sparql)

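What is the best way to access the members of an rdf list? I'm using rdflib (python), but an answer in plain SPARQL is also fine (that kind of answer can be used through rdfextras, an rdflib helper library).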

I'm trying to access the authors of a particular journal article in the rdf generated by Zotero (some fields have been removed for brevity):

<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:z="http://www.zotero.org/namespaces/export#"
 xmlns:dcterms="http://purl.org/dc/terms/"
 xmlns:bib="http://purl.org/net/biblio#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
 xmlns:link="http://purl.org/rss/1.0/modules/link/">
    <bib:Article rdf:about="http://www.ncbi.nlm.nih.gov/pubmed/18273724">
        <z:itemType>journalArticle</z:itemType>
        <dcterms:isPartOf rdf:resource="urn:issn:0954-6634"/>
        <bib:authors>
            <rdf:Seq>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Lee</foaf:surname>
                        <foaf:givenname>Hyoun Seung</foaf:givenname>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Lee</foaf:surname>
                        <foaf:givenname>Jong Hee</foaf:givenname>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Ahn</foaf:surname>
                        <foaf:givenname>Gun Young</foaf:givenname>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Lee</foaf:surname>
                        <foaf:givenname>Dong Hun</foaf:givenname>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Shin</foaf:surname>
                        <foaf:givenname>Jung Won</foaf:givenname>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Kim</foaf:surname>
                        <foaf:givenname>Dong Hyun</foaf:givenname>
                    </foaf:Person>
                </rdf:li>
                <rdf:li>
                    <foaf:Person>
                        <foaf:surname>Chung</foaf:surname>
                        <foaf:givenname>Jin Ho</foaf:givenname>
                    </foaf:Person>
                </rdf:li>
            </rdf:Seq>
        </bib:authors>

        <dc:title>Fractional photothermolysis for the treatment of acne scars: a report of 27 Korean patients</dc:title>
        <dcterms:abstract>OBJECTIVES: Atrophic post-acne scarring remains a therapeutically challe *CUT*, erythema and edema. CONCLUSIONS: The 1550-nm erbium-doped FP is associated with significant patient-reported improvement in the appearance of acne scars, with minimal downtime.</dcterms:abstract>
        <bib:pages>45-49</bib:pages>
        <dc:date>2008</dc:date>
        <z:shortTitle>Fractional photothermolysis for the treatment of acne scars</z:shortTitle>
        <dc:identifier>
            <dcterms:URI>
               <rdf:value>http://www.ncbi.nlm.nih.gov/pubmed/18273724</rdf:value>
            </dcterms:URI>
        </dc:identifier>
        <dcterms:dateSubmitted>2010-12-06 11:36:52</dcterms:dateSubmitted>
        <z:libraryCatalog>NCBI PubMed</z:libraryCatalog>
        <dc:description>PMID: 18273724</dc:description>
    </bib:Article>
    <bib:Journal rdf:about="urn:issn:0954-6634">
        <dc:title>The Journal of Dermatological Treatment</dc:title>
        <prism:volume>19</prism:volume>
        <prism:number>1</prism:number>
        <dcterms:alternative>J Dermatolog Treat</dcterms:alternative>
        <dc:identifier>DOI 10.1080/09546630701691244</dc:identifier>
        <dc:identifier>ISSN 0954-6634</dc:identifier>
    </bib:Journal>
</rdf:RDF>

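RDF containers are generally a pain and rather annoying to handle. I'm posting two solutions, one without SPARQL and one with SPARQL. Personally I prefer the second one, the one that uses SPARQL.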

Example 1: Without SPARQL

To get all the authors of a given article, as in your case, you can do something like the code I post below.

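I've added comments so that it explains itself. The most important point is the use of the graph's g.triples(triple_pattern) function: it lets you filter an rdflib graph for exactly the triple patterns you need, with None in any position acting as a wildcard.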
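To make the pattern idea concrete, here is a toy, standard-library-only imitation of how g.triples() matches a pattern, with None acting as a wildcard. It is not rdflib's implementation: terms are plain strings and the graph is a Python set of tuples, assumptions made so the sketch runs without rdflib, and the blank-node labels _:seq and _:a1 are hypothetical.

```python
# A toy, stdlib-only imitation of rdflib's Graph.triples() pattern matching.
# Assumption for illustration: terms are plain strings, whereas rdflib uses
# URIRef/BNode/Literal objects.
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def triples(graph: Set[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None matches anything."""
    s, p, o = pattern
    for triple in graph:
        ts, tp, to = triple
        if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
            yield triple


BIB = "http://purl.org/net/biblio#"
FOAF = "http://xmlns.com/foaf/0.1/"
article = "http://www.ncbi.nlm.nih.gov/pubmed/18273724"

# Hand-written triples shaped like the Zotero data; "_:seq" and "_:a1"
# are hypothetical stand-ins for blank nodes.
graph = {
    (article, BIB + "authors", "_:seq"),
    ("_:a1", FOAF + "surname", "Lee"),
    ("_:a1", FOAF + "givenname", "Hyoun Seung"),
}

# "Get the authors container of this article" -> (article, bib:authors, None)
print(list(triples(graph, (article, BIB + "authors", None))))
```

Leaving a position as None is what makes the first loop in Example 1 read as "everything linked to this article via bib:authors".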

When an rdf:Seq is parsed, its members are attached with predicates of the form:

http://www.w3.org/1999/02/22-rdf-syntax-ns#_1

http://www.w3.org/1999/02/22-rdf-syntax-ns#_2

http://www.w3.org/1999/02/22-rdf-syntax-ns#_3

On retrieval, rdflib returns them in no particular order, so you need to sort them to traverse the sequence in the correct order.
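The need for a numeric sort is easy to miss: compared as strings, rdf:_10 sorts before rdf:_2. The short stdlib-only sketch below uses three made-up predicates to show both orderings and where the 44 in the answer's [44:] slice comes from (the namespace is 43 characters long, plus one for the underscore).

```python
# Why membership properties must be sorted numerically, not as strings.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up membership properties, in arbitrary order as a graph might return them.
predicates = [RDF_NS + "_10", RDF_NS + "_2", RDF_NS + "_1"]

# A plain string sort puts "_10" before "_2", which is wrong for a sequence.
as_strings = sorted(predicates)

# Sorting on the integer after "#_" gives the real sequence order.
# len(RDF_NS) is 43, so the digits start at index 44, matching the [44:] slice.
as_numbers = sorted(predicates, key=lambda p: int(p[len(RDF_NS) + 1:]))

print([p[len(RDF_NS):] for p in as_strings])  # ['_1', '_10', '_2']
print([p[len(RDF_NS):] for p in as_numbers])  # ['_1', '_2', '_10']
```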

import rdflib

RDF = rdflib.namespace.RDF

#Parse the file (the Zotero export is RDF/XML)
g = rdflib.Graph()
g.parse("zot.rdf", format="xml")

#So that we are sure we get something back
print("Number of triples", len(g))

#Couple of handy namespaces to use later
BIB = rdflib.Namespace("http://purl.org/net/biblio#")
FOAF = rdflib.Namespace("http://xmlns.com/foaf/0.1/")

#Author counter to print at the bottom
i = 0

#Article for which we want the list of authors
article = rdflib.term.URIRef("http://www.ncbi.nlm.nih.gov/pubmed/18273724")

#The first loop is equivalent to "get all authors for article x"
for triple in g.triples((article, BIB["authors"], None)):

    #This expression removes the rdf:type predicate because we only want the bnodes
    # reached through predicates of the form http://www.w3.org/1999/02/22-rdf-syntax-ns#_SEQ_NUMBER
    # where SEQ_NUMBER is the index of the element in the rdf:Seq
    list_triples = filter(lambda y: RDF["type"] != y[1], g.triples((triple[2], None, None)))

    #We sort the authors by the predicate of the triple - order in sequences does matter ;-)
    # so "http://www.w3.org/1999/02/22-rdf-syntax-ns#_435"[44:] returns "435"
    # and since we want numeric order we do int(x[1][44:]) (BTW x[1] is the predicate)
    authors_sorted = sorted(list_triples, key=lambda x: int(x[1][44:]))

    #We iterate over the authors' bnodes and get surname and givenname
    for author_bnode in authors_sorted:
        author_surname = g.value(author_bnode[2], FOAF["surname"])
        author_name = g.value(author_bnode[2], FOAF["givenname"])
        print("author(%s): %s %s" % (i, author_name, author_surname))
        i += 1

This example shows how to do it without SPARQL.

Example 2: With SPARQL

Now the exact same example, but with SPARQL.

# Only needed on old rdflib 3.x with the rdfextras package installed;
# rdflib 4.0 and later ship a built-in SPARQL engine, so leave these commented out there.
# rdflib.plugin.register('sparql', rdflib.query.Processor,
#                        'rdfextras.sparql.processor', 'Processor')
# rdflib.plugin.register('sparql', rdflib.query.Result,
#                        'rdfextras.sparql.query', 'SPARQLQueryResult')

# Reuses g, RDF, FOAF and BIB from Example 1
query = """
SELECT ?seq_index ?name ?surname WHERE {
     <http://www.ncbi.nlm.nih.gov/pubmed/18273724> bib:authors ?seq .
     ?seq ?seq_index ?seq_bnode .
     ?seq_bnode foaf:givenname ?name .
     ?seq_bnode foaf:surname ?surname .
}
"""
for row in sorted(g.query(query, initNs=dict(rdf=RDF, foaf=FOAF, bib=BIB)),
                  key=lambda x: int(x[0][44:])):
    print("Author(%s) %s %s" % (row[0][44:], row[1], row[2]))

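As you can see, we still need to sort, because the library does not do it for us. In the query, the variable ?seq_index binds the predicate that carries the position in the sequence, and that predicate is what the lambda function sorts on.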
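The [44:] slice relies on the exact length of the RDF namespace. A slightly more defensive variant, sketched below, extracts the index with a regular expression and fails loudly on any predicate that is not rdf:_n (such as rdf:type). The helper name seq_position is hypothetical, and plain tuples stand in for rdflib's result rows.

```python
import re

# Matches rdf:_1, rdf:_2, ... and captures the number.
# Hypothetical helper for illustration, not part of rdflib.
MEMBERSHIP = re.compile(r"^http://www\.w3\.org/1999/02/22-rdf-syntax-ns#_([0-9]+)$")


def seq_position(predicate) -> int:
    """Return n for rdf:_n; raise ValueError for any other predicate."""
    m = MEMBERSHIP.match(str(predicate))
    if m is None:
        raise ValueError(f"not a container membership property: {predicate}")
    return int(m.group(1))


# Rows shaped like (?seq_index, ?name, ?surname); plain tuples stand in
# for rdflib query result rows.
rows = [
    ("http://www.w3.org/1999/02/22-rdf-syntax-ns#_3", "Gun Young", "Ahn"),
    ("http://www.w3.org/1999/02/22-rdf-syntax-ns#_1", "Hyoun Seung", "Lee"),
    ("http://www.w3.org/1999/02/22-rdf-syntax-ns#_2", "Jong Hee", "Lee"),
]

for row in sorted(rows, key=lambda r: seq_position(r[0])):
    print("Author(%s) %s %s" % (seq_position(row[0]), row[1], row[2]))
```

Because str() is applied first, the same key function should also work on rdflib URIRef values.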

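Newer versions of RDFLib offer a more streamlined way to access collections: the members of a sequence can now be accessed programmatically with the Seq class: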

from rdflib import Graph, Namespace, URIRef
from rdflib.graph import Seq

# Defined by hand so that foaf:givenname (the term Zotero uses) resolves even if
# rdflib's bundled FOAF namespace definition does not list that term
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
BIB = Namespace("http://purl.org/net/biblio#")

# Load data
g = Graph()
g.parse("./zotero.rdf", format="xml")

# Get the first resource linked to the article via bib:authors
article = URIRef("http://www.ncbi.nlm.nih.gov/pubmed/18273724")
authors = next(g.objects(article, BIB.authors))

# Seq collects the rdf:_1, rdf:_2, ... members and iterates them in index order
for i, author in enumerate(Seq(g, authors), start=1):
    givenname = g.value(author, FOAF.givenname)
    surname = g.value(author, FOAF.surname)
    print("%i: %s %s" % (i, givenname, surname))

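The question is old, but for completeness: the members of an RDF container such as a Seq are best retrieved with SPARQL plus a filter on the membership properties.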

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?container ?member
WHERE {
 ?container ?prop ?member.
 FILTER(?prop = rdfs:member ||
        regex(str(?prop),
        "^http://www.w3.org/1999/02/22-rdf-syntax-ns#_[0-9]+$"))
}

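This is largely equivalent to manuel-salvadores' Example 2, but you should restrict his variable ?seq_index (the equivalent of my ?prop) to the relevant membership properties, as this query does.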
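The FILTER logic can be mirrored in plain Python to see which triples it keeps. This is a stdlib-only sketch: terms are plain strings rather than rdflib objects, and the three sample triples are made up to resemble the Zotero Seq. Like the SPARQL query, it selects members but does not put them in sequence order.

```python
import re

RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"
MEMBERSHIP_RE = re.compile(r"http://www\.w3\.org/1999/02/22-rdf-syntax-ns#_[0-9]+")


def is_member_property(prop: str) -> bool:
    """Same test as the FILTER: rdfs:member or rdf:_n."""
    return prop == RDFS_MEMBER or MEMBERSHIP_RE.fullmatch(prop) is not None


# Made-up (container, property, member) triples; strings stand in for RDF terms.
triples = [
    ("_:seq", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq"),
    ("_:seq", "http://www.w3.org/1999/02/22-rdf-syntax-ns#_1", "_:a1"),
    ("_:seq", "http://www.w3.org/1999/02/22-rdf-syntax-ns#_2", "_:a2"),
]

# rdf:type is dropped; only the membership triples survive
members = [(c, m) for c, p, m in triples if is_member_property(p)]
print(members)  # [('_:seq', '_:a1'), ('_:seq', '_:a2')]
```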

Since you also mention RDF lists: for those, the SPARQL 1.1 query is

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?list ?member
WHERE {
 ?list rdf:rest*/rdf:first ?member.
}
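One caveat: the property-path query returns the members, but the SPARQL solutions carry no position, so the original list order is not guaranteed in the results. In code you can walk the list from its head instead; rdflib also provides rdflib.collection.Collection for this. The sketch below hand-rolls the walk with the standard library only, modelling the graph as a dict keyed by (subject, predicate), a simplifying assumption; the blank-node labels and list contents are illustrative.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF_NS + "first", RDF_NS + "rest", RDF_NS + "nil"


def list_members(graph, head):
    """Follow rdf:rest from head until rdf:nil, collecting each rdf:first.

    graph maps (subject, predicate) -> object, which assumes every list
    node has exactly one rdf:first and one rdf:rest.
    """
    members, node, seen = [], head, set()
    while node != NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        members.append(graph[(node, FIRST)])
        node = graph[(node, REST)]
    return members


# ("Lee" "Ahn" "Shin") as an RDF collection with illustrative blank nodes _:l1.._:l3
graph = {
    ("_:l1", FIRST): "Lee", ("_:l1", REST): "_:l2",
    ("_:l2", FIRST): "Ahn", ("_:l2", REST): "_:l3",
    ("_:l3", FIRST): "Shin", ("_:l3", REST): NIL,
}
print(list_members(graph, "_:l1"))  # ['Lee', 'Ahn', 'Shin']
```

Walking rdf:rest one step at a time is what preserves the order that the path query alone does not promise.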
