
Problem reading an XML file in Python

I want to read this XML file with Python on Google Colab, like this:

import xml.etree.ElementTree as ET

tree = ET.parse('drive/MyDrive/pubmed22n1192.xml')

pubmed22n1192.xml is the name of the file.

But I get this error message:

File "<string>", line unknown
ParseError: syntax error: line 1, column 0

Is there something wrong with the file? Because of its size, I am sharing only the first lines of it:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2019//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd">
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">14584002</PMID>
      <DateCompleted>
        <Year>2004</Year>
        <Month>05</Month>
        <Day>04</Day>
      </DateCompleted>
      <DateRevised>
        <Year>2022</Year>
        <Month>02</Month>
        <Day>13</Day>
      </DateRevised>
      <Article PubModel="Print">
        <Journal>
          <ISSN IssnType="Electronic">1469-493X</ISSN>
          <JournalIssue CitedMedium="Internet">
            <Issue>4</Issue>
            <PubDate>
              <Year>2003</Year>
            </PubDate>
          </JournalIssue>
          <Title>The Cochrane database of systematic reviews</Title>
          <ISOAbbreviation>Cochrane Database Syst Rev</ISOAbbreviation>
        </Journal>
        <ArticleTitle>Intravenous immunoglobulin for the treatment of Kawasaki disease in children.</ArticleTitle>
        <Pagination>
          <MedlinePgn>CD004000</MedlinePgn>
        </Pagination>
        <Abstract>
          <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Kawasaki disease is the most common cause of acquired heart disease in children in developed countries. The coronary arteries supplying the heart can be damaged in Kawasaki disease. The principal advantage of timely diagnosis is the potential to prevent this complication with early treatment. Intravenous immunoglobulin (IVIG) is widely used for this purpose.</AbstractText>
          <AbstractText Label="OBJECTIVES" NlmCategory="OBJECTIVE">The objective of this review was to evaluate the effectiveness of IVIG in treating, and preventing cardiac consequences, of Kawasaki disease in children.</AbstractText>
          <AbstractText Label="SEARCH STRATEGY" NlmCategory="METHODS">Electronic searches of the Cochrane Peripheral Vascular Disease Group Specialised Register, CENTRAL, MEDLINE, EMBASE, and CINAHL were performed (last searched April 2003). We also searched references from relevant articles and contacted authors where necessary. In addition we contacted experts in the field for unpublished works.</AbstractText>
          <AbstractText Label="SELECTION CRITERIA" NlmCategory="METHODS">Randomised controlled trials of intravenous immunoglobulin to treat Kawasaki disease were eligible for inclusion.</AbstractText>
          <AbstractText Label="DATA COLLECTION AND ANALYSIS" NlmCategory="METHODS">Fifty-nine trials were identified in the initial search. On careful inspection only sixteen of these met all the inclusion criteria. Trials were data extracted and assessed for quality by at least two reviewers. Data were combined for meta-analysis using relative risk ratios for dichotomous data or weighted mean difference for continuous data. A random effects statistical model was used.</AbstractText>
          <AbstractText Label="MAIN RESULTS" NlmCategory="RESULTS">The meta-analysis of IVIG versus placebo, including all children, showed a significant decrease in new coronary artery abnormalities (CAAs) in favour of IVIG, at thirty days RR (95% CI) = 0.74 (0.61 to 0.90). No statistically significant difference was found thereafter. A subgroup analysis excluding children with CAAs at enrollment also found a significant reduction of new CAAs in children receiving IVIG RR (95%) = 0.67 (0.46 to 1.00). There was a trend towards benefit from IVIG at sixty days (p=0.06). Results of dose comparisons showed a decrease in the number of new CAAs with increased dose. The meta-analysis of 400 mg/kg/day for five days versus 2 gm/kg in a single dose showed statistically significant reduction in CAAs at thirty days RR (95%) = 4.47 (1.55 to 12.86). This comparison also showed a significant reduction in duration of fever with the higher dose. There was no statistically significant difference noted between different preparations of IVIG. There was no statistically significant difference of adverse effects in any group.</AbstractText>
          <AbstractText Label="REVIEWER'S CONCLUSIONS" NlmCategory="CONCLUSIONS">Children fulfilling the diagnostic criteria for Kawasaki disease should be treated with IVIG (2 gm/kg single dose) within 10 days of onset of symptoms.</AbstractText>
        </Abstract>
        <AuthorList CompleteYN="Y">
          <Author ValidYN="Y">
            <LastName>Oates-Whitehead</LastName>
            <ForeName>R M</ForeName>
            <Initials>RM</Initials>
            <AffiliationInfo>
              <Affiliation>Research Division, Royal College of Paediatrics, 50 Hallam Street, London, UK, W1W 6DE.</Affiliation>
            </AffiliationInfo>
          </Author>
          <Author ValidYN="Y">
            <LastName>Baumer</LastName>
            <ForeName>J H</ForeName>
            <Initials>JH</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Haines</LastName>
            <ForeName>L</ForeName>
            <Initials>L</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Love</LastName>
            <ForeName>S</ForeName>
            <Initials>S</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Maconochie</LastName>
            <ForeName>I K</ForeName>
            <Initials>IK</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Gupta</LastName>
            <ForeName>A</ForeName>
            <Initials>A</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Roman</LastName>
            <ForeName>K</ForeName>
            <Initials>K</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Dua</LastName>
            <ForeName>J S</ForeName>
            <Initials>JS</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Flynn</LastName>
            <ForeName>I</ForeName>
            <Initials>I</Initials>
          </Author>
        </AuthorList>
        <Language>eng</Language>
        <PublicationTypeList>
          <PublicationType UI="D016428">Journal Article</PublicationType>
          <PublicationType UI="D017418">Meta-Analysis</PublicationType>
          <PublicationType UI="D016454">Review</PublicationType>
          <PublicationType UI="D000078182">Systematic Review</PublicationType>
        </PublicationTypeList>
      </Article>
      <MedlineJournalInfo>
        <Country>England</Country>
        <MedlineTA>Cochrane Database Syst Rev</MedlineTA>
        <NlmUniqueID>100909747</NlmUniqueID>
        <ISSNLinking>1361-6137</ISSNLinking>
      </MedlineJournalInfo>
      <ChemicalList>
        <Chemical>
          <RegistryNumber>0</RegistryNumber>
          <NameOfSubstance UI="D016756">Immunoglobulins, Intravenous</NameOfSubstance>
        </Chemical>
      </ChemicalList>
      <CitationSubset>IM</CitationSubset>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D002648" MajorTopicYN="N">Child</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D016756" MajorTopicYN="N">Immunoglobulins, Intravenous</DescriptorName>
          <QualifierName UI="Q000627" MajorTopicYN="Y">therapeutic use</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D009080" MajorTopicYN="N">Mucocutaneous Lymph Node Syndrome</DescriptorName>
          <QualifierName UI="Q000628" MajorTopicYN="Y">therapy</QualifierName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D016032" MajorTopicYN="N">Randomized Controlled Trials as Topic</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
      <NumberOfReferences>90</NumberOfReferences>
    </MedlineCitation>
    <PubmedData>
      <History>
        <PubMedPubDate PubStatus="pubmed">
          <Year>2003</Year>
          <Month>10</Month>
          <Day>30</Day>
          <Hour>5</Hour>
          <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="medline">
          <Year>2004</Year>
          <Month>5</Month>
          <Day>5</Day>
          <Hour>5</Hour>
          <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="entrez">
          <Year>2003</Year>
          <Month>10</Month>
          <Day>30</Day>
          <Hour>5</Hour>
          <Minute>0</Minute>
        </PubMedPubDate>
      </History>
      <PublicationStatus>ppublish</PublicationStatus>
      <ArticleIdList>
        <ArticleId IdType="pubmed">14584002</ArticleId>
        <ArticleId IdType="doi">10.1002/14651858.CD004000</ArticleId>
        <ArticleId IdType="pmc">PMC6544780</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>

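The file contains information about several articles, and this is the first one, so the excerpt above is not properly closed. I used an XML extension in VS Code to look for formatting errors, but the file seems fine.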

It is hard to say without the complete file, but when I parsed this snippet with xml I got an xml.etree.ElementTree.ParseError: no element found error, which makes me think the XML may be malformed.
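For context, `no element found` is what the parser reports when the input ends before the root element is closed, which is the case for the excerpt in the question (it stops before `</PubmedArticleSet>`). A minimal sketch, using a shortened stand-in for the excerpt rather than the real file:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the excerpt in the question (not the real file):
# the root element <PubmedArticleSet> is opened but never closed.
truncated = """<?xml version="1.0" encoding="utf-8"?>
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">14584002</PMID>
    </MedlineCitation>
  </PubmedArticle>
"""

try:
    ET.fromstring(truncated)
    truncated_error = None
except ET.ParseError as exc:
    # The message starts with "no element found" for input that ends early.
    truncated_error = str(exc)

print(truncated_error)

# Appending the missing closing tag lets the same text parse.
root = ET.fromstring(truncated + "</PubmedArticleSet>")
pmid = root.findtext("PubmedArticle/MedlineCitation/PMID")
print(pmid)
```

So this particular error may only reflect that the pasted excerpt is incomplete; it does not by itself show that the full file is broken.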

In that case you can use Beautiful Soup, since it is more tolerant of bad XML, and it does seem to return the expected result here.

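import bs4

# The XML text to parse (left elided, as in the original answer)
xml = ...

# features="xml" selects Beautiful Soup's XML parser, which requires lxml to be installed
soup = bs4.BeautifulSoup(xml, features="xml")
funny_chemical = soup.find("NameOfSubstance").text

print(funny_chemical)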

Returns:

'Immunoglobulins, Intravenous'
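Separately, the `syntax error: line 1, column 0` message in the question means the parser rejected the very first byte of the file, so it may be worth checking what the file actually starts with before blaming the markup. The sketch below is only a diagnostic under that assumption: `describe_start` is a hypothetical helper, and the gzip check reflects a guess that the file might still be compressed, which the question does not confirm. The temporary files exist only to make the example runnable.

```python
import gzip
import os
import tempfile


def describe_start(path, n=16):
    """Return a rough label for the first bytes of the file at `path`."""
    with open(path, "rb") as fh:
        head = fh.read(n)
    if not head:
        return "empty"
    if head.startswith(b"\x1f\x8b"):
        return "gzip"          # gzip magic number: the file is compressed
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]        # skip a UTF-8 byte order mark
    if head.startswith(b"<"):
        return "xml-like"
    return "other"


# Demonstration on two small temporary files (illustrative data only).
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.xml")
    packed = os.path.join(tmp, "packed.xml")
    with open(plain, "wb") as fh:
        fh.write(b'<?xml version="1.0"?><a/>')
    with open(packed, "wb") as fh:
        fh.write(gzip.compress(b'<?xml version="1.0"?><a/>'))
    plain_label = describe_start(plain)
    packed_label = describe_start(packed)

print(plain_label, packed_label)
```

On Colab the same helper could be called as `describe_start('drive/MyDrive/pubmed22n1192.xml')`. If it reports `gzip`, opening the file with `gzip.open(path, 'rb')` and passing that file object to `ET.parse` would be one thing to try.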

