Query PubMed with Python - How to get all article details from query to Pandas DataFrame and export them in CSV
I am trying to get all article details from a PubMed query into a Pandas DataFrame and export them all to CSV:
```python
from pymed import PubMed
import pandas as pd

pubmed = PubMed(tool="PubMedSearcher", email="daspranab239@gmail.com")
search_term = "Your search term"
results = pubmed.query(search_term, max_results=500)
articleList = []
articleInfo = []

for article in results:
    # The object found can be either a PubMedBookArticle or a PubMedArticle.
    # Convert it to a dictionary with the available toDict() method
    articleDict = article.toDict()
    articleList.append(articleDict)

# Build a list of dict records holding all the article details the PubMed API provides
for article in articleList:
    # Sometimes article['pubmed_id'] holds several IDs separated by newlines;
    # the first one is the article's own pubmedId
    pubmedId = article['pubmed_id'].partition('\n')[0]
    # Append the article info to the list of records
    articleInfo.append({'pubmed_id': pubmedId,
                        'title': article['title'],
                        'keywords': article['keywords'],
                        'journal': article['journal'],
                        'abstract': article['abstract'],
                        'conclusions': article['conclusions'],
                        'methods': article['methods'],
                        'results': article['results'],
                        'copyrights': article['copyrights'],
                        'doi': article['doi'],
                        'publication_date': article['publication_date'],
                        'authors': article['authors']})

# Build a Pandas DataFrame from the list of dicts
articlesPD = pd.DataFrame.from_dict(articleInfo)
articlesPD
```
When I try to run the code above, I get `KeyError: 'keywords'`, `'journal'`, `'conclusions'`, etc.
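One plausible cause, assuming some of the results are `PubMedBookArticle` objects (the loop comment already says both types can come back): a book record's dictionary has no journal-specific keys, so `article['journal']` raises `KeyError`, while `dict.get` with a default does not. A minimal sketch using two hypothetical, hand-written dicts standing in for `toDict()` output; `pick_fields` is an illustrative helper, not part of pymed:

```python
# Hypothetical stand-ins for toDict() output (not real PubMed data):
# a journal article and a book record without journal-specific keys.
journal_record = {"pubmed_id": "111", "title": "A", "journal": "J", "keywords": ["x"]}
book_record = {"pubmed_id": "222", "title": "B"}

FIELDS = ["pubmed_id", "title", "journal", "keywords"]


def pick_fields(record, fields, default="N/A"):
    """Copy the requested keys, substituting `default` for missing ones."""
    return {key: record.get(key, default) for key in fields}


for record in (journal_record, book_record):
    try:
        record["journal"]  # plain subscripting raises KeyError on the book record
    except KeyError as err:
        print("KeyError:", err)
    print(pick_fields(record, FIELDS))
```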
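Based on the code below, `article` is an instance of a class rather than a `dict`, so its fields should be accessed with `.` instead of `get` or brackets `[]`.

Reference: https://github.com/gijswobben/pymed/blob/master/pymed/article.py#L124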
```python
class PubMedArticle(object):
    def _initializeFromXML(self: object, xml_element: TypeVar("Element")) -> None:
        """ Helper method that parses an XML element into an article object.
        """

        # Parse the different fields of the article
        self.pubmed_id = self._extractPubMedId(xml_element)
        self.title = self._extractTitle(xml_element)
        self.keywords = self._extractKeywords(xml_element)
        self.journal = self._extractJournal(xml_element)
        self.abstract = self._extractAbstract(xml_element)
        self.conclusions = self._extractConclusions(xml_element)
        self.methods = self._extractMethods(xml_element)
        self.results = self._extractResults(xml_element)
        self.copyrights = self._extractCopyrights(xml_element)
        self.doi = self._extractDoi(xml_element)
        self.publication_date = self._extractPublicationDate(xml_element)
        self.authors = self._extractAuthors(xml_element)
        self.xml = xml_element
```
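To illustrate attribute access versus subscripting, here is a sketch with two hypothetical, trimmed classes standing in for pymed's article types (they are not the real classes and keep only a few fields). Subscripting an object fails with `TypeError`, plain `.` access works for fields that exist, and `getattr` with a default covers fields a given type does not have:

```python
class JournalArticle:
    """Hypothetical, trimmed stand-in for pymed's PubMedArticle."""
    __slots__ = ("pubmed_id", "title", "journal")

    def __init__(self, pubmed_id, title, journal):
        self.pubmed_id = pubmed_id
        self.title = title
        self.journal = journal


class BookArticle:
    """Hypothetical, trimmed stand-in for a record type without a journal field."""
    __slots__ = ("pubmed_id", "title")

    def __init__(self, pubmed_id, title):
        self.pubmed_id = pubmed_id
        self.title = title


def get_data(article, name, default="N/A"):
    # Pass the variable `name`, not the string literal 'name'
    return getattr(article, name, default)


articles = [JournalArticle("111", "A", "J"), BookArticle("222", "B")]
for article in articles:
    try:
        article["title"]  # objects are not subscriptable
    except TypeError as err:
        print("TypeError:", err)
    print(article.title, get_data(article, "journal"))
```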
The following may help:
```python
from pymed import PubMed
import json

pubmed = PubMed(tool="PubMedSearcher", email="daspranab239@gmail.com")
search_term = "Your search term"
results = pubmed.query(search_term, max_results=500)
articleInfo = []


def get_data(article, name):
    # Look up the attribute called `name`; some result types lack fields
    # such as journal or keywords, so fall back to 'N/A' instead of failing
    return getattr(article, name, 'N/A')


for article in results:
    # Keep only the first ID when pubmed_id holds several, newline-separated
    pubmedId = article.pubmed_id.partition('\n')[0]
    articleInfo.append({
        'pubmed_id': pubmedId,
        'title': article.title,
        'keywords': get_data(article, 'keywords'),
        'journal': get_data(article, 'journal'),
        'abstract': article.abstract,
        'conclusions': get_data(article, 'conclusions'),
        'methods': get_data(article, 'methods'),
        'results': get_data(article, 'results'),
        'copyrights': article.copyrights,
        'doi': article.doi,
        'publication_date': article.publication_date,
        'authors': article.authors})

# default=str turns non-JSON values such as dates into strings
print(json.dumps(articleInfo, indent=4, default=str))
```
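The question also asks for a CSV export. A minimal sketch with the standard `csv` module, assuming records shaped like `articleInfo` where some values are lists (keywords, authors) or dicts (individual authors); how those get flattened into one cell is an illustrative choice, and `to_cell`/`write_articles_csv` are hypothetical helpers:

```python
import csv
import io


def to_cell(value):
    """Flatten a field value into a single CSV-friendly string."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return "; ".join(to_cell(item) for item in value)
    if isinstance(value, dict):
        # e.g. an author record: keep its non-empty values, space-separated
        return " ".join(str(v) for v in value.values() if v)
    return str(value)


def write_articles_csv(records, stream):
    """Write a list of article dicts to `stream` as CSV with a header row."""
    # Union of keys in first-seen order, so every record fits the header
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(stream, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for record in records:
        writer.writerow({key: to_cell(val) for key, val in record.items()})


# Hypothetical records shaped like articleInfo (not real PubMed data)
records = [
    {"pubmed_id": "111", "title": "A", "keywords": ["x", "y"],
     "authors": [{"lastname": "Doe", "firstname": "Jane"}]},
    {"pubmed_id": "222", "title": "B", "keywords": None, "authors": []},
]
buffer = io.StringIO()
write_articles_csv(records, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("articles.csv", "w", newline="", encoding="utf-8")` and pass the handle as `stream`. With pandas, `pd.DataFrame(articleInfo).to_csv("articles.csv", index=False)` also works, but list-valued columns are then written as their Python repr unless flattened first.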