How to extract a specific paragraph from a text file and save it in a CSV file using Python?

I have a text file that contains information such as Title, Author, Abstract, and DOI. I want to extract only the abstract and store it in a DataFrame. I tried the code below, but it also returns the author information and the DOI; I only want the middle paragraph between Author information: and DOI:. How do I get that specific paragraph and store it in a DataFrame?

extracted_lines=[]
extract = False

for line in open("abstract.txt"):

    if extract == False and "Author information:" in line.strip():
        extract = True
        
    if extract:
        extracted_lines.append(line)
        if "DOI:" in line.strip():
            extract = False
            
print("".join(extracted_lines))


**Output**

Author information:
(1)Carol Davila University of Medicine and Pharmacy, 37, Dionisie Lupu St, 
Bucharest, Romania 020021.
(2)National Institute of Public Health, 1-3 Doctor Leonte Anastasievici St, 
Bucharest, Romania 050463.

Dark chocolate is not the most popular chocolate; the higher concentration in 
antioxidants pays tribute to the increment in bitterness. The caloric density of 
dark chocolate is potentially lower but has a large variability according to 
recipes and ingredients. Nevertheless, in the last decade, the interest in dark 
chocolate as a potential functional food has constantly increased. In this 
review, we present the nutritional composition, factors influencing the 
bioavailability, and health outcomes of dark chocolate intake. We have extracted 
pro- and counter-arguments to illustrate these effects from both experimental 
and clinical studies in an attempt to solve the dilemma. The antioxidative and 
anti-inflammatory abilities, the cardiovascular and metabolic effects, and 
influences on central neural functions were selected to substantiate the main 
positive consequences. Beside the caloric density, we have included reports 
placing responsibility on chocolate as a migraine trigger or as an inducer of 
the gastroesophagial reflux in the negative effects section. Despite an 
extensive literature review, there are not large enough studies specifically 
dedicated to dark chocolate that took into consideration possible confounders on 
the health-related effects. Therefore, a definite answer on our initial question 
is, currently, not available.

DOI: 10.5740/jaoacint.19-0132
Author information:
(1)School of Food Science and Nutrition, Faculty of Maths and Physical Sciences, 
University of Leeds, Leeds LS2 9JT, UK.
(2)School of Food Science and Nutrition, Faculty of Maths and Physical Sciences, 
University of Leeds, Leeds LS2 9JT, UK. Electronic address: 
g.williamson@leeds.ac.uk.

Dark chocolate contains many biologically active components, such as catechins, 
procyanidins and theobromine from cocoa, together with added sucrose and lipids. 
All of these can directly or indirectly affect the cardiovascular system by 
multiple mechanisms. Intervention studies on healthy and 
metabolically-dysfunctional volunteers have suggested that cocoa improves blood 
pressure, platelet aggregation and endothelial function. The effect of chocolate 
is more convoluted since the sucrose and lipid may transiently and negatively 
impact on endothelial function, partly through insulin signalling and nitric 
oxide bioavailability. However, few studies have attempted to dissect out the 
role of the individual components and have not explored their possible 
interactions. For intervention studies, the situation is complex since suitable 
placebos are often not available, and some benefits may only be observed in 
individuals showing mild metabolic dysfunction. For chocolate, the effects of 
some of the components, such as sugar and epicatechin on FMD, may oppose each 
other, or alternatively in some cases may act together, such as theobromine and 
epicatechin. Although clearly cocoa provides some cardiovascular benefits 
according to many human intervention studies, the exact components, their 
interactions and molecular mechanisms are still under debate.

Copyright © 2015 Elsevier Inc. All rights reserved.

DOI: 10.1016/j.vph.2015.05.011

**Expected Output**

Index    Abstract
    0      Dark chocolate is not the most popular chocola...
    1      Dark chocolate contains many biologically acti...

You can try:

  • reading the whole content of the file as a single string
  • splitting it on 'Author information:\n' to get one chunk per paper
  • taking the paragraph at index 1 of each chunk, which is the abstract (index 0 is the affiliation list)

Here's the code:

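with open("abstract.txt") as f:
    contents = f.read()

# Every chunk after the first starts with one paper's affiliation list.
papers = contents.split('Author information:\n')
# Paragraph 0 is the affiliations; paragraph 1 is the abstract.
abstracts = [p.split("\n\n")[1] for p in papers[1:]]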
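
To get from that list to the DataFrame and CSV file the question asks for, something like the following may work. It is only a sketch: the helper names `extract_abstracts` and `abstracts_to_csv` and the output file name `abstracts.csv` are made up for illustration, and it assumes paragraphs are separated by completely empty lines, as in the sample shown in the question.

```python
import pandas as pd


def extract_abstracts(text):
    """Return the paragraph that follows each 'Author information:' block."""
    papers = text.split("Author information:\n")[1:]
    abstracts = []
    for paper in papers:
        paragraphs = paper.split("\n\n")
        if len(paragraphs) > 1:
            # Re-join the hard-wrapped lines into one line of text.
            abstracts.append(" ".join(paragraphs[1].split()))
    return abstracts


def abstracts_to_csv(in_path, out_path):
    """Read the text file, build a one-column DataFrame, and save it as CSV."""
    with open(in_path, encoding="utf-8") as f:
        abstracts = extract_abstracts(f.read())
    df = pd.DataFrame({"Abstract": abstracts})
    df.to_csv(out_path, index_label="Index")
    return df
```

Calling `df = abstracts_to_csv("abstract.txt", "abstracts.csv")` would then return the DataFrame and write the same rows to disk. Papers without a second paragraph are skipped rather than raising an IndexError.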

Does it work for you?
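
If you would rather keep the line-by-line loop from the question, one possible adjustment is to track where you are inside each record. The original loop starts collecting at Author information: and keeps the DOI: line, which is why both end up in the output. The sketch below (the function name `abstracts_from_lines` is hypothetical) skips the affiliations up to the first blank line, collects the next paragraph, and stops at the blank line after it. It assumes the abstract is always the first paragraph after the affiliations.

```python
def abstracts_from_lines(lines):
    """Collect the paragraph that follows each 'Author information:' block."""
    abstracts = []
    current = []
    state = "seek"  # seek -> affiliations -> abstract -> seek
    for raw in lines:
        line = raw.strip()
        if state == "seek":
            if line.startswith("Author information:"):
                state = "affiliations"
        elif state == "affiliations":
            # The affiliation list ends at the first blank line.
            if not line:
                state = "abstract"
        elif state == "abstract":
            if line:
                current.append(line)
            elif current:
                # A blank line after some text closes the abstract.
                abstracts.append(" ".join(current))
                current = []
                state = "seek"
    if current:
        # The input ended without a blank line after the abstract.
        abstracts.append(" ".join(current))
    return abstracts
```

A file object can be passed directly, for example `with open("abstract.txt") as f: abstracts = abstracts_from_lines(f)`, so the file is read one line at a time instead of all at once.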

