
python - Regular expression to extract certain text data from a file

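I have a text file that was converted from a PDF. I want to extract each description that follows the string "FIGURE" in the text data. Here are a few sample lines from the text data: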

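FIGURE 1-1. An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).

Derendorf5e_CH01.indd 4Derendorf5e_CH01.indd 4 5/25/19 11:07 PM5/25/19 11:07 PM

CHAPTER 1 • Therapeutic Relevance 5

Another way of looking at these two subdisciplines is that pharmacokinetics deals with what the body does to the drug (absorption, distribution, metabolism, excretion), whereas pharmacodynamics describes what the drug does to the body (both desired and undesired effects). From this definition, one could wrongly conclude that these are opposite disci- plines, whereas in reality, they go hand-in-hand. Figure 1-3 shows that pharmacokinetics deals with concentration–time relationships, whereas pharmacodynamics describes the relationship between drug concentration and both good (desired) and bad (adverse) effects. Each of these two puzzle pieces by itself is insufficient to guide therapy and optimize dosing; only when pharmacokinetics and pharmacodynamics are linked (PK/PD) and integrated do they become therapeutically useful. This integration is commonly achieved by developing mathematical models (PK/PD models) that capture the observed relationships and allow prediction and identification of optimum dosing regimens.

FIGURE 1-2. A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account.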

I have read the PDF file into text and tried applying re.search to the text data with several regular-expression combinations, but with no luck.

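import re

# Get the file's text content
text = file_data['content']
#print(text)
# re.search returns only the first match, as a Match object (or None)
text1 = re.search(r'FIGURE[ ]*[0-9]-[0-9]\. (.*)', text, re.MULTILINE)
# re.findall returns the captured caption text of every match
text1 = re.findall(r'FIGURE\s*[0-9]+-[0-9]+\. (.*)', text, re.MULTILINE)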
>>> import re
>>> t="""FIGURE 1-1. An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).
...
... Derendorf5e_CH01.indd 4Derendorf5e_CH01.indd 4 5/25/19 11:07 PM5/25/19 11:07 PM
...
... CHAPTER 1 • Therapeutic Relevance 5
...
... Another way of looking at these two subdisciplines is that pharmacokinetics deals with what the body does to the drug (absorption, distribution, metabolism, excretion), whereas pharmacodynamics describes what the drug does to the body (both desired and undesired effects). From this definition, one could wrongly conclude that these are opposite disci- plines, whereas in reality, they go hand-in-hand. Figure 1-3 shows that pharmacokinetics deals with concentration–time relationships, whereas pharmacodynamics describes the relationship between drug concentration and both good (desired) and bad (adverse) effects. Each of these two puzzle pieces by itself is insufficient to guide therapy and optimize dosing; only when pharmacokinetics and pharmacodynamics are linked (PK/PD) and integrated do they become therapeutically useful. This integration is commonly achieved by developing mathematical models (PK/PD models) that capture the observed relationships and allow prediction and identification of optimum dosing regimens.
...
... FIGURE 1-2. A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account."""
>>> re.findall(r'FIGURE\s*[0-9]-[0-9]\. (.*)',t,re.MULTILINE)
['An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).', 'A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account.']
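
If the figure number is needed along with the caption, a second capture group can keep it. The following is only a minimal sketch, not part of the original answer: the helper name extract_figure_captions and the shortened sample string are hypothetical, and it assumes every caption starts at the beginning of a line and fits on that one line, as in the sample data.

```python
import re

# Assumption: a caption line starts with "FIGURE", then a number such as "1-1",
# a literal period, and the caption text running to the end of the line.
FIGURE_RE = re.compile(r'^FIGURE[ \t]*(\d+-\d+)\.[ \t]+(.*)$', re.MULTILINE)


def extract_figure_captions(text):
    """Return a dict mapping figure numbers (e.g. '1-1') to caption text."""
    return {m.group(1): m.group(2).strip() for m in FIGURE_RE.finditer(text)}


# Shortened, illustrative sample (not the full text from the question).
sample = (
    "FIGURE 1-1. An empirical approach to the design of a dosage regimen.\n"
    "\n"
    "Figure 1-3 shows that pharmacokinetics deals with concentration-time relationships.\n"
    "\n"
    "FIGURE 1-2. A rational approach to the design of a dosage regimen.\n"
)

captions = extract_figure_captions(sample)
for number, caption in captions.items():
    print(number, "->", caption)
```

The match is case-sensitive, so the in-text reference "Figure 1-3 shows ..." is skipped. Passing re.IGNORECASE would change that and is probably not what is wanted here.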
