Writing specific sections from txt file to dataframe

I have a text file of online physician recommendations. In this file, there are questions asked by each patient and answers given by the physician. In addition, the question id and links are also included. I am only interested in certain headers in the text file, and I want to export those fields to a pandas dataframe. The txt file looks like this:

Description Q. What should I do to get glowing fair skin?
        
Dialogue 
Patient: Hi doctor, My face and body color are different. My face is getting dark and black day-by-day. What should I do to increase my fairness and get glowing skin? Shall I use some night cream? 
Doctor: Hi. I have read your problem carefully. What is your nature of work? Will you get exposed to sunlight? ... Take care. 
    
id=13585 https://www.icliniq.com/qa/migraine-headaches/how-can-i-cure-one-sided-headache-with-vomiting
    
Description Q. How can I cure one sided headache with vomiting?
    
Dialogue
Patient: Hi doctor, I have one side headache. How to cure it? Will you please give your suggestion? During maigrane headache, I have vomiting. 
Doctor: Hello. You seem to be suffering from migraine.   
    
id=13586 https://www.icliniq.com/qa/diarrhea/what-is-causing-diarrhea-after-eating-a-spicy-dish

From this file, I only want to get the patients' words and the doctors' answers. The output should look like this:

    Patient                                             Doctor
0   I am 26 years old. I just found out that I hav...   Hi. For further information consult a ...
1   I am a 46 year old male. My weight is 75 kg an...   Hi. I understand your problem. Revert ... 
2   Since five days, I am having non-radiating che...   ECG and chest x-ray. For further infor...

I want to drop the other parts of the text (the description and id lines) and keep only the patient and doctor sections in the dataframe. How can I do that? Thank you for your help.

You can use a regex to match the pattern of the dialogues and then trim the matches. I wrote some slightly messy code as an example:

import re
text = """Description Q. What should I do to get glowing fair skin?
        
Dialogue 
Patient: Hi doctor, My face and body color are different. My face is getting dark and black day-by-day. What should I do to increase my fairness and get glowing skin? Shall I use some night cream? 
Doctor: Hi. I have read your problem carefully. What is your nature of work? Will you get exposed to sunlight? ... Take care. 
    
id=13585 https://www.icliniq.com/qa/migraine-headaches/how-can-i-cure-one-sided-headache-with-vomiting
    
Description Q. How can I cure one sided headache with vomiting?
    
Dialogue
Patient: Hi doctor, I have one side headache. How to cure it? Will you please give your suggestion? During maigrane headache, I have vomiting. 
Doctor: Hello. You seem to be suffering from migraine.   
    
id=13586 https://www.icliniq.com/qa/diarrhea/what-is-causing-diarrhea-after-eating-a-spicy-dish"""

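# Capture the text after each "Patient: " / "Doctor: " label, up to the end of that line.
matches_patient = re.finditer(r"Patient: (.*?)\n", text)
matches_doctor = re.finditer(r"Doctor: (.*?)\n", text)

print("\tPatient\t\t\t\t\t\t\tDoctor")
# Pair the n-th patient line with the n-th doctor line.
for cnt, (pat_match, doc_match) in enumerate(zip(matches_patient, matches_doctor)):
    pat = pat_match.group(1)
    doc = doc_match.group(1)
    # Print only the first 46 characters of each side as a preview.
    print(f"{cnt}\t{pat[:46]}...\t{doc[:46]}...")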

The output of this code looks like this:

        Patient                                                 Doctor
0       Hi doctor, My face and body color are differen...       Hi. I have read your problem carefully. What i...
1       Hi doctor, I have one side headache. How to cu...       Hello. You seem to be suffering from migraine....
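
The snippet above only prints a preview, while the question asks for a pandas dataframe. A minimal sketch of that last step follows. It assumes every record has exactly one single-line "Patient:" entry immediately followed by one single-line "Doctor:" entry, as in the sample; the function name `dialogues_to_frame` and the variable names are made up for illustration and are not part of the original answer.

```python
import re

import pandas as pd

# Assumption: each "Patient: ..." line is directly followed by a "Doctor: ..." line.
# Named groups become the column names; trailing spaces and tabs are left out of the capture.
PAIR_RE = re.compile(
    r"^Patient:[ \t]*(?P<Patient>.*?)[ \t]*\n"
    r"Doctor:[ \t]*(?P<Doctor>.*?)[ \t]*$",
    re.MULTILINE,
)


def dialogues_to_frame(text: str) -> pd.DataFrame:
    """Return one row per Patient/Doctor pair; all other lines are ignored."""
    rows = [match.groupdict() for match in PAIR_RE.finditer(text)]
    return pd.DataFrame(rows, columns=["Patient", "Doctor"])


# Sample text copied from the question.
sample = (
    "Description Q. What should I do to get glowing fair skin?\n"
    "\n"
    "Dialogue \n"
    "Patient: Hi doctor, My face and body color are different. My face is getting dark and black day-by-day. What should I do to increase my fairness and get glowing skin? Shall I use some night cream? \n"
    "Doctor: Hi. I have read your problem carefully. What is your nature of work? Will you get exposed to sunlight? ... Take care. \n"
    "\n"
    "id=13585 https://www.icliniq.com/qa/migraine-headaches/how-can-i-cure-one-sided-headache-with-vomiting\n"
    "\n"
    "Description Q. How can I cure one sided headache with vomiting?\n"
    "\n"
    "Dialogue\n"
    "Patient: Hi doctor, I have one side headache. How to cure it? Will you please give your suggestion? During maigrane headache, I have vomiting. \n"
    "Doctor: Hello. You seem to be suffering from migraine.   \n"
    "\n"
    "id=13586 https://www.icliniq.com/qa/diarrhea/what-is-causing-diarrhea-after-eating-a-spicy-dish\n"
)

df = dialogues_to_frame(sample)
print(df)
```

For a real file, the text could be read first with something like `pathlib.Path("dialogues.txt").read_text(encoding="utf-8")` (the file name here is a placeholder). If a dialogue can span several lines or contain several turns, the pattern would need to be adapted.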


 