
How to read data from text file using pandas in a specific format?

I have a text file that contains the following data.

20/12/2018 
This is the test text. 

22/12/2018
* 21/12/2018 
This is a test text where the text is written on later than the actual date.

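Let's say the data above, dates included, is stored in a text file (text.txt). I need a way to read the data and put it into a pandas DataFrame. I want to read it into these columns:

Date, Text, DateOfWritten

Date should hold the actual date of the text. For example, 21/12/2018 should be taken as Date, and 22/12/2018 should be DateOfWritten.

The expected output should look something like this: [image: expected output]

Thanks in advance.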

This could be a solution:

from collections import defaultdict
import pandas as pd

dict_for_df = defaultdict(list)
last_find = None  # 2 if the previous non-blank line was a plain date
last_date = None  # the most recent plain (un-starred) date

with open("test.txt", 'r') as f:
    for line in f:
        line = line.strip()  # drops the trailing newline and stray spaces
        curr_find = line.find("/")
        if not line:
            # skip blank lines between entries
            continue
        elif curr_find == 2:
            # plain date such as "22/12/2018": the date the entry was written
            dict_for_df['DateOfWritten'].append(line)
            last_date = line
            last_find = 2
        elif (last_find == 2 and curr_find != 4):
            # text directly after a plain date: Date equals DateOfWritten
            dict_for_df['Date'].append(last_date)
            dict_for_df['text'].append(line)
            last_find = 0
            last_date = ''
        elif curr_find == 4:
            # starred date such as "* 21/12/2018": the actual date of the text
            dict_for_df['Date'].append(line.replace("*", "").strip())
            last_date = ""
            last_find = None
        else:
            # text that follows a starred date
            dict_for_df['text'].append(line)
            last_date = ""
            last_find = None

df = pd.DataFrame(dict_for_df)
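
The approach above depends on the position of the first "/" in each line, so a text line that happens to have a slash at index 2 or 4 would be misread as a date. A possible variant is sketched below. It assumes that a date line contains nothing but dd/mm/yyyy (optionally prefixed with "*") and that each entry's text fits on one line. `DATE_RE` and `parse_entries` are hypothetical names introduced for this illustration, not part of pandas.

```python
import re
import pandas as pd

# Assumption: a date line is exactly dd/mm/yyyy, optionally prefixed with "*".
DATE_RE = re.compile(r"^(\*\s*)?(\d{2}/\d{2}/\d{4})$")


def parse_entries(lines):
    """Hypothetical helper: build Date / text / DateOfWritten rows from lines."""
    records = []
    written = None  # last plain date seen (DateOfWritten)
    actual = None   # last starred date seen (Date), reset by each plain date
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between entries
        match = DATE_RE.match(line)
        if match and match.group(1):
            # starred date: the actual date of the text
            actual = match.group(2)
        elif match:
            # plain date: the date the entry was written
            written, actual = match.group(2), None
        else:
            # text line: fall back to the written date when no starred date exists
            records.append({
                "Date": actual or written,
                "text": line,
                "DateOfWritten": written,
            })
    return pd.DataFrame(records, columns=["Date", "text", "DateOfWritten"])


sample = """20/12/2018
This is the test text.

22/12/2018
* 21/12/2018
This is a test text where the text is written on later than the actual date.
"""

df = parse_entries(sample.splitlines())
print(df)
```

To read from a file instead of the inline sample, the open file object can be passed directly, for example `with open("test.txt") as f: df = parse_entries(f)`. If the text of an entry spans several lines, this sketch produces one row per line, so the lines would need to be joined first.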
