簡體   English   中英

在同一列中使用小時和日期格式化文件

[英]formating file with hours and date in the same column

我們的電力供應商認為讓他們提供的 csv 文件難以閱讀會很有趣。

這是精確的耗電量,每 30 分鍾一次,但在 SAME 列中有時間和日期,例如:

[編輯:這里是 csv 文件的原始版本,我的錯]

;
"Récapitulatif de mes puissances atteintes en W";
;
"Date et heure de relève par le distributeur";"Puissance atteinte (W)"
;
"19/11/2022";
"00:00:00";4494
"23:30:00";1174
"23:00:00";1130
[...]
"01:30:00";216
"01:00:00";2672
"00:30:00";2816
;
"18/11/2022";
"00:00:00";4494
"23:30:00";1174
"23:00:00";1130
[...]
"01:30:00";216
"01:00:00";2672
"00:30:00";2816

我他媽的怎么能得到這種可愛的格式化文件:

2022-11-19 00:00:00 2098
2022-11-19 23:30:00 218
2022-11-19 23:00:00 606

等等

嘗試:

import pandas as pd

current_date = None
all_data = []
with open("your_file.txt", "r") as f_in:
    # skip first 5 rows (header)
    for _ in range(5):
        next(f_in)

    for row in map(str.strip, f_in):
        row = row.replace('"', "")
        if row == "":
            continue
        if "/" in row:
            current_date = row
        else:
            all_data.append([current_date, *row.split(";")])

df = pd.DataFrame(all_data, columns=["Date", "Time", "Value"])
print(df)

印刷:

           Date      Time Value
0   19/11/2022;  00:00:00  4494
1   19/11/2022;  23:30:00  1174
2   19/11/2022;  23:00:00  1130
3   19/11/2022;  01:30:00   216
4   19/11/2022;  01:00:00  2672
5   19/11/2022;  00:30:00  2816
6   18/11/2022;  00:00:00  4494
7   18/11/2022;  23:30:00  1174
8   18/11/2022;  23:00:00  1130
9   18/11/2022;  01:30:00   216
10  18/11/2022;  01:00:00  2672
11  18/11/2022;  00:30:00  2816

好吧,我有一個愚蠢的 brutforce 解決方案給你,所以不要把它當作編碼推薦,而只是完成工作的東西:

import itertools
dList = [f"{f}/{s}/2022" for f, s in itertools.product(range(1, 32), range(1, 13))]

我假設你有一個文本文件,所以我將使用它:

file = 'yourfilename.txt'
#make sure youre running the program in the same directory as the .txt file
with open(file, "r") as f:
    global lines
    lines = f.readlines()
lines = [word.replace('\n','') for word in lines]
for i in lines:
    if i in dList:
        curD = i
    else:
        with open('output.txt', 'w') as g:
            g.write(f'{i} {(i.split())[0]} {(i.split())[1]}')

確保在同一目錄中創建一個名為 output.txt 的文件,所有內容都將寫入該文件。

使用 pandas 操作將如下所示:

data.csv

19/11/2022  
00:00:00    2098
23:30:00    218
23:00:00    606
01:30:00    216
01:00:00    2672
00:30:00    2816
18/11/2022  
00:00:00    1994
23:30:00    260
23:00:00    732
01:30:00    200
01:00:00    1378
00:30:00    2520
17/11/2022  
00:00:00    1830
23:30:00    96
23:00:00    122
01:30:00    694
01:00:00    2950
00:30:00    3062
16/11/2022  
00:00:00    2420
23:30:00    678
23:00:00    644

Implementation

import pandas as pd
df = pd.read_csv('data.csv', header=None)
df['amount'] = df[0].apply(lambda item:item.split(' ')[-1] if item.find(':')>0 else None)
df['time'] = df[0].apply(lambda item:item.split(' ')[0] if item.find(':')>0 else None)
df['date'] = df[0].apply(lambda item:item if item.find('/')>0 else None)
df['date'] = df['date'].fillna(method='ffill')
df = df.dropna(subset=['amount'], how='any')
df = df.drop(0, axis=1)
print(df)

output

   amount      time          date
1    2098  00:00:00  19/11/2022  
2     218  23:30:00  19/11/2022  
3     606  23:00:00  19/11/2022  
4     216  01:30:00  19/11/2022  
5    2672  01:00:00  19/11/2022  
6    2816  00:30:00  19/11/2022  
8    1994  00:00:00  18/11/2022  
9     260  23:30:00  18/11/2022  
10    732  23:00:00  18/11/2022  
11    200  01:30:00  18/11/2022  
12   1378  01:00:00  18/11/2022  
13   2520  00:30:00  18/11/2022  
15   1830  00:00:00  17/11/2022  
16     96  23:30:00  17/11/2022  
17    122  23:00:00  17/11/2022  
18    694  01:30:00  17/11/2022  
19   2950  01:00:00  17/11/2022  
20   3062  00:30:00  17/11/2022  
22   2420  00:00:00  16/11/2022  
23    678  23:30:00  16/11/2022  
24    644  23:00:00  16/11/2022 

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM