[英]formating file with hours and date in the same column
我們的電力供應商認為讓他們提供的 csv 文件難以閱讀會很有趣。
這是精確的耗電量,每 30 分鍾一次,但在 SAME 列中有時間和日期,例如:
[編輯:這里是 csv 文件的原始版本,我的錯]
;
"Récapitulatif de mes puissances atteintes en W";
;
"Date et heure de relève par le distributeur";"Puissance atteinte (W)"
;
"19/11/2022";
"00:00:00";4494
"23:30:00";1174
"23:00:00";1130
[...]
"01:30:00";216
"01:00:00";2672
"00:30:00";2816
;
"18/11/2022";
"00:00:00";4494
"23:30:00";1174
"23:00:00";1130
[...]
"01:30:00";216
"01:00:00";2672
"00:30:00";2816
我他媽的怎么能得到這種可愛的格式化文件:
2022-11-19 00:00:00 2098
2022-11-19 23:30:00 218
2022-11-19 23:00:00 606
等等
嘗試:
import pandas as pd
current_date = None
all_data = []
with open("your_file.txt", "r") as f_in:
# skip first 5 rows (header)
for _ in range(5):
next(f_in)
for row in map(str.strip, f_in):
row = row.replace('"', "")
if row == "":
continue
if "/" in row:
current_date = row
else:
all_data.append([current_date, *row.split(";")])
df = pd.DataFrame(all_data, columns=["Date", "Time", "Value"])
print(df)
印刷:
Date Time Value
0 19/11/2022; 00:00:00 4494
1 19/11/2022; 23:30:00 1174
2 19/11/2022; 23:00:00 1130
3 19/11/2022; 01:30:00 216
4 19/11/2022; 01:00:00 2672
5 19/11/2022; 00:30:00 2816
6 18/11/2022; 00:00:00 4494
7 18/11/2022; 23:30:00 1174
8 18/11/2022; 23:00:00 1130
9 18/11/2022; 01:30:00 216
10 18/11/2022; 01:00:00 2672
11 18/11/2022; 00:30:00 2816
好吧,我有一個愚蠢的 brutforce 解決方案給你,所以不要把它當作編碼推薦,而只是完成工作的東西:
import itertools
dList = [f"{f}/{s}/2022" for f, s in itertools.product(range(1, 32), range(1, 13))]
我假設你有一個文本文件,所以我將使用它:
file = 'yourfilename.txt'
#make sure youre running the program in the same directory as the .txt file
with open(file, "r") as f:
global lines
lines = f.readlines()
lines = [word.replace('\n','') for word in lines]
for i in lines:
if i in dList:
curD = i
else:
with open('output.txt', 'w') as g:
g.write(f'{i} {(i.split())[0]} {(i.split())[1]}')
確保在同一目錄中創建一個名為 output.txt 的文件,所有內容都將寫入該文件。
使用 pandas 操作將如下所示:
data.csv
19/11/2022
00:00:00 2098
23:30:00 218
23:00:00 606
01:30:00 216
01:00:00 2672
00:30:00 2816
18/11/2022
00:00:00 1994
23:30:00 260
23:00:00 732
01:30:00 200
01:00:00 1378
00:30:00 2520
17/11/2022
00:00:00 1830
23:30:00 96
23:00:00 122
01:30:00 694
01:00:00 2950
00:30:00 3062
16/11/2022
00:00:00 2420
23:30:00 678
23:00:00 644
Implementation
import pandas as pd
df = pd.read_csv('data.csv', header=None)
df['amount'] = df[0].apply(lambda item:item.split(' ')[-1] if item.find(':')>0 else None)
df['time'] = df[0].apply(lambda item:item.split(' ')[0] if item.find(':')>0 else None)
df['date'] = df[0].apply(lambda item:item if item.find('/')>0 else None)
df['date'] = df['date'].fillna(method='ffill')
df = df.dropna(subset=['amount'], how='any')
df = df.drop(0, axis=1)
print(df)
output
amount time date
1 2098 00:00:00 19/11/2022
2 218 23:30:00 19/11/2022
3 606 23:00:00 19/11/2022
4 216 01:30:00 19/11/2022
5 2672 01:00:00 19/11/2022
6 2816 00:30:00 19/11/2022
8 1994 00:00:00 18/11/2022
9 260 23:30:00 18/11/2022
10 732 23:00:00 18/11/2022
11 200 01:30:00 18/11/2022
12 1378 01:00:00 18/11/2022
13 2520 00:30:00 18/11/2022
15 1830 00:00:00 17/11/2022
16 96 23:30:00 17/11/2022
17 122 23:00:00 17/11/2022
18 694 01:30:00 17/11/2022
19 2950 01:00:00 17/11/2022
20 3062 00:30:00 17/11/2022
22 2420 00:00:00 16/11/2022
23 678 23:30:00 16/11/2022
24 644 23:00:00 16/11/2022
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.