Python CSV file: comparing time periods

Question

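Link to the CSV file: https://www.emcsg.com/marketdata/priceinformation [I downloaded the CSV file for 72 periods. Every day at 12 pm (UTC+08:00) a new file is published, showing that day's prices as well as the price forecast up to 12 am of the following day.]

I am trying to display the dates and times at which the energy price (USEP, $/MWh) is below the average for each day.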

for line in lines:
    fields = line.split(",")
    try:
        interval = fields[0][1:-1]       # date column, surrounding "" stripped
        time = fields[1][1:-1]           # half-hour period, e.g. 01:00-01:30
        price = float(fields[4][1:-1])   # column index 4 (USEP), "" stripped
    except (IndexError, ValueError):
        continue   # skip the header and any row whose price is not a number
    if interval == next_day and price < average:
        print(interval, time, price)

The code above prints something like this:

30 Sep 2017 01:00-01:30 84.14
30 Sep 2017 01:30-02:00 84.12
30 Sep 2017 02:00-02:30 85.11
30 Sep 2017 02:30-03:00 83.49
30 Sep 2017 03:00-03:30 80.66
30 Sep 2017 03:30-04:00 75.69
30 Sep 2017 04:00-04:30 72.45
         .
         .  
         .
30 Sep 2017 21:30-22:00 79.72
30 Sep 2017 22:00-22:30 73.23
30 Sep 2017 22:30-23:00 73.58
30 Sep 2017 23:00-23:30 72.14
30 Sep 2017 23:30-00:00 85.21

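It shows the dates and times at which the energy price is below the average for 30 Sep.

But I would like to print something like this instead: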

30 Sep 2017 01:00-04:30
30 Sep 2017 21:30-00:00

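Basically, I want to group the rows because their times are consecutive. Once there is a break (a period in which the price is above the average), the next time the price drops below the average a new line should be printed, starting with that next period.

I was thinking of comparing the end time of each period (e.g. for 01:00-01:30, the end time is 01:30) with the start time of the period on the next row (e.g. for 01:30-02:00, the start time is 01:30), but I am not sure whether that is feasible.

Thanks in advance! (: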
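For illustration, the end-time/start-time comparison described above could be sketched roughly as follows. This is only a minimal sketch: `merge_periods` and the `below_average` sample list are hypothetical names, and it assumes the below-average rows have already been collected in file order as (date, period) pairs.

```python
def merge_periods(rows):
    """Merge consecutive (date, "HH:MM-HH:MM") rows into longer spans.

    A row extends the current span only when it has the same date and its
    start time equals the span's current end time.
    """
    merged = []
    for date, period in rows:
        start, end = period.split("-")
        if merged and merged[-1][0] == date and merged[-1][2] == start:
            merged[-1][2] = end                 # contiguous: push the end time forward
        else:
            merged.append([date, start, end])   # gap or new date: open a new span
    return ["{} {}-{}".format(d, s, e) for d, s, e in merged]


# Hypothetical sample of rows that were already found to be below average
below_average = [
    ("30 Sep 2017", "01:00-01:30"),
    ("30 Sep 2017", "01:30-02:00"),
    ("30 Sep 2017", "02:00-02:30"),
    ("30 Sep 2017", "21:30-22:00"),
    ("30 Sep 2017", "22:00-22:30"),
]
for line in merge_periods(below_average):
    print(line)
```

With this sample the rows collapse into two lines, one for 01:00-02:30 and one for 21:30-22:30. Because the date is compared as well, a span ending at 00:00 is not joined to the first period of the following day.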

Answer 1

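This must be some of the ugliest code I have written in a long time, but maybe this is what you mean? On reflection, this could probably be done directly with pandas.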

import pandas as pd

url = "https://www.emcsg.com/marketdata/priceinformation?downloadRealtime=true"
df = pd.read_csv(url)

average = df["USEP($/MWh)"].mean()
output = []
entry = 0
old = None

# Walk through the rows; whenever the price crosses the average
# (from bigger to smaller or vice versa), start a new entry in the output list.
# Each entry is [period span, "bigger"/"smaller", sum of prices in the span].
for k, v in df.iterrows():

    # First row: open the first entry
    if old is None:
        output.append([])
        output[entry].append(v["Period"])
        if v["USEP($/MWh)"] > average:
            old = "bigger"
        else:
            old = "smaller"
        output[entry].append(old)
        output[entry].append(v["USEP($/MWh)"])
        continue

    # The rest
    if v["USEP($/MWh)"] > average:
        new = "bigger"
    else:
        new = "smaller"

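    if new == old:
        # Same side of the average: keep the span's start, extend its end time
        output[entry][0] = output[entry][0].split("-")[0]+"-"+v["Period"].split("-")[1]
        output[entry][2] += v["USEP($/MWh)"]
    else:
        # Crossed the average: open a new entry starting at this period
        entry += 1
        output.append([])
        output[entry].append(v["Period"])
        output[entry].append(new)
        output[entry].append(v["USEP($/MWh)"])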

    old = new

The output looks like:

[['12:00-15:30', 'bigger', 503.52],
 ['15:30-18:30', 'smaller', 423.78],
 ['18:30-00:00', 'bigger', 839.39],
 ['00:00-10:00', 'smaller', 1372.4700000000003],
 ['10:00-11:30', 'bigger', 215.90999999999997],
 ['11:30-13:00', 'smaller', 211.83000000000004],
 ['13:00-17:00', 'bigger', 576.4200000000001],
 ['17:00-20:30', 'smaller', 486.94],
 ['20:30-22:00', 'bigger', 227.11],
 ['22:00-00:00', 'smaller', 271.34000000000003]]
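As for doing this directly in pandas, one possible sketch is to label each run of consecutive rows with a cumulative sum and then group on that label. Note the assumptions: the small DataFrame below is made-up sample data, the `Date` column name is a guess at what the real file calls it, and unlike the loop above this version compares each price to its own day's average and reports only the below-average spans.

```python
import pandas as pd

# Made-up sample; the real file has more rows and columns,
# and the "Date" column name is an assumption.
df = pd.DataFrame({
    "Date": ["30 Sep 2017"] * 5,
    "Period": ["00:00-00:30", "00:30-01:00", "01:00-01:30",
               "01:30-02:00", "02:00-02:30"],
    "USEP($/MWh)": [90.0, 70.0, 72.0, 95.0, 71.0],
})

# Per-day average, broadcast back onto every row of that day
daily_avg = df.groupby("Date")["USEP($/MWh)"].transform("mean")
below = df["USEP($/MWh)"] < daily_avg

# A new run starts whenever the below/above flag flips or the date changes
new_run = below.ne(below.shift()) | df["Date"].ne(df["Date"].shift())
run_id = new_run.cumsum()

# Keep only below-average rows and collapse each run into one span
spans = (
    df[below]
    .groupby(run_id[below])
    .agg(date=("Date", "first"),
         start_period=("Period", "first"),
         end_period=("Period", "last"))
)
span_text = (spans["start_period"].str.split("-").str[0]
             + "-" + spans["end_period"].str.split("-").str[1])
result = (spans["date"] + " " + span_text).tolist()
print(result)
```

For the sample data the daily average is 79.6, so the result contains one span for 00:30-01:30 and one for 02:00-02:30. The named-aggregation form of `agg` used here requires pandas 0.25 or later.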
