
Python pandas resampling issue / misunderstanding

I'm trying to use pandas to build the history of a seismic sequence.

My input is a CSV file like this:

ID,DATE,LAT,LON,DEPTH,MAG,LOCALITY
ISTerre2020odcbbh,2020-07-18T23:24:03.616341Z,45.426,6.32499,3.56121,1.56979,"MONTGELLAFREY"
ISTerre2020nsbzaa,2020-07-12T23:32:31.159491Z,45.4239,6.32597,1.79717,0.818867,"MONTGELLAFREY"
ISTerre2020lcxxda,2020-06-06T09:29:45.006351Z,45.4126,6.32702,3.7011,1.58432,"MONTGELLAFREY"
ISTerre2020jppugg,2020-05-15T23:30:27.553768Z,45.4288,6.29128,5.03303,1.0121,"LA CHAPELLE"
ISTerre2020flokvv,2020-03-18T02:46:01.877839Z,45.4134,6.38374,3.06686,1.08096,"SAINT-FRANCOIS-LONGCHAMP"
ISTerre2019znoncu,2019-12-28T11:44:51.242507Z,45.4341,6.33249,7.61996,1.26731,"EPIERRE"

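I would like to insert into the DataFrame obtained with pandas any months or days that are missing from the catalogue (I mean days/months with no earthquakes), so that the histogram shows empty bars for the months without events.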

I tried to do this with resample('M'), but it doesn't work; I get this error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
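The error means `resample` was called on an object whose index is the default `RangeIndex`; it needs a `DatetimeIndex` (or the `on=` keyword naming a datetime column). A minimal sketch, assuming the six sample rows above are inlined as a string (`SAMPLE` and `monthly_counts` are hypothetical names used only for illustration), parses `DATE`, moves it into the index and counts events per calendar month. Because `resample` emits every bin between the first and last event, months with no earthquakes come out with a count of 0.

```python
import io

import pandas as pd

# Hypothetical inline copy of the sample catalogue from the question
SAMPLE = """ID,DATE,LAT,LON,DEPTH,MAG,LOCALITY
ISTerre2020odcbbh,2020-07-18T23:24:03.616341Z,45.426,6.32499,3.56121,1.56979,"MONTGELLAFREY"
ISTerre2020nsbzaa,2020-07-12T23:32:31.159491Z,45.4239,6.32597,1.79717,0.818867,"MONTGELLAFREY"
ISTerre2020lcxxda,2020-06-06T09:29:45.006351Z,45.4126,6.32702,3.7011,1.58432,"MONTGELLAFREY"
ISTerre2020jppugg,2020-05-15T23:30:27.553768Z,45.4288,6.29128,5.03303,1.0121,"LA CHAPELLE"
ISTerre2020flokvv,2020-03-18T02:46:01.877839Z,45.4134,6.38374,3.06686,1.08096,"SAINT-FRANCOIS-LONGCHAMP"
ISTerre2019znoncu,2019-12-28T11:44:51.242507Z,45.4341,6.33249,7.61996,1.26731,"EPIERRE"
"""


def monthly_counts(csv_text):
    """Count events per calendar month, empty months included."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Parse the ISO 8601 strings; the trailing "Z" yields UTC-aware timestamps
    df["DATE"] = pd.to_datetime(df["DATE"])
    # resample needs a DatetimeIndex, so move DATE into the index first
    events = df.set_index("DATE").sort_index()
    # "MS" = month-start bins; count the ID column, since DATE is now the index
    return events["ID"].resample("MS").count()


counts = monthly_counts(SAMPLE)
print(counts)
# counts.plot(kind="bar") would then draw one bar per month, empty months at 0
```

Using `"MS"` (month start) instead of `"M"` sidesteps the deprecation of `"M"` in recent pandas releases; with the real file, `pd.read_csv("catalogue.csv")` would replace the `io.StringIO` wrapper.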

Here is an example of my script:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('catalogue.csv')
df.info()
df["DATE"] = df["DATE"].astype("datetime64")
(df["DATE"].groupby([df["DATE"].dt.year, df["DATE"].dt.month]).count()).plot(kind="bar")  # plots the histogram, but the months with no events are missing

from datetime import datetime
from datetime import timedelta
from dateutil import rrule    
    
data1=df.sort_values('DATE').set_index('DATE')  
month_groups_resample = data1['DATE'].resample('M').count()
ax = month_groups_resample.plot(kind='bar',figsize=(10,5),legend=None)
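The `groupby([year, month])` line only creates rows for months that contain at least one event, which is why the empty months never appear. One way to keep a group-by approach is to group by a monthly `Period` and reindex against a complete `pd.period_range`, filling the gaps with 0. A sketch under that assumption; the helper name `monthly_histogram` is hypothetical, and the input is any sequence of ISO 8601 date strings such as the `DATE` column.

```python
import pandas as pd


def monthly_histogram(dates):
    """Count events per calendar month, including months with no events (count 0)."""
    # Parse as UTC, then drop the time zone so to_period() does not warn about it
    dates = pd.to_datetime(pd.Series(dates), utc=True).dt.tz_convert(None)
    months = dates.dt.to_period("M")
    counts = months.value_counts()
    # Every month from the first to the last event, inclusive
    full_range = pd.period_range(months.min(), months.max(), freq="M")
    # Months absent from value_counts get 0 instead of NaN
    return counts.reindex(full_range, fill_value=0)


hist = monthly_histogram([
    "2019-12-28T11:44:51Z",
    "2020-03-18T02:46:01Z",
    "2020-07-12T23:32:31Z",
    "2020-07-18T23:24:03Z",
])
print(hist)
# hist.plot(kind="bar") labels each bar with its month, e.g. 2019-12
```

Periods make convenient bar labels because each one prints as `YYYY-MM`, unlike the `(year, month)` tuples produced by the original group-by.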

I've tried many different things with resampling, without any success. I'm sure there is a very simple way to do this, but I'm not fluent enough in Python.

I hope someone can help me.

Regards, Michael.

Here is a matplotlib-only solution:

import datetime
from collections import Counter
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

with open("catalogue.csv", "r") as _f:
    # skip first header line
    _f.readline()
    # keep only "YYYY-MM" of the DATE column: every event maps to the 1st of its month
    dates = [datetime.datetime.strptime(t.split(",")[1][:7], "%Y-%m")
             for t in _f if t.strip()]
count_dict = Counter(dates)
# sort the months so x and y stay aligned and in chronological order
x = sorted(count_dict)
y = [count_dict[k] for k in x]

fig, ax1 = plt.subplots(figsize=(9, 7))
ax1.xaxis.set_major_locator(mdates.YearLocator())
ax1.xaxis.set_minor_locator(mdates.MonthLocator())
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%m\n%Y"))
ax1.xaxis.set_minor_formatter(mdates.DateFormatter("%m"))
ax1.yaxis.get_major_locator().set_params(integer=True)
ax1.set_xlabel("months of measurements")
ax1.set_ylabel("count of events")
fig.suptitle("MY MAIN TITLE")
# on a date axis the bar width is in days; ~20 days gives one visible bar per month
rects = ax1.bar(x, y, width=20)
plt.show()
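In this approach a month without events simply has no bar, which shows up as a gap on the date axis. If explicit zero entries are wanted (for example to print or export the table), the `Counter` can be completed with every month between the first and last event. A standard-library-only sketch; the helper names `month_range` and `fill_missing_months` are hypothetical.

```python
import datetime
from collections import Counter


def month_range(first, last):
    """Yield the first day of every month from `first` to `last`, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield datetime.datetime(year, month, 1)
        # advance one month, rolling over to January of the next year
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_missing_months(count_dict):
    """Return (months, counts) with a 0 for every month that has no events."""
    months = list(month_range(min(count_dict), max(count_dict)))
    # Counter returns 0 for missing keys (without inserting them)
    return months, [count_dict[m] for m in months]


count_dict = Counter([
    datetime.datetime(2019, 12, 1),
    datetime.datetime(2020, 3, 1),
    datetime.datetime(2020, 3, 1),
])
x, y = fill_missing_months(count_dict)
print(y)  # [1, 0, 0, 2]
```

The resulting `x` and `y` can be passed to `ax1.bar(x, y, ...)` in place of the raw keys and values of `count_dict`.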

You can first round the dates to the day, create an index with no missing dates, and then reindex with this new index:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('catalogue.csv')
df.info()
# parse the ISO 8601 timestamps (the trailing "Z" gives UTC-aware datetimes)
df["DATE"] = pd.to_datetime(df["DATE"])

data1 = df.sort_values('DATE').set_index('DATE')
# truncate each timestamp to its day (floor, so 23:24 stays on the same day)
data1.index = data1.index.floor('D')

# I'm using ID as representative of the number of events
# (count first: reindex fails if several events share the same day label)
num_events = data1.ID.groupby(level=0).count()

# Index with all the dates, first to last day inclusive
date_range = pd.date_range(
    start=num_events.index[0], end=num_events.index[-1], freq='D')

# Fill in the missing days; days with no event get a count of 0
num_events = num_events.reindex(date_range, fill_value=0)
num_events.plot()
plt.show()


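Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.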
