Comparing DataFrames with different date columns

I have data for rain measurements and water level measurements, but the two sets have different date and time values. I want to compare them by plotting both in a subplot figure over exactly the same time span. I have tried to do it myself with two different DataFrames, as seen in the figure: Rain and water level measurements

As you can see, the time axes of the two plots are shifted relative to each other, which makes it hard to compare the "peaks" at the same point in time.

Is there a way to compare them using a Pandas DataFrame? I have tried it myself, using the following code:

import pandas as pd
import matplotlib.pyplot as plt
import pickle

wb = pickle.load(open("data.p","rb"))

rain_period = wb[0]
flow_knudmose = wb[1]


periods = [['20170224','20170819','20170906'],
        ['20170308','20170826','20170917']]

# Period 1
rain_1 = rain_period.loc[(rain_period['Time'] >= periods[0][0]) &(rain_period['Time'] <= periods[1][0]) ]
rain_1.sort_values('Time',ascending=True,inplace=True)

water_1 = flow_knudmose.loc[(flow_knudmose['Time'] >= periods[0][0]) & (flow_knudmose['Time'] <= periods[1][0]) ]
water_1.sort_values('Time',ascending=True,inplace=True)

fig,axes = plt.subplots(nrows=2,ncols=1)
rain_1.plot(color='b',ax = axes[0], x='Time')
water_1.plot(color='r',ax = axes[1], x='Time')
plt.show()

This code produced the figure I have attached. You can get the data.p pickle here

Thanks in advance!

So the time data in your two tables does not match, and what you want is the "intersection" of the two time ranges. Ignore the time data that falls outside the overlap in either set and define a new, common start time and end time:

# Both frames are already sorted by 'Time', so the first/last rows hold the earliest/latest timestamps.
# The common window starts at the later of the two starts and ends at the earlier of the two ends.
startTime = max(water_1['Time'].iloc[0], rain_1['Time'].iloc[0])
endTime   = min(water_1['Time'].iloc[-1], rain_1['Time'].iloc[-1])

Create new datasets restricted to these time limits:

rain_2 = rain_1[(rain_1['Time'] >= startTime) & (rain_1['Time'] <= endTime)]
water_2 = water_1[(water_1['Time'] >= startTime) & (water_1['Time'] <= endTime)]
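
The same clipping can be tried without the original pickle. The sketch below is a minimal, self-contained illustration on made-up data: the timestamps and values are invented, and only the `Time`, `Rain` and `Water Level` column names are taken from the code in this thread. It takes the later of the two start times and the earlier of the two end times, then keeps only the rows inside that window.

```python
import pandas as pd

# Made-up stand-ins for rain_1 and water_1 (values are illustrative only).
hourly = pd.Timedelta(hours=1)
rain = pd.DataFrame({
    "Time": pd.date_range("2017-02-24 00:00", periods=6, freq=hourly),
    "Rain": [0.0, 0.2, 1.1, 0.4, 0.0, 0.0],
})
water = pd.DataFrame({
    "Time": pd.date_range("2017-02-24 02:30", periods=6, freq=hourly),
    "Water Level": [10.0, 10.1, 10.6, 10.9, 10.5, 10.2],
})

# Overlap of the two time ranges: latest start, earliest end.
start_time = max(rain["Time"].min(), water["Time"].min())
end_time = min(rain["Time"].max(), water["Time"].max())

# Keep only rows inside the common window (both bounds inclusive).
rain_clipped = rain[rain["Time"].between(start_time, end_time)]
water_clipped = water[water["Time"].between(start_time, end_time)]

print(start_time, end_time)
print(len(rain_clipped), len(water_clipped))
```

Using `min()`/`max()` on the column rather than the first and last rows means the sketch does not depend on the frames being sorted beforehand.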

Plot:

fig,axes = plt.subplots(nrows=2,ncols=1)
rain_2.plot(color='b',ax = axes[0], x='Time')
water_2.plot(color='r',ax = axes[1], x='Time')
plt.tight_layout()
plt.show()

I hope you find the following code and comments useful:

import pandas as pd
import matplotlib.pyplot as plt
import pickle

wb = pickle.load(open("data.p", "rb"))  # same pickle file as in the question

rain_period = wb[0]
flow_knudmose = wb[1]

periods = [['20170224','20170819','20170906'],
        ['20170308','20170826','20170917']]

# <dataframe>.copy() is added to avoid a warning about modifying a view of a dataframe.
# As described at: https://stackoverflow.com/questions/17328655/pandas-set-datetimeindex,
# we can use a DatetimeIndex as the new index; the old 'Time' column can be dropped afterwards.
rain_1 = rain_period.loc[(rain_period['Time'] >= periods[0][0]) & (rain_period['Time'] <= periods[1][0])].copy()
rain_1 = rain_1.set_index(pd.DatetimeIndex(rain_1['Time'])).drop(columns=["Time"]).sort_index()

water_1 = flow_knudmose.loc[(flow_knudmose['Time'] >= periods[0][0]) & (flow_knudmose['Time'] <= periods[1][0])].copy()
water_1 = water_1.set_index(pd.DatetimeIndex(water_1['Time'])).drop(columns=["Time"]).sort_index()

# With sharex=True, the plots show the entire period of time represented by the data in the dataframes,
# rather than the intersection of time periods (in the case with intersection, some important data might not be shown)
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)

# Without <index>.to_pydatetime(), this code produces an error:  
# "AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'"
axes[0].plot_date(rain_1.index.to_pydatetime(), rain_1["Rain"], '-',
                  color='b', label="Rain")
axes[1].plot_date(water_1.index.to_pydatetime(), water_1["Water Level"], '-',
                  color='r', label="Water Level")

# Set the favorite angle for x-labels and show legends
for ax in axes:
    plt.sca(ax)
    plt.xticks(rotation=45)
    ax.legend(loc="upper right")

plt.show()

Output: produced plot

The conversion with to_pydatetime() was suggested in: "Converting pandas DatetimeIndex to 'float days format' with Matplotlib.dates.datestr2num"

This solution works for:

python 3.5.4 
pandas 0.21.0
matplotlib 2.1.0
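
On newer library versions the same idea can probably be written a little more simply. As far as I know, recent Matplotlib releases discourage `plot_date` in favour of the ordinary `plot`, which accepts datetime values directly, so the `to_pydatetime()` conversion should no longer be needed. The following is a minimal sketch under that assumption, using made-up data in place of the pickle (the values are invented; the column names follow the code above). I have not tested it against the original data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-ins for the rain and water level data (illustrative values only).
hourly = pd.Timedelta(hours=1)
rain = pd.DataFrame({
    "Time": pd.date_range("2017-02-24 00:00", periods=6, freq=hourly),
    "Rain": [0.0, 0.2, 1.1, 0.4, 0.0, 0.0],
})
water = pd.DataFrame({
    "Time": pd.date_range("2017-02-24 02:30", periods=6, freq=hourly),
    "Water Level": [10.0, 10.1, 10.6, 10.9, 10.5, 10.2],
})

# Use the timestamps as the index so both frames are ordered by time.
rain = rain.set_index("Time").sort_index()
water = water.set_index("Time").sort_index()

# sharex=True makes both panels use one common time axis,
# so peaks at the same moment line up vertically.
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
axes[0].plot(rain.index, rain["Rain"], "-", color="b", label="Rain")
axes[1].plot(water.index, water["Water Level"], "-", color="r", label="Water Level")

for ax in axes:
    ax.tick_params(axis="x", labelrotation=45)
    ax.legend(loc="upper right")

fig.tight_layout()
# In an interactive session, call plt.show() here to display the figure.
```

Because the x-axis is shared, the two panels show the union of both time ranges rather than only their overlap, which matches the behaviour described in the comments of the answer above.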
