
Comparing current value to all future values in pandas

I have a dataframe similar to this one, with ~10,000 to ~100,000 rows:

import pandas as pd

data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]

df = pd.DataFrame(data, columns=['Date', 'Value'])

This creates data like the following:

Date Value
2000-01-01 10
2000-01-02 15
2000-01-03 14
2000-01-04 13
2000-01-05 17
2000-01-06 16
2000-01-09 19
2000-01-10 20
2000-01-11 18

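I want to compare each value with all the values that come after it and find the last instance where the value is equal to or below the current value. The output should look like this: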

Date        Value  Latest Date Equal or Below Value
2000-01-01     10  2000-01-01
2000-01-02     15  2000-01-04
2000-01-03     14  2000-01-04
2000-01-04     13  2000-01-04
2000-01-05     17  2000-01-06
2000-01-06     16  2000-01-06
2000-01-09     19  2000-01-11
2000-01-10     20  2000-01-11
2000-01-11     18  2000-01-11

Any help is appreciated.

One way, using pandas.Series.expanding and idxmin:

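# Index the values by date as int64 nanoseconds and reverse the row order.
# (The original used .view(int), which is deprecated in recent pandas.)
s = pd.Series(df["Value"].values,
              index=pd.to_datetime(df["Date"]).astype("int64")).iloc[::-1]
# For each growing window, take the index label of the smallest value.
s = s.expanding().apply(lambda x: (x - x.iloc[0]).idxmin())
df["Latest Date"] = pd.to_datetime(s).values[::-1]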

Output:

         Date  Value Latest Date
0  2000-01-01     10  2000-01-01
1  2000-01-02     15  2000-01-04
2  2000-01-03     14  2000-01-04
3  2000-01-04     13  2000-01-04
4  2000-01-05     17  2000-01-06
5  2000-01-06     16  2000-01-06
6  2000-01-09     19  2000-01-11
7  2000-01-10     20  2000-01-11
8  2000-01-11     18  2000-01-11

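Explanation:

Comparing each element with the elements that follow it is the same as expanding in reverse order. That is why I do s.iloc[::-1].

Also, pandas.Series.expanding only works if the result of apply is numeric; hence the index that idxmin returns is set to integers with view(int).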
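
The reverse-then-expand idea from the explanation can be seen on a small Series. This is only an illustrative sketch: the four values are the first rows of the question's data, and the names `values`, `rev` and `future_min` are made up here. Each expanding window over the reversed Series covers the current row plus every later row, so reducing it with min() yields the lowest current-or-future value.

```python
import pandas as pd

# First four values from the question's sample data.
values = pd.Series([10, 15, 14, 13])

# Reverse the rows so that an expanding window grows backwards in time.
rev = values.iloc[::-1]

# Each window now holds the current row and all rows after it;
# reverse again to restore the original row order.
future_min = rev.expanding().min().iloc[::-1]

print(future_min.tolist())
```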

I am using the functions from these libraries, and they work well:

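import time
from datetimerange import DateTimeRange
from datetime import datetime, timedelta, timezone

# example input (assumed here): the current Unix timestamp
timestamp_value = time.time()

# make datetime from timestamp, thus no timezone info is attached
date = datetime.fromtimestamp(timestamp_value)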

# make local timezone with time.timezone
local_tz = timezone(timedelta(seconds=-time.timezone))

# attach different timezones as you wish
utc_time = datetime.fromtimestamp(timestamp_value).astimezone(timezone.utc)
local_time = datetime.fromtimestamp(timestamp_value).astimezone(local_tz)
print(utc_time.isoformat(timespec='seconds')) 
print(local_time.isoformat(timespec='seconds'))

# shift time by adding some minutes or hours
new_time = datetime.fromtimestamp(timestamp_value).astimezone(local_tz) + timedelta(minutes=30)
        
# A handy feature of the datetimerange package:
# check whether a date falls inside a time range
time_range = DateTimeRange("2020-03-22 10:00:00+0900", "2025-03-22T10:10:00+0900")
print("2022-03-22T10:05:00+0900" in time_range)
print("2042-03-22T10:15:00+0900" in time_range)

# check a time range if inside another time range.
time_range2 = DateTimeRange("2021-03-22T10:03:00+0900", "2022-03-22T10:07:00+0900")
print(time_range2 in time_range)

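import pandas as pd

def calculate(df, x):
    # x is one row: (Date, Value, positional index of that row)
    date_val, val, index = x
    # difference between each current-or-later value and the current value
    all_values_difference = df.iloc[index:, 1] - val
    # label of the smallest difference (relies on the default RangeIndex)
    min_index = all_values_difference.idxmin()
    return df.iloc[min_index, 0]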


data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]


df = pd.DataFrame(data, columns=['Date', 'Value'])
column_name = 'Latest Date Equal or Below Value'
df[column_name] = range(len(df))
df[column_name] = df.apply(lambda x: calculate(df, x), axis=1)

Output:
    Date    Value   Latest Date Equal or Below Value
0   2000-01-01  10  2000-01-01
1   2000-01-02  15  2000-01-04
2   2000-01-03  14  2000-01-04
3   2000-01-04  13  2000-01-04
4   2000-01-05  17  2000-01-06
5   2000-01-06  16  2000-01-06
6   2000-01-09  19  2000-01-11
7   2000-01-10  20  2000-01-11
8   2000-01-11  18  2000-01-11
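
A note on the idxmin-based code above: subtracting the current value shifts every value in the window by the same constant, so idxmin picks the lowest current-or-future value. That matches the sample output, but it is not always the last value equal to or below the current one (for values 10, 5, 8 the first row should map to the row holding 8, not the row holding 5). Below is a minimal sketch of the literal requirement, assuming the rows are already sorted by date and Value contains no NaN. The helper name `latest_at_or_below` is hypothetical and not part of either answer. It relies on the suffix minimum being non-decreasing, so a binary search over it finds the last qualifying row without comparing every pair of rows.

```python
import numpy as np
import pandas as pd


def latest_at_or_below(df, date_col="Date", value_col="Value"):
    """Date of the last current-or-later row whose value is <= each row's value.

    Hypothetical helper; assumes rows are sorted by date and have no NaN.
    """
    values = df[value_col].to_numpy()
    # suffix_min[j] = min(values[j:]), which is non-decreasing in j.
    suffix_min = np.minimum.accumulate(values[::-1])[::-1]
    # The last j with suffix_min[j] <= v is also the last j with values[j] <= v.
    last_pos = np.searchsorted(suffix_min, values, side="right") - 1
    return df[date_col].to_numpy()[last_pos]


data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]
df = pd.DataFrame(data, columns=['Date', 'Value'])
df['Latest Date Equal or Below Value'] = latest_at_or_below(df)
print(df)
```

The row-by-row apply scans the remaining rows for every row, which may become slow toward 100,000 rows; the sketch above does one reverse pass plus a binary search per row.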
