Comparing current value to all future values in pandas
I have a dataframe similar to this one, with ~10,000 to ~100,000 rows:
import pandas as pd

data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]
df = pd.DataFrame(data, columns=['Date', 'Value'])
This creates data like this:
Date | Value |
---|---|
2000-01-01 | 10 |
2000-01-02 | 15 |
2000-01-03 | 14 |
2000-01-04 | 13 |
2000-01-05 | 17 |
2000-01-06 | 16 |
2000-01-09 | 19 |
2000-01-10 | 20 |
2000-01-11 | 18 |
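I would like to compare each value to all of the values that come after it, and find the last instance where the value is equal to or below the current value. The output should look like this:
Date | Value | Latest Date Equal or Below Value |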
---|---|---|
2000-01-01 | 10 | 2000-01-01 |
2000-01-02 | 15 | 2000-01-04 |
2000-01-03 | 14 | 2000-01-04 |
2000-01-04 | 13 | 2000-01-04 |
2000-01-05 | 17 | 2000-01-06 |
2000-01-06 | 16 | 2000-01-06 |
2000-01-09 | 19 | 2000-01-11 |
2000-01-10 | 20 | 2000-01-11 |
2000-01-11 | 18 | 2000-01-11 |
Any help is appreciated.
One way, using pandas.Series.expanding and idxmin:
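# Index the values by the dates as int64 nanoseconds, then reverse the order.
# astype("int64") stands in for the original .view(int), which newer pandas versions deprecate.
s = pd.Series(df["Value"].values,
              index=pd.to_datetime(df["Date"]).astype("int64")).iloc[::-1]
# x - x.iloc[0] only shifts the window by a constant, so idxmin() is the index of the window minimum.
s = s.expanding().apply(lambda x: (x - x.iloc[0]).idxmin())
df["Latest Date"] = pd.to_datetime(s).values[::-1]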
Output:
Date Value Latest Date
0 2000-01-01 10 2000-01-01
1 2000-01-02 15 2000-01-04
2 2000-01-03 14 2000-01-04
3 2000-01-04 13 2000-01-04
4 2000-01-05 17 2000-01-06
5 2000-01-06 16 2000-01-06
6 2000-01-09 19 2000-01-11
7 2000-01-10 20 2000-01-11
8 2000-01-11 18 2000-01-11
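Explanation:
Comparing each element with the elements that follow it is the same as expanding in reverse order. That is why I use s.iloc[::-1].
In addition, pandas.Series.expanding can only handle apply when the result is numeric. The index is therefore set to the dates converted to integers with view(int), and idxmin returns those integers.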
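The reverse-order idea can also be written as a single backward scan without apply. This is only a minimal sketch, assuming (as the code above effectively does) that the wanted result is the date of the smallest value from each row onward, with ties going to the latest date. The function name latest_min_dates is hypothetical.

```python
import numpy as np
import pandas as pd

def latest_min_dates(df):
    """For each row, return the Date of the smallest Value from that row onward.

    Hypothetical helper; ties are resolved in favour of the latest date.
    """
    values = df["Value"].to_numpy()
    dates = df["Date"].to_numpy()
    out = np.empty(len(df), dtype=object)
    best = len(df) - 1  # position of the running minimum while scanning backwards
    for i in range(len(df) - 1, -1, -1):
        # strict "<" keeps the later position when two values are equal
        if values[i] < values[best]:
            best = i
        out[i] = dates[best]
    return pd.Series(out, index=df.index)

data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]
df = pd.DataFrame(data, columns=['Date', 'Value'])
result = latest_min_dates(df)
print(result.tolist())
```

The scan visits each row once, so it should scale to the ~100,000 rows mentioned in the question more comfortably than calling a Python lambda on a growing window.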
I am using this collection of functions, and it works well:
import time
from datetimerange import DateTimeRange
from datetime import datetime, timedelta, timezone
# example epoch timestamp in seconds (an assumed value; the original does not define it)
timestamp = timestamp_value = time.time()
# make datetime from timestamp, thus no timezone info is attached
date = datetime.fromtimestamp(timestamp)
# make local timezone with time.timezone
local_tz = timezone(timedelta(seconds=-time.timezone))
# attach different timezones as you wish
utc_time = datetime.fromtimestamp(timestamp_value).astimezone(timezone.utc)
local_time = datetime.fromtimestamp(timestamp_value).astimezone(local_tz)
print(utc_time.isoformat(timespec='seconds'))
print(local_time.isoformat(timespec='seconds'))
# shift time by adding some minutes or hours
new_time = datetime.fromtimestamp(timestamp_value).astimezone(local_tz) + timedelta(minutes=30)
# These are interesting features of datetimerange
# check whether a date is inside a time range
time_range = DateTimeRange("2020-03-22 10:00:00+0900", "2025-03-22T10:10:00+0900")
print("2022-03-22T10:05:00+0900" in time_range)
print("2042-03-22T10:15:00+0900" in time_range)
# check whether a time range is inside another time range
time_range2 = DateTimeRange("2021-03-22T10:03:00+0900", "2022-03-22T10:07:00+0900")
print(time_range2 in time_range)
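import pandas as pd

def calculate(df, x):
    # x is one row: (Date, Value, row position stored in the helper column)
    date_val, val, index = x
    # difference between every value from this row onward and the current value
    all_values_difference = df.iloc[index:, 1] - val
    # label of the smallest difference, i.e. of the smallest value from this row onward
    min_index = all_values_difference.idxmin()
    return df.iloc[min_index, 0]

data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]
df = pd.DataFrame(data, columns=['Date', 'Value'])
column_name = 'Latest Date Equal or Below Value'
# temporarily store each row's position so calculate() can slice from it
df[column_name] = range(len(df))
df[column_name] = df.apply(lambda x: calculate(df, x), axis=1)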
Output:
Date Value Latest Date Equal or Below Value
0 2000-01-01 10 2000-01-01
1 2000-01-02 15 2000-01-04
2 2000-01-03 14 2000-01-04
3 2000-01-04 13 2000-01-04
4 2000-01-05 17 2000-01-06
5 2000-01-06 16 2000-01-06
6 2000-01-09 19 2000-01-11
7 2000-01-10 20 2000-01-11
8 2000-01-11 18 2000-01-11
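Note that the two pandas approaches above return the date of the minimum value from each row onward, which matches the expected output for this sample. Read literally, "the last instance where the value is equal to or below the current value" can give a different answer on other data: for values 10, 5, 8 the minimum after 10 is 5, but the last value at or below 10 is 8. Below is a minimal sketch of the literal reading, assuming there are no NaN values. The function name last_date_at_or_below is hypothetical. It relies on the fact that the suffix minimum min(values[j:]) never decreases as j grows, so a binary search finds the last qualifying position.

```python
import numpy as np
import pandas as pd

def last_date_at_or_below(df):
    """For each row, return the Date of the last row (from this row onward)
    whose Value is <= the current Value. Hypothetical helper; assumes no NaN."""
    values = df["Value"].to_numpy()
    # suffix_min[j] = min(values[j:]); this array is non-decreasing in j
    suffix_min = np.minimum.accumulate(values[::-1])[::-1]
    # last position whose suffix minimum is <= the current value;
    # at that position the value itself equals the suffix minimum
    pos = np.searchsorted(suffix_min, values, side="right") - 1
    return pd.Series(df["Date"].to_numpy()[pos], index=df.index)

data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]
df = pd.DataFrame(data, columns=['Date', 'Value'])
result = last_date_at_or_below(df)

# a small case where the literal reading differs from "minimum of future values"
small = pd.DataFrame({'Date': ['d0', 'd1', 'd2'], 'Value': [10, 5, 8]})
small_result = last_date_at_or_below(small)
print(result.tolist())
print(small_result.tolist())
```

This avoids a Python-level loop over rows, which may matter at the ~100,000 rows mentioned in the question. Which reading is intended is for the asker to decide, since both agree on the sample data.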