Interpolate number sequence
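I'm trying to fill in an incomplete list of numbers and can't find a Pythonic way to do it. I have a sequence of days from 1 to 31, and each day has a float value.
# dictionary {day: value}
monthvalues = {1: 1.12, 2: 3.24, 3: 2.23, 5: 2.10, 7: 4.97}  # etc., up to the 31st day
But my data is incomplete, and some days are missing! So I want to fill in the gaps mathematically, like this:
Sample month 1: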
{16: 2.00, 18: 4.00}
#==> I want to add to the dictionary 17: 3.00
Sample month 2:
{10: 2.00, 14: 4.00}
#==> I want to add to the dictionary 11: 2.50, 12: 3.00, 13: 3.50
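It sounds simple, but I have millions of rows to process from an incomplete SQL database, and for now I'm quite lost in xrange() loops... Maybe there is a method in the math library, but I can't find it.
Thanks for your help!
EDIT: I want to interpolate the numbers, but as far as I know only numpy/scipy have this kind of mathematical function, and I'm using PyPy, which is not compatible with numpy/scipy.
Consider using pandas for this; its interpolate method makes it easy: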
In [502]: import pandas
In [503]: s = pandas.Series({1: 1.12, 2: 3.24, 3: 2.23,5: 2.10,7:4.97}, index=range(1,8))
In [504]: s
Out[504]:
1 1.12
2 3.24
3 2.23
4 NaN
5 2.10
6 NaN
7 4.97
In [505]: s.interpolate()
Out[505]:
1 1.120
2 3.240
3 2.230
4 2.165
5 2.100
6 3.535
7 4.970
And with multiple missing values:
In [506]: s2 = pandas.Series({10: 2.00, 14: 4.00},index=range(10,15))
In [507]: s2
Out[507]:
10 2
11 NaN
12 NaN
13 NaN
14 4
In [508]: s2.interpolate()
Out[508]:
10 2.0
11 2.5
12 3.0
13 3.5
14 4.0
You can convert the result back to a dictionary if needed.
In [511]: s2.interpolate().to_dict()
Out[511]: {10: 2.0, 11: 2.5, 12: 3.0, 13: 3.5, 14: 4.0}
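You just need some simple loops and good old programming logic. One caveat of this logic is that you need a start value and an end value for it to work. I don't know whether that makes sense for your data, but interpolation requires it.
Setup: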
# Keeps track of the last "seen" day
lastday = 0
# Default 1st day if missing
if 1 not in monthvalues:
    monthvalues[1] = 1.23  # you need a default
# Default 31st day if missing
if 31 not in monthvalues:
    monthvalues[31] = 1.23  # you need a default
Processing:
# Loop from 1 to 31
for thisday in range(1, 32):
    # If we do not encounter thisday in the monthvalues, then skip and keep looping
    if thisday not in monthvalues:
        continue
    # How far ago was the last day seen?
    gap = thisday - lastday
    # If the last day was more than 1 ago, it means there is at least one day missing
    if gap > 1:
        # This is the amount of the last "seen" day
        last_amt = monthvalues[lastday]
        # This is the difference between the current day and the last day
        diff = monthvalues[thisday] - last_amt
        # This is how much you want to interpolate per day in-between
        amt_per_day = diff / gap
        # There is a gap of missing days, let's fill them
        # Start at 1 because we start at the day after the last seen day
        for n in range(1, gap):
            # Fill the missing days with an interpolated value
            monthvalues[lastday + n] = last_amt + amt_per_day * n
    # For the next iteration of the loop, this is the last seen day.
    lastday = thisday
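The same logic can be wrapped in a function that only fills the gaps between known days, so no default values for day 1 or day 31 are needed. This is a minimal sketch, assuming plain linear interpolation is what is wanted; `fill_gaps` is a hypothetical name and not part of the answer above. It uses only built-ins, so it should also run on PyPy.

```python
def fill_gaps(values):
    """Return a copy of `values` with interior missing days linearly interpolated.

    `values` maps an integer day to a float. Days before the first known day
    or after the last known day are not added, because there is nothing to
    interpolate between.
    """
    filled = dict(values)
    days = sorted(values)
    # Walk over each pair of consecutive known days
    for left, right in zip(days, days[1:]):
        gap = right - left
        if gap > 1:
            # Constant increment per day across this gap
            step = (values[right] - values[left]) / float(gap)
            for n in range(1, gap):
                filled[left + n] = values[left] + step * n
    return filled


# The two sample months from the question
print(fill_gaps({16: 2.00, 18: 4.00}))
print(fill_gaps({10: 2.00, 14: 4.00}))
```

Returning a new dictionary instead of modifying the input in place is a design choice of this sketch; it keeps the original data available for comparison.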
I think using scipy's interpolation functions is a smart way to do this.
First, convert your data into a format that is easy to work with:
monthvalue = {1: 1.12, 2: 3.24, 3: 2.23, 5: 2.10, 7: 4.97, 6: 3.10, 10: 3.3}
X = sorted(monthvalue.keys())
Y = [monthvalue[x] for x in X]
Then create a linear interpolation function and print the intermediate values:
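from scipy.interpolate import interp1d

# Build a piecewise-linear interpolation function over the known days
f = interp1d(X, Y, kind='linear')
# Every day from the first known day to the last known day
x_new = range(X[0], X[-1] + 1, 1)
for x in x_new:
    print("%s: %s" % (x, f(x)))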
Result:
1: 1.12
2: 3.24
3: 2.23
4: 2.165
5: 2.1
6: 3.1
7: 4.97
8: 4.41333333333
9: 3.85666666667
10: 3.3
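Since the question mentions PyPy without numpy/scipy, a similar point lookup can be sketched with the standard library alone. The code below is a stand-in for the linear case, not scipy's implementation; `make_linear_interpolator` is a hypothetical helper name, and it assumes the x values are sorted and distinct. It uses `bisect` to find the neighbouring known days and raises `ValueError` outside the known range.

```python
from bisect import bisect_right


def make_linear_interpolator(xs, ys):
    """Return f(x) doing piecewise-linear interpolation.

    Assumes `xs` is sorted in increasing order with no duplicates.
    """
    xs = list(xs)
    ys = list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two points, with len(xs) == len(ys)")

    def f(x):
        if x < xs[0] or x > xs[-1]:
            raise ValueError("x is outside the known range")
        # Index of the first known point strictly greater than x,
        # clamped so that x == xs[-1] falls in the last segment
        i = min(bisect_right(xs, x), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / float(x1 - x0)

    return f


monthvalue = {1: 1.12, 2: 3.24, 3: 2.23, 5: 2.10, 7: 4.97, 6: 3.10, 10: 3.3}
X = sorted(monthvalue)
f = make_linear_interpolator(X, [monthvalue[x] for x in X])
for x in range(X[0], X[-1] + 1):
    print("%s: %s" % (x, f(x)))
```

Unlike the dictionary-filling loop, this form also accepts fractional positions such as `f(4.5)`.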