Pandas interpolate data with units
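Hi everyone,
I've been browsing Stack Overflow for years, and it has helped me so much that I never needed to register before :)
But today I'm stuck on a problem using Python with Pandas and quantities (it could equally be unum or pint). I've tried to write a clear post, but since this is my first one, I apologize if anything is confusing, and I'll try to correct any mistakes you find :)
I want to import data from a source and build a Pandas DataFrame as follows: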
import pandas as pd
import quantities as pq
depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m
s1 = pd.DataFrame(
{'depth' : [x for x in depth]},
index = depth)
This gives:
S1=
depth
0.0 0.0 m
1.1 1.1 m
2.0 2.0 m
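Now I want to extend the data to the depth2 values (obviously there is no point in interpolating depth over depth, but this is a test before things get more complicated):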
s2 = s1.reindex(depth2)
This gives:
S2=
depth
0.0 0.0 m
1.0 NaN
1.1 1.1 m
1.5 NaN
2.0 2.0 m
So far, no problem.
But when I try to interpolate the missing values:
s2['depth'].interpolate(method='values')
I get the following error:
C:\Python27\lib\site-packages\numpy\lib\function_base.pyc in interp(x, xp, fp, left, right)
1067 return compiled_interp([x], xp, fp, left, right).item()
1068 else:
-> 1069 return compiled_interp(x, xp, fp, left, right)
1070
1071
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
I understand that numpy's interpolation does not work on object arrays.
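To see that limitation in isolation, here is a minimal sketch using only NumPy. The array names are made up for illustration. The exact behaviour may depend on the NumPy version, so the object-dtype call is wrapped in a try/except rather than assumed to fail.

```python
import numpy as np

# Known sample points (x) and values (y), as plain floats.
xp = np.array([0.0, 1.1, 2.0])
fp_float = np.array([0.0, 1.1, 2.0])

# The same values stored with dtype=object, which is how a column of
# unit-carrying quantities ends up being stored in a DataFrame.
fp_object = fp_float.astype(object)

# Interpolating the float array works as expected.
value_at_1_5 = np.interp(1.5, xp, fp_float)

# Interpolating the object array is expected to be rejected, because the
# object-to-float64 conversion is not considered a "safe" cast.
try:
    np.interp(1.5, xp, fp_object)
    object_interp_failed = False
except TypeError as exc:
    object_interp_failed = True
    print("np.interp refused the object array:", exc)
```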
However, if I drop the units and then interpolate the missing values, it works:
s3 = s2['depth'].astype(float).interpolate(method='values')
This gives:
s3 =
0.0 0
1.0 1
1.1 1.1
1.5 1.5
2.0 2
Name: depth, dtype: object
How can I get the units back in the depth column?
I can't find any trick to put them back...
Any help would be appreciated. Thanks
Here is one way to do what you want.
Split the quantities and create a pair of columns (value and unit) for each one
In [80]: df = concat([ col.apply(lambda x: Series([x.item(),x.dimensionality.string],
index=[c,"%s_unit" % c])) for c,col in s1.iteritems() ])
In [81]: df
Out[81]:
depth depth_unit
0.0 0.0 m
1.1 1.1 m
2.0 2.0 m
In [82]: df = df.reindex([0,1.0,1.1,1.5,2.0])
In [83]: df
Out[83]:
depth depth_unit
0.0 0.0 m
1.0 NaN NaN
1.1 1.1 m
1.5 NaN NaN
2.0 2.0 m
Interpolate
In [84]: df['depth'] = df['depth'].interpolate(method='values')
Propagate the units
In [85]: df['depth_unit'] = df['depth_unit'].ffill()
In [86]: df
Out[86]:
depth depth_unit
0.0 0.0 m
1.0 1.0 m
1.1 1.1 m
1.5 1.5 m
2.0 2.0 m
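For anyone on a recent pandas (where `DataFrame.iteritems` is no longer available), the same split / reindex / interpolate / forward-fill sequence can be sketched roughly as below. To keep it runnable without a units library, the quantities are replaced by plain `(magnitude, unit)` tuples; that stand-in and the variable names are assumptions for illustration, not part of the session above.

```python
import pandas as pd

# Stand-in for unit-carrying values: (magnitude, unit string) tuples.
raw_depth = [(0.0, "m"), (1.1, "m"), (2.0, "m")]

# Split each value into a numeric column and a unit column.
df = pd.DataFrame(
    {
        "depth": [magnitude for magnitude, _ in raw_depth],
        "depth_unit": [unit for _, unit in raw_depth],
    },
    index=[magnitude for magnitude, _ in raw_depth],
)

# Extend to the new depths; the new rows are NaN in both columns.
df = df.reindex([0.0, 1.0, 1.1, 1.5, 2.0])

# Interpolate the numbers against the index values ...
df["depth"] = df["depth"].interpolate(method="index")

# ... and carry the unit label forward into the new rows.
df["depth_unit"] = df["depth_unit"].ffill()

print(df)
```

Keeping the unit as a separate string column means the numeric column stays float64, so the interpolation never sees object data.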
OK, I found a solution. It may not be the best one, but it works fine for my problem:
import pandas as pd
import quantities as pq
def extendAndInterpolate(input, newIndex):
    """ Function to extend a panda dataframe and interpolate
    """
    output = pd.concat([input, pd.DataFrame(index=newIndex)], axis=1)
    for col in output.columns:
        # (1) Try to retrieve the unit of the current column
        try:
            # if it succeeds, then store the unit
            unit = 1 * output[col][0].units
        except Exception, e:
            # if it fails, which means that the column contains string
            # then return 1
            unit = 1
        # (2) Check the type of value.
        if isinstance(output[col][0], basestring):
            # if it's a string return the string and fill the missing cell with this string
            value = output[col].ffill()
        else:
            # if it's a value, to be able to interpolate, you need to:
            # - (a) dump the unit with astype(float)
            # - (b) interpolate the value
            # - (c) add again the unit
            value = [x*unit for x in output[col].astype(float).interpolate(method='values')]
        #
        # (3) Returned the extended pandas table with the interpolated values
        output[col] = pd.Series(value, index=output.index)
    # Return the output dataframe
    return output
Then:
depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m
s1 = pd.DataFrame(
{'depth' : [x for x in depth]},
index = depth)
s2 = extendAndInterpolate(s1, depth2)
Result:
s1
depth
0.0 0.0 m
1.1 1.1 m
2.0 2.0 m
s2
depth
0.0 0.0 m
1.0 1.0 m
1.1 1.1 m
1.5 1.5 m
2.0 2.0 m
Thanks for your help.
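As a rough Python 3 variant of the same idea (interpolate numeric columns, forward-fill text columns), here is a sketch that does not depend on a units library. The function name `extend_and_interpolate`, the extra `rock` column and the separate `units` dictionary are all hypothetical. Units are kept as metadata next to the frame here, rather than re-attached to each cell as in the function above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def extend_and_interpolate(frame, new_index):
    """Reindex `frame` onto old + new index values and fill the gaps."""
    # Union of the existing and requested index values (sorted by pandas).
    full_index = frame.index.union(pd.Index(new_index, dtype="float64"))
    out = frame.reindex(full_index)
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            # Numeric column: interpolate against the index values.
            out[col] = out[col].interpolate(method="index")
        else:
            # Text column: repeat the last known label.
            out[col] = out[col].ffill()
    return out


# Hypothetical input: unit-free numbers plus a text column.
s1 = pd.DataFrame(
    {"depth": [0.0, 1.1, 2.0], "rock": ["sand", "clay", "clay"]},
    index=[0.0, 1.1, 2.0],
)
units = {"depth": "m"}  # units stored once per column, not per cell

s2 = extend_and_interpolate(s1, [0, 1, 1.1, 1.5, 2])
print(s2)
print("depth is in", units["depth"])
```

Storing one unit per column avoids object-dtype columns altogether, at the cost of having to keep the `units` mapping in sync with the frame by hand.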