
Python / pandas: Fastest way to set and retrieve data, without chained assignment

I am writing some routines that access scalars and vectors from a pandas dataframe and then set the results after some calculations.

Initially I used the form df[var][index] to do this, but ran into problems with chained assignment ( http://pandas.pydata.org/pandas-docs/dev/indexing.html%23indexing-view-versus-copy ).

So I changed it to use df.loc[index,var], which solved the view/copy problem but is very slow. For arrays I convert the data to a pandas Series and use the built-in df.update(). I am now searching for the fastest/best way of doing this without having to worry about chained assignment. The documentation says that, for example, df.at[] is the quickest way to access scalars. Does anyone have any experience with this, or can anyone point to some literature that can help?

Thanks

Edit: The code looks like this, which I think is pretty standard.

    # Assumes: import numpy as num; import pandas as pd

    def set_var(self, name, periode, value):
        ''' Set value, creating the column first if it does not exist '''
        try:
            if name.upper() not in self.data:
                self.data[name.upper()] = num.nan
            self.data.loc[periode, name.upper()] = value
        except Exception:
            print('Fail to set ' + name)

    def get_var(self, navn, periode):
        ''' Get value '''
        return self.data.loc[periode, navn.upper()]

    def set_series(self, data, index):
        outputserie = pd.Series(data, index)
        self.data.update(outputserie)


The dataframe looks like this:
  SC0.data
  <class 'pandas.core.frame.DataFrame'>
  PeriodIndex: 148 entries, 1980Q1 to 2016Q4
  Columns: 3111 entries, CAP1 to CHH_DRD
  dtypes: float64(3106), int64(2), object(3)

Edit 2:

A df could look like:

               var     var1
      2012Q4  0.462015  0.01585
      2013Q1  0.535161  0.01577
      2013Q2  0.735432  0.01401
      2013Q3  0.845959  0.01638
      2013Q4  0.776809  0.01657
      2014Q1  0.000000  0.01517
      2014Q2  0.000000  0.01593

and I basically want to perform two operations:

1) perhaps update var1 with the same scalar over all periods

2) solve var in 2014Q1 as var,2013Q4 = var1,2013Q3/var2013Q4*var,2013Q4

This is done as part of a bigger model setup, which is read from a txt file. Since I am doing loads of these calculations, the speed of setting and reading data matters.

The example you gave above can be vectorized.

In [3]: df = pd.DataFrame(dict(A = np.arange(10), B = np.arange(10)),index=pd.period_range('2012',freq='Q',periods=10))

In [4]: df
Out[4]: 
        A  B
2012Q1  0  0
2012Q2  1  1
2012Q3  2  2
2012Q4  3  3
2013Q1  4  4
2013Q2  5  5
2013Q3  6  6
2013Q4  7  7
2014Q1  8  8
2014Q2  9  9

Assign a scalar

In [5]: df['A'] = 5

In [6]: df
Out[6]: 
        A  B
2012Q1  5  0
2012Q2  5  1
2012Q3  5  2
2012Q4  5  3
2013Q1  5  4
2013Q2  5  5
2013Q3  5  6
2013Q4  5  7
2014Q1  5  8
2014Q2  5  9
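If the scalar should only apply to some of the periods rather than the whole column, a label slice through .loc can do it in one indexing call. This is a minimal sketch, assuming a quarterly PeriodIndex like the one above; the frame and the chosen periods are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a quarterly PeriodIndex, 2012Q1 .. 2014Q2
df = pd.DataFrame(
    {"A": np.arange(10), "B": np.arange(10)},
    index=pd.period_range("2012", freq="Q", periods=10),
)

# Label slices on .loc include both endpoints, so this sets four quarters
start = pd.Period("2013Q1", freq="Q")
stop = pd.Period("2013Q4", freq="Q")
df.loc[start:stop, "A"] = 5
```

Because row and column are selected in a single .loc call, this is not chained assignment.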

Perform a shifted operation

In [8]: df['C'] = df['B'].shift()/df['B'].shift(2)

In [9]: df
Out[9]: 
        A  B         C
2012Q1  5  0       NaN
2012Q2  5  1       NaN
2012Q3  5  2       inf
2012Q4  5  3  2.000000
2013Q1  5  4  1.500000
2013Q2  5  5  1.333333
2013Q3  5  6  1.250000
2013Q4  5  7  1.200000
2014Q1  5  8  1.166667
2014Q2  5  9  1.142857
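To make the shift semantics explicit: shift() moves each value one row down, so row t sees the value from period t-1, and the first row becomes NaN. A small sketch with made-up numbers (the series `s` is hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical quarterly series that doubles every period
s = pd.Series(
    [1.0, 2.0, 4.0, 8.0],
    index=pd.period_range("2013Q1", freq="Q", periods=4),
)

prev = s.shift()    # value from the previous quarter; first row is NaN
growth = s / prev   # ratio of each quarter to the one before it
```

The whole column is computed in one vectorized step, with no per-period get or set calls.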

Using a vectorized assignment

In [10]: df.loc[df['B']>5,'D'] = 'foo'

In [11]: df
Out[11]: 
        A  B         C    D
2012Q1  5  0       NaN  NaN
2012Q2  5  1       NaN  NaN
2012Q3  5  2       inf  NaN
2012Q4  5  3  2.000000  NaN
2013Q1  5  4  1.500000  NaN
2013Q2  5  5  1.333333  NaN
2013Q3  5  6  1.250000  foo
2013Q4  5  7  1.200000  foo
2014Q1  5  8  1.166667  foo
2014Q2  5  9  1.142857  foo
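Where a single cell really does have to be read or written, the .at accessor mentioned in the question takes the row and column label in one call, so it also avoids chained assignment. The sketch below assumes a float frame like the example above; the values are made up, and no timings are shown because they depend on the pandas version and the shape of the frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": np.arange(10.0), "B": np.arange(10.0)},
    index=pd.period_range("2012", freq="Q", periods=10),
)

period = pd.Period("2014Q1", freq="Q")

# Scalar write and read through .at: one indexing call per access
df.at[period, "A"] = 42.0
value = df.at[period, "A"]

# Writing a vector for a few periods: the Series is aligned on its index,
# and only the selected rows of column B are overwritten
new_b = pd.Series(
    [1.5, 2.5],
    index=pd.period_range("2014Q1", freq="Q", periods=2),
)
df.loc[new_b.index, "B"] = new_b
```

The last assignment is one possible alternative to building a Series and calling df.update(); whether it is faster for a frame with thousands of columns would need to be measured.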
