Pandas Dataframe Complex Calculation

I have the following dataframe, df:

     Year  totalPubs  ActualCitations
0   1994         71       191.002034
1   1995         77      2763.911781
2   1996         69      2022.374474
3   1997         78      3393.094951

I want to write code that does the following:

Citations of the current year / Sum of totalPubs of the two previous years

I want to create a new column called Impact Factor and populate it as follows:

for index, row in df.iterrows():
    if row['Year'] >= 1996:
        df.at[index,'Impact Factor'] = df.at[index, 'ActualCitations'] / (df.at[index-1, 'totalPubs'] + df.at[index-2, 'totalPubs'])

I believe the following does what you want:

In [24]:
# pd.rolling_sum is gone from current pandas releases; Series.rolling(...).sum() is the equivalent
df['New_Col'] = df['ActualCitations'] / df['totalPubs'].shift().rolling(window=2).sum()
df

Out[24]:
   Year  totalPubs  ActualCitations    New_Col
0  1994         71       191.002034        NaN
1  1995         77      2763.911781        NaN
2  1996         69      2022.374474  13.664692
3  1997         78      3393.094951  23.240376

So the above uses a rolling sum together with shift to compute the sum of totalPubs over the previous 2 years, and we then divide the citations value by that sum. The first two rows are NaN because they do not have two earlier years to sum.
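For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the dataframe from the question, writes the result to the Impact Factor column the question asked for, and assumes the rows are sorted by Year with one row per year and no gaps. The names prev_two_pubs and check are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [1994, 1995, 1996, 1997],
    "totalPubs": [71, 77, 69, 78],
    "ActualCitations": [191.002034, 2763.911781, 2022.374474, 3393.094951],
})

# shift() moves each totalPubs value down one row, so the current year is
# excluded; the 2-row rolling window then adds the two previous years.
# Assumption: rows are sorted by Year with no missing years.
prev_two_pubs = df["totalPubs"].shift().rolling(window=2).sum()

df["Impact Factor"] = df["ActualCitations"] / prev_two_pubs

# Equivalent formulation without rolling, useful as a cross-check.
check = df["ActualCitations"] / (df["totalPubs"].shift(1) + df["totalPubs"].shift(2))

print(df)
```

Both formulations leave NaN in the 1994 and 1995 rows, which matches the Year >= 1996 condition in the original loop. If the data could have missing years, shifting by row position would no longer correspond to "previous year", and the calculation would need to be keyed on Year instead.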
