Iterating over DataFrame rows to create a new column while referencing other rows

Question

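I'm working with a large DataFrame that contains fundamental data for stocks. Below are the head and tail of the DataFrame (`data`). It holds data for each security for every year from 2005 to 2015. Note the "calendardate" column.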

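My goal is to go through each row, take its "revenueusd" value and divide it by the previous year's "revenueusd" value for the same security, to get each security's 1-year revenue growth. The previous-year value is located using the ticker and the calendar date.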
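Written as a formula, the growth described above is the following, where $R_{k,t}$ (notation introduced here, not from the question) is `revenueusd` for ticker $k$ in calendar year $t$. The AAPL rows in the sample data give a worked instance.

```latex
g_{k,t} = \frac{R_{k,t}}{R_{k,t-1}} - 1,
\qquad
g_{\mathrm{AAPL},2015} = \frac{234\,988\,000\,000}{199\,800\,000\,000} - 1 \approx 0.1761
```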

I've been trying to use the apply function with a lambda, but it doesn't work. Here is the code I've been trying:

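import datetime

def conversion(tick, dates, dataframe):
    # Parse this row's date and build the same calendar date one year earlier
    date1 = datetime.datetime.strptime(dates, "%Y-%m-%d").date()
    date2 = datetime.date(date1.year - 1, date1.month, date1.day).strftime("%Y-%m-%d")
    # NOTE: the two operands are Series with different index labels, so pandas
    # aligns them on the index and the division yields NaN instead of a ratio
    growth = dataframe[(dataframe['ticker'] == tick) & (dataframe['calendardate'] == dates)]['revenueusd'] / dataframe[(dataframe['ticker'] == tick) & (dataframe['calendardate'] == date2)]['revenueusd'] - 1
    return growth

data['1yrRevenueGrowth'] = data.apply(lambda x: conversion(x['ticker'], x['calendardate'], data), axis=1)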
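A minimal sketch that keeps the question's idea of looking up the same ticker one year earlier, but does it with one self-merge instead of a row-by-row `apply`. It assumes `calendardate` values are year-end dates exactly one year apart (as in the sample), uses a small frame rebuilt from the sample rows, and the column name `revenueusd_prev` is hypothetical. Unlike comparing consecutive rows, a missing prior year leaves NaN rather than silently comparing against an older year.

```python
import pandas as pd

# Small frame rebuilt from the sample rows (illustration only).
data = pd.DataFrame({
    "ticker": ["A", "AA", "AAPL", "A", "AA", "AAPL", "AAL"],
    "calendardate": ["2015-12-31", "2015-12-31", "2015-12-31",
                     "2014-12-31", "2014-12-31", "2014-12-31", "2015-12-31"],
    "revenueusd": [4038000000, 22534000000, 234988000000,
                   6981000000, 23906000000, 199800000000, 40990000000],
})
data["calendardate"] = pd.to_datetime(data["calendardate"], format="%Y-%m-%d")

# Copy of the revenue table with every date moved forward one year, so each
# prior-year revenue lines up with the current year's (ticker, date) key.
prev = data[["ticker", "calendardate", "revenueusd"]].copy()
prev["calendardate"] = prev["calendardate"] + pd.DateOffset(years=1)
prev = prev.rename(columns={"revenueusd": "revenueusd_prev"})  # hypothetical name

# Left join keeps every row; rows without a prior-year match get NaN.
data = data.merge(prev, on=["ticker", "calendardate"], how="left")
data["1yrRevenueGrowth"] = data["revenueusd"] / data["revenueusd_prev"] - 1
print(data)
```

The merge does the lookup for all rows at once, which avoids the quadratic cost of filtering the whole frame inside `apply` for every row.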

I've been stuck on this for a few days and have searched the forums relentlessly. Any help would be greatly appreciated!

Sample rows from `data` (as CSV):

,ticker,ticker.1,calendardate,revenueusd,gp,rnd  
0,A,A,2015-12-31,4038000000,2041000000,330000000  
1,AA,AA,2015-12-31,22534000000,4465000000,238000000  
2,AAL,AAL,2015-12-31,40990000000,23911000000,0  
3,AAP,AAP,2015-12-31,9737018000,4422772000,0  
4,AAPL,AAPL,2015-12-31,234988000000,94308000000,8576000000  
5,ABBV,ABBV,2015-12-31,22859000000,18359000000,4435000000  
509,A,A,2014-12-31,6981000000,3593000000,719000000  
510,AA,AA,2014-12-31,23906000000,4769000000,218000000  
511,AAPL,AAPL,2014-12-31,199800000000,78432000000,6606000000  
512,ABBV,ABBV,2014-12-31,19960000000,15534000000,3649000000

Answer 1

There's a handy function called Series.pct_change that does what you need. For example:

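import pandas as pd
data = pd.read_csv("data.csv", index_col=0)
# For each ticker: index by calendardate, sort chronologically, then take the
# relative change between consecutive years
data.groupby("ticker").apply(lambda x: x.set_index("calendardate").sort_index()["revenueusd"].pct_change())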

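For each ticker value, this builds a Series sorted by calendar date and then applies pct_change, which by default computes the relative change between two consecutive entries (current / previous - 1):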

ticker  calendardate
A       2014-12-31           NaN
        2015-12-31     -0.421573
AA      2014-12-31           NaN
        2015-12-31     -0.057391
AAL     2015-12-31           NaN
AAP     2015-12-31           NaN
AAPL    2014-12-31           NaN
        2015-12-31      0.176116
ABBV    2014-12-31           NaN
        2015-12-31      0.145240
Name: revenueusd, dtype: float64
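To make the default behaviour of `pct_change` described above concrete, here is a small sketch on the AAPL revenue from the sample, showing that it matches dividing by the shifted Series and subtracting 1. The variable names are illustrative only.

```python
import pandas as pd

# AAPL revenue from the sample, indexed by calendar date in ascending order.
s = pd.Series([199800000000, 234988000000],
              index=pd.to_datetime(["2014-12-31", "2015-12-31"]))

change = s.pct_change()
# Same computation spelled out: each value over the previous one, minus 1.
# The first entry has no predecessor, so it is NaN.
manual = s / s.shift(1) - 1
print(change)
```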

One more thing: your dates are already in a clean format, so you can easily convert the column to datetime like this:

data["calendardate"] = pd.to_datetime(data["calendardate"], format="%Y-%m-%d")

Answer 2

Starting from this DataFrame:

 ticker ticker.1 calendardate   revenueusd          gp      rnd  
0      A        A   2015-12-31   4038000000  2041000000  330000000
1     AA       AA   2015-12-31  22534000000  4465000000  238000000
2      A        A   2014-12-31    403800000   204100000  330000000
3     AA       AA   2014-12-31   2253400000   446500000  238000000
4      A        A   2013-12-31    403800000    20410000  330000000
5     AA       AA   2013-12-31    225340000    44650000  238000000
6      A        A   2012-12-31       403800     2041000  330000000
7     AA       AA   2012-12-31     22534000     4465000  238000000


df["pct"] = df.groupby("ticker")['revenueusd'].pct_change()



 ticker ticker.1 calendardate   revenueusd          gp      rnd      pct
0      A        A   2015-12-31   4038000000  2041000000  330000000    NaN
1     AA       AA   2015-12-31  22534000000  4465000000  238000000    NaN
2      A        A   2014-12-31    403800000   204100000  330000000 -0.900
3     AA       AA   2014-12-31   2253400000   446500000  238000000 -0.900
4      A        A   2013-12-31    403800000    20410000  330000000  0.000
5     AA       AA   2013-12-31    225340000    44650000  238000000 -0.900
6      A        A   2012-12-31       403800     2041000  330000000 -0.999
7     AA       AA   2012-12-31     22534000     4465000  238000000 -0.900

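You may need to sort the DataFrame before applying groupby. pct_change compares each row with the previous row of the same ticker in the current row order. The rows above are in descending date order, so the pct column is the change from the later year to the earlier one (for example, -0.900 for A in 2014 is 403800000 / 4038000000 - 1). Sort by calendardate in ascending order first to get year-over-year growth.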
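The sorting advice above can be sketched as follows, reusing the example frame from this answer (only the columns needed here). It sorts oldest-first within each ticker before the grouped `pct_change`, so every row is compared with the prior year. Like any consecutive-row comparison, it assumes no years are missing for a ticker.

```python
import pandas as pd

# The example frame from this answer, reduced to the relevant columns.
df = pd.DataFrame({
    "ticker": ["A", "AA", "A", "AA", "A", "AA", "A", "AA"],
    "calendardate": ["2015-12-31", "2015-12-31", "2014-12-31", "2014-12-31",
                     "2013-12-31", "2013-12-31", "2012-12-31", "2012-12-31"],
    "revenueusd": [4038000000, 22534000000, 403800000, 2253400000,
                   403800000, 225340000, 403800, 22534000],
})
df["calendardate"] = pd.to_datetime(df["calendardate"], format="%Y-%m-%d")

# Oldest-first within each ticker, so "previous row" means "previous year".
df = df.sort_values(["ticker", "calendardate"])
df["pct"] = df.groupby("ticker")["revenueusd"].pct_change()
print(df)
```

With this ordering, A goes from NaN (2012) to 999.0, 0.0 and 9.0, the growth relative to each prior year, instead of the negative values shown above.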


2 answers

Answer 1
Score 1, accepted, 2016-07-09 19:33:01

Answer 2
Score 0, 2016-07-09 19:54:17
