pandas groupby works on series but not on selecting whole dataframe

I would like to understand this behaviour.

I have a DataFrame, holdings, with the following columns:

[u'date', u'portfolio', u'sector', u'industry', u'instrument', u'name', u'position', u'price', u'pct_chg', u'mv']

where mv is the market value.

When I run

holdings['wt'] = holdings.groupby(['holdings.portfolio','holdings.date']).apply(lambda x: x['mv']/sum(x['mv']) )

I get the error

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in reindexer(value)
   2234 
   2235                     # other
-> 2236                     raise TypeError('incompatible index of inserted column '
   2237                                     'with frame index')
   2238             return value

TypeError: incompatible index of inserted column with frame index

However, when I do this instead

holdings['wt'] = holdings['mv'].groupby([holdings['holdings.portfolio'],holdings['holdings.date']]).apply(lambda x: x/sum(x) )

it works fine.

The former looks a little tidier to me. Is my code wrong, or is this expected behaviour? Thanks.

A CSV dump of the data follows:

',holdings.date,holdings.portfolio,static_data.sector,static_data.industry,holdings.instrument,static_data.name,holdings.position,prices.adjclose,pct_chg,mv\n0,2013-01-14 00:00:00,SP500,Health Care,Health Care Equipment & Services,A,Agilent Technologies Inc,333512000.0,30.61,0.0026203734032099746,10208802320.0\n20072,2013-01-14 00:00:00,SP500,Consumer Discretionary,"Apparel, Accessories & Luxury Goods",RL,Polo Ralph Lauren Corp.,87704000.0,163.35,0.002454740718011772,14326448400.0\n3432,2013-01-14 00:00:00,SP500,Information Technology,Semiconductors,BRCM,Broadcom Corporation,592000000.0,33.74,-0.005599764220453829,19974080000.0\n20020,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Drilling,RIG,Transocean,362189000.0,49.65,-0.0028118096003213466,17982683850.0\n19968,2013-01-14 00:00:00,SP500,Information Technology,Systems Software,RHT,Red Hat Inc.,187822000.0,54.99,0.009917355371900749,10328331780.0\n3484,2013-01-14 00:00:00,usequity,Health Care,Health Care Equipment & Services,BSX,Boston Scientific,849000.0,6.32,-0.0062893081761006275,5365680.0\n19916,2013-01-14 00:00:00,usequity,Industrials,Industrial Conglomerates,RHI,Robert Half International,60000.0,32.28,0.011278195488721776,1936800.0\n3536,2013-01-14 00:00:00,SP500,Consumer Discretionary,Auto Parts & Equipment,BWA,BorgWarner,227373000.0,35.57,0.003668171557562161,8087657610.0\n19864,2013-01-14 00:00:00,SP500,Financials,Diversified Financial Services,RF,Regions Financial Corp.,1379000000.0,7.06,-0.007032348804500765,9735740000.0\n19812,2013-01-14 00:00:00,SP500,Health Care,Biotechnology,REGN,Regeneron,100390000.0,179.4,-0.00033433634236046395,18009966000.0\n3588,2013-01-14 00:00:00,SP500,Financials,REITs,BXP,Boston Properties,153099000.0,100.68,0.003388479170819192,15414007320.000002\n19760,2013-01-14 00:00:00,SP500,Consumer Staples,Tobacco,RAI,Reynolds American Inc.,531283000.0,39.13,0.0017921146953405742,20789103790.0\n19708,2013-01-14 00:00:00,SP500,Industrials,Industrial Conglomerates,R,Ryder 
System,53039000.0,51.47,0.0027274498344047604,2729917330.0\n3640,2013-01-14 00:00:00,SP500,Financials,Banks,C,Citigroup Inc.,3029500000.0,42.15,-0.002838892831795725,127693425000.0\n19656,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Exploration & Production,QEP,QEP Resources,180091000.0,29.17,-0.004776526782667934,5253254470.0\n3692,2013-01-14 00:00:00,SP500,Information Technology,Systems Software,CA,"CA, Inc.",444906000.0,22.19,0.009554140127388644,9872464140.0\n19604,2013-01-14 00:00:00,SP500,Information Technology,Semiconductors,QCOM,QUALCOMM Inc.,1676023000.0,62.05,-0.010208964747168592,103997227150.0\n19552,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Exploration & Production,PXD,Pioneer Natural Resources,143098000.0,111.63,-0.0009844281367460406,15974029740.0\n3744,2013-01-14 00:00:00,SP500,Consumer Staples,Packaged Foods & Meats,CAG,ConAgra Foods Inc.,424827000.0,29.21,0.0075888237323216146,12409196670.0\n19500,2013-01-14 00:00:00,SP500,Materials,Industrial Gases,PX,Praxair Inc.,291372000.0,110.15,0.0009086778736937529,32094625800.0\n19448,2013-01-14 00:00:00,SP500,Industrials,Industrial Conglomerates,PWR,Quanta Services Inc.,216795000.0,28.66,-0.012405237767057153,6213344700.0\n3796,2013-01-14 00:00:00,SP500,Health Care,Health Care Distributors & Services,CAH,Cardinal Health Inc.,336000000.0,41.62,0.003133285128946728,13984320000.0\n19396,2013-01-14 00:00:00,SP500,Consumer Discretionary,"Apparel, Accessories & Luxury Goods",PVH,PVH Corp.,82393000.0,117.49,0.002303361201160259,9680353570.0\n3848,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Equipment & Services,CAM,Cameron International Corp.,198303000.0,57.44,-0.0019113814074717128,11390524320.0\n19344,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Refining & Marketing & Transportation,PSX,Phillips 66,553513000.0,49.42,0.015409903431271799,27354612460.0\n19292,2013-01-14 00:00:00,SP500,Financials,REITs,PSA,Public Storage,172418000.0,139.16,-0.005005005005005114,23993688880.0\n20124,2013-01-14 
00:00:00,SP500,Industrials,Industrial Conglomerates,ROK,Rockwell Automation Inc.,137872000.0,82.65,-0.0018115942028984477,11395120800.0\n3900,2013-01-14 00:00:00,SP500,Industrials,Construction & Farm Machinery & Heavy Trucks,CAT,Caterpillar Inc.,611500000.0,90.32,-0.005943209333039934,55230679999.99999\n3380,2013-01-14 00:00:00,SP500,Health Care,Health Care Distributors & Services,BMY,Bristol-Myers Squibb,1658776000.0,32.49,0.00277777777777799,53893632240.0\n3328,2013-01-14 00:00:00,SP500,Materials,Paper Packaging,BMS,Bemis Company,99880000.0,33.34,0.008469449485783542,3329999200.0000005\n21008,2013-01-14 00:00:00,SP500,Consumer Discretionary,Broadcasting & Cable TV,SNI,Scripps Networks Interactive Inc.,140122000.0,58.2,-0.011381009002887632,8155100400.0\n2860,2013-01-14 00:00:00,SP500,Consumer Discretionary,Computer & Electronics Retail,BBY,Best Buy Co. Inc.,349615000.0,13.9,0.019061583577712593,4859648500.0\n20956,2013-01-14 00:00:00,SP500,Information Technology,Computer Storage & Peripherals,SNDK,SanDisk Corporation,222201000.0,46.04,0.008985316677624366,10230134040.0\n20904,2013-01-14 00:00:00,SP500,Consumer Discretionary,Household Appliances,SNA,Snap-On Inc.,58107000.0,77.47,0.0014219234746639664,4501549290.0\n2912,2013-01-14 00:00:00,SP500,Health Care,Health Care Equipment & Services,BCR,Bard (C.R.) 
Inc.,74898000.0,101.28,-0.004423473901503994,7585669440.0\n20852,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Equipment & Services,SLB,Schlumberger Ltd.,1286793000.0,70.8,-0.01324041811846699,91104944400.0\n2964,2013-01-14 00:00:00,SP500,Health Care,Health Care Equipment & Services,BDX,Becton Dickinson,191835000.0,79.49,0.006584779030011312,15248964149.999998\n20800,2013-01-14 00:00:00,SP500,Consumer Staples,Packaged Foods & Meats,SJM,Smucker (J.M.),101817000.0,84.88,0.0009433962264151496,8642226960.0\n20748,2013-01-14 00:00:00,SP500,Materials,Diversified Chemicals,SIAL,Sigma-Aldrich,119085000.0,75.15,0.0009323388385722442,8949237750.0\n3016,2013-01-14 00:00:00,SP500,Financials,Diversified Financial Services,BEN,Franklin Resources,622900000.0,44.54,-0.0006730984967466824,27743966000.0\n20696,2013-01-14 00:00:00,SP500,Materials,Specialty Chemicals,SHW,Sherwin-Williams,95997000.0,158.08,-0.0006321911746112185,15175205760.000002\n20644,2013-01-14 00:00:00,SP500,Materials,Paper Packaging,SEE,Sealed Air Corp.(New),210399000.0,17.77,0.006228765571913986,3738790230.0\n3068,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Equipment & Services,BHI,Baker Hughes Inc,432598000.0,41.06,-0.023078753271472685,17762473880.0\n20592,2013-01-14 00:00:00,SP500,Energy,Oil & Gas Refining & Marketing & Transportation,SE,Spectra Energy Corp.,670893000.0,25.99,0.0034749034749035346,17436509070.0\n3120,2013-01-14 00:00:00,SP500,Health Care,Biotechnology,BIIB,BIOGEN IDEC Inc.,236155000.0,143.88,0.0006259127894847616,33977981400.0\n20540,2013-01-14 00:00:00,SP500,Financials,Diversified Financial Services,SCHW,Charles Schwab Corporation,1303355000.0,14.95,-0.0073041168658699585,19485157250.0\n20488,2013-01-14 00:00:00,SP500,Utilities,Multi-Utilities & Unregulated Power,SCG,SCANA Corp,142052000.0,43.04,-0.003934274473501587,6113918080.0\n3172,2013-01-14 00:00:00,SP500,Financials,Banks,BK,The Bank of New York Mellon Corp.,1125709000.0,25.71,-0.002328288707799664,28941978390.0\n20436,2013-01-14 
00:00:00,SP500,Consumer Discretionary,Restaurants,SBUX,Starbucks Corp.,749500000.0,53.37,-0.006330292310556707,40000815000.0\n3224,2013-01-14 00:00:00,SP500,Financials,Diversified Financial Services,BLK,BlackRock,167610000.0,212.71,0.005340769448908267,35652323100.0\n'

OK, looking at what you tried:

holdings['wt'] = holdings.groupby(['holdings.portfolio','holdings.date']).apply(lambda x: x['mv']/sum(x['mv']) )

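This fails because the result of apply on the grouped DataFrame is no longer indexed like the original df (the group keys end up in its index), so when you try to assign it back as a column the index is not compatible with the frame's.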

What you should do is call transform if you want to assign the result of a groupby operation back to the original df:

In [174]:

holdings['wt'] = holdings.groupby(['holdings.portfolio','holdings.date'])['mv'].transform(lambda x: x/sum(x))
holdings['wt']

Out[174]:
0        0.009482
20072    0.013306
3432     0.018552
20020    0.016702
19968    0.009593
3484     0.734775
19916    0.265225
3536     0.007512
19864    0.009043
19812    0.016728
3588     0.014317
19760    0.019309
19708    0.002536
3640     0.118602
19656    0.004879
3692     0.009170
19604    0.096593
19552    0.014837
3744     0.011526
19500    0.029810
19448    0.005771
3796     0.012989
19396    0.008991
3848     0.010580
19344    0.025407
19292    0.022285
20124    0.010584
3900     0.051298
3380     0.050057
3328     0.003093
21008    0.007574
2860     0.004514
20956    0.009502
20904    0.004181
2912     0.007046
20852    0.084619
2964     0.014163
20800    0.008027
20748    0.008312
3016     0.025769
20696    0.014095
20644    0.003473
3068     0.016498
20592    0.016195
3120     0.031559
20540    0.018098
20488    0.005679
3172     0.026881
20436    0.037153
3224     0.033114
Name: wt, dtype: float64
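The same weights can also be computed without a lambda. As a hedged sketch on a made-up miniature of holdings (the rows and mv values are hypothetical), `transform("sum")` broadcasts each group's total back onto the original rows, and an ordinary division does the rest:

```python
import pandas as pd

# Hypothetical miniature of the holdings frame; the numbers are made up.
holdings = pd.DataFrame(
    {
        "holdings.portfolio": ["SP500", "SP500", "usequity", "usequity", "SP500"],
        "holdings.date": ["2013-01-14"] * 5,
        "mv": [100.0, 300.0, 30.0, 10.0, 600.0],
    },
    index=[0, 20072, 3484, 19916, 3432],
)
keys = ["holdings.portfolio", "holdings.date"]

# transform("sum") returns one value per original row (the total of that
# row's group), so the result shares the frame's index.
group_total = holdings.groupby(keys)["mv"].transform("sum")
holdings["wt"] = holdings["mv"] / group_total

# Sanity check: weights within each (portfolio, date) group add up to 1.
check = holdings.groupby(keys)["wt"].sum()
print(holdings)
print(check)
```

Because the result of transform keeps the original index, no realignment step is needed before the assignment.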

The other thing you did is a little odd:

holdings['wt'] = holdings['mv'].groupby([holdings['holdings.portfolio'],holdings['holdings.date']]).apply(lambda x: x/sum(x) )

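Instead of passing column names, you passed a list of two Series and called groupby on the 'mv' column (itself a Series). pandas groups 'mv' by the values of those key Series, and because the lambda returns a result shaped like each group, apply hands back a Series whose index is compatible with your original df.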
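One way to check how this call behaves is to inspect the groups directly. The sketch below, on a made-up miniature of holdings (rows and values are hypothetical), compares grouping the 'mv' Series by two key Series with grouping the frame by column names. It uses transform rather than apply so that the returned index does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical miniature of the holdings frame; the numbers are made up.
holdings = pd.DataFrame(
    {
        "holdings.portfolio": ["SP500", "SP500", "usequity", "usequity", "SP500"],
        "holdings.date": ["2013-01-14"] * 5,
        "mv": [100.0, 300.0, 30.0, 10.0, 600.0],
    },
    index=[0, 20072, 3484, 19916, 3432],
)

# Grouping a Series by other Series: the keys are matched to 'mv' by index label.
by_series = holdings["mv"].groupby(
    [holdings["holdings.portfolio"], holdings["holdings.date"]]
)
# Grouping the frame by column names, then selecting 'mv'.
by_names = holdings.groupby(["holdings.portfolio", "holdings.date"])["mv"]

print(by_series.ngroups, by_names.ngroups)  # number of groups each call forms

wt_series = by_series.transform(lambda x: x / x.sum())
wt_names = by_names.transform(lambda x: x / x.sum())
print(wt_series.equals(wt_names))  # whether both spellings give the same weights
```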

We can check that my transform method gives the same result as your second method:

In [178]:

holdings['wt'].equals(holdings['mv'].groupby([holdings['holdings.portfolio'],holdings['holdings.date']]).apply(lambda x: x/sum(x) ))
Out[178]:
True
