繁体   English   中英

python - 为什么在python中将多个帧与pandas合并时行会加倍

[英]Why rows get doubled when merging multiple frames with pandas in python

我正在尝试合并我用 excel 中的 pandas 读取的多个年终结果(2013 年到 2021 年) - 以这三年为例,使用以下代码生成:

HNRrawfile_2018 = HNR_source_path+'HNR_AR2018_XLS_en.xlsx'

df2018_assets = pd.read_excel(
    HNRrawfile_2018,
    sheet_name=assets,
    skiprows=4,
    skipfooter=1,
    usecols=[0, 2])
df2018_assets.columns = ['ITEM', '2018']
df2018_assets['2018'] = df2018_assets['2018'].astype(int)
df2018_assets['ITEM'] = df2018_assets['ITEM'].str.strip().str.lower()
print(df2018_assets)
                                                 ITEM      2019
0          fixed-income securities - held to maturity    223049
1     fixed-income securities - loans and receivables   2194064
2        fixed-income securities - available for sale  38068459
3   fixed-income securities - at fair value throug...    578779
4              equity securities - available for sale     29215
5   other financial assets - at fair value through...    235019
6                                 investment property   1749517
7                                   real estate funds    534739
8                 investments in associated companies    245478
9                               other invested assets   2211905
10                             short-term investments    468350
11                          cash and cash equivalents   1090852
12    total investments and cash under own management  47629426
13                                     funds withheld  10948469
14                                  contract deposits    325302
15                                  total investments  58903197
16          reinsurance recoverables on unpaid claims   2050114
17        reinsurance recoverables on benefit reserve    852598
18                        prepaid reinsurance premium    116176
19  reinsurance recoverables on other technical re...      9355
20                         deferred acquisition costs   2931722
21                                accounts receivable   5269792
22                                           goodwill     88303
23                                deferred tax assets    442469
24                                       other assets    640956
25                          accrued interest and rent     15414
26                               assets held for sale     36308
27                                       total assets  71356404

                                                 ITEM      2018
0          fixed-income securities – held to maturity    249943
1     fixed-income securities – loans and receivables   2398950
2        fixed-income securities – available for sale  33239685
3   fixed-income securities – at fair value throug...    559750
4              equity securities – available for sale     28729
5   other financial assets – at fair value through...    190759
6                                 investment property   1684932
7                                   real estate funds    433899
8                 investments in associated companies    110545
9                               other invested assets   1805281
10                             short-term investments    421950
11                          cash and cash equivalents   1072915
12    total investments and cash under own management  42197338
13                                     funds withheld  10691768
14                                  contract deposits    172873
15                                  total investments  53061979
16          reinsurance recoverables on unpaid claims   2084630
17        reinsurance recoverables on benefit reserve    909056
18                        prepaid reinsurance premium     93678
19  reinsurance recoverables on other technical re...      7170
20                         deferred acquisition costs   2155820
21                                accounts receivable   3975778
22                                           goodwill     85588
23                                deferred tax assets    454608
24                                       other assets    629420
25                          accrued interest and rent     11726
26                               assets held for sale   1039184
27                                       total assets  64508637

                                                 ITEM      2017
0          fixed-income securities – held to maturity    336182
1     fixed-income securities – loans and receivables   2455164
2        fixed-income securities – available for sale  31281908
3   fixed-income securities – at fair value throug...    212042
4              equity securities – available for sale     37520
5   other financial assets – at fair value through...     88832
6                   real estate and real estate funds   1968702
7                 investments in associated companies    121075
8                               other invested assets   1761678
9                              short-term investments    958669
10                          cash and cash equivalents    835706
11    total investments and cash under own management  40057478
12                                     funds withheld  10735012
13                                  contract deposits    167854
14                                  total investments  50960344
15          reinsurance recoverables on unpaid claims   1651335
16        reinsurance recoverables on benefit reserve    959533
17                        prepaid reinsurance premium     96402
18  reinsurance recoverables on other technical re...      7301
19                         deferred acquisition costs   2228246
20                                accounts receivable   3821124
21                                           goodwill     91692
22                                deferred tax assets    466564
23                                       other assets    904253
24                          accrued interest and rent     10052
25                               assets held for sale         0
26                                       total assets  61196846

但是,有些行会翻倍,尤其是固定收益行,它们在 2021 年至 2019 年期间出现在顶部,但从 2018 年开始在底部增加一倍:

                                                 ITEM        2021        2020        2019        2018        2017        2016        2015        2014        2013
0          fixed-income securities - held to maturity     48632.0    185577.0    223049.0         NaN         NaN         NaN         NaN         NaN         NaN
1     fixed-income securities - loans and receivables   2443629.0   2532146.0   2194064.0         NaN         NaN         NaN         NaN         NaN         NaN
2        fixed-income securities - available for sale  45473677.0  38851723.0  38068459.0         NaN         NaN         NaN         NaN         NaN         NaN
3   fixed-income securities - at fair value throug...     81308.0    105711.0    578779.0         NaN         NaN         NaN         NaN         NaN         NaN
4              equity securities - available for sale    314453.0    378422.0     29215.0         NaN         NaN         NaN         NaN         NaN         NaN
5   other financial assets - at fair value through...    248233.0    234689.0    235019.0         NaN         NaN         NaN         NaN         NaN         NaN
6                                 investment property   1818754.0   1589238.0   1749517.0   1684932.0         NaN         NaN         NaN         NaN         NaN
7                                   real estate funds    805912.0    582296.0    534739.0    433899.0         NaN         NaN         NaN         NaN         NaN
8                 investments in associated companies    238110.0    361617.0    245478.0    110545.0    121075.0    114633.0    128008.0    154822.0    144489.0
9                               other invested assets   2941633.0   2794016.0   2211905.0   1805281.0   1761678.0   1764678.0   1544533.0   1316604.0   1023214.0
10                             short-term investments    443793.0    327426.0    468350.0    421950.0    958669.0    838987.0   1113130.0    575300.0    549138.0
11                          cash and cash equivalents   1355114.0   1278071.0   1090852.0   1072915.0    835706.0    848667.0    792604.0    772882.0    642936.0
12    total investments and cash under own management  56213248.0  49220932.0  47629426.0  42197338.0  40057478.0  41793495.0  39346903.0  36228010.0  31875242.0
13                                     funds withheld  10803071.0   9659807.0  10948469.0  10691768.0  10735012.0  11673259.0  13801845.0  15826480.0  14267831.0
14                                  contract deposits    503412.0    298344.0    325302.0    172873.0    167854.0    170505.0    188604.0     92069.0     75541.0
15                                  total investments  67519731.0  59179083.0  58903197.0  53061979.0  50960344.0  53637259.0  53337352.0  52146559.0  46218614.0
16          reinsurance recoverables on unpaid claims   2674107.0   1883270.0   2050114.0   2084630.0   1651335.0   1506292.0   1395281.0   1376432.0   1403804.0
17        reinsurance recoverables on benefit reserve    192039.0    192135.0    852598.0    909056.0    959533.0   1189420.0   1367173.0    676219.0    344154.0
18                        prepaid reinsurance premium    204597.0    165916.0    116176.0     93678.0     96402.0    134927.0    164023.0    149257.0    139039.0
19  reinsurance recoverables on other technical re...      2703.0      1106.0      9355.0      7170.0      7301.0     12231.0      8687.0      5446.0      6893.0
20                         deferred acquisition costs   3350633.0   2857071.0   2931722.0   2155820.0   2228246.0   2198089.0   2094671.0   1914598.0   1672398.0
21                                accounts receivable   7207750.0   5605803.0   5269792.0   3975778.0   3821124.0   3678030.0   3665937.0   3113978.0   2945685.0
22                                           goodwill     83933.0     80965.0     88303.0     85588.0     91692.0     64609.0     60244.0     58220.0     57070.0
23                                deferred tax assets    676344.0    597986.0    442469.0    454608.0    466564.0    408292.0    433500.0    393923.0    508841.0
24                                       other assets    972167.0    858170.0    640956.0    629420.0    904253.0    674389.0    680543.0    618280.0    603627.0
25                          accrued interest and rent     18248.0     18264.0     15414.0     11726.0     10052.0      9978.0      7527.0      4672.0      4193.0
26                                       total assets  82902252.0  71439769.0  71356404.0  64508637.0  61196846.0  63528602.0  63214938.0  60457584.0  53915544.0
27                               assets held for sale         NaN         0.0     36308.0   1039184.0         0.0     15086.0         NaN         0.0     11226.0
28         fixed-income securities – held to maturity         NaN         NaN         NaN    249943.0    336182.0    484955.0   1007665.0   2139742.0   2666787.0
29    fixed-income securities – loans and receivables         NaN         NaN         NaN   2398950.0   2455164.0   2563594.0   2869865.0   2988187.0   3209100.0
30       fixed-income securities – available for sale         NaN         NaN         NaN  33239685.0  31281908.0  32182173.0  29616448.0  26817523.0  22409892.0
31  fixed-income securities – at fair value throug...         NaN         NaN         NaN    559750.0    212042.0    239917.0    108982.0     64494.0     36061.0
32             equity securities – available for sale         NaN         NaN         NaN     28729.0     37520.0    905307.0    452108.0     32804.0     28980.0
33  other financial assets – at fair value through...         NaN         NaN         NaN    190759.0     88832.0     57665.0     39602.0     66394.0     70082.0
34                  real estate and real estate funds         NaN         NaN         NaN         NaN   1968702.0   1792919.0   1673958.0   1299258.0   1094563.0

合并是通过使用外部进行的,因为有些职位每年都不会出现,我想确保从左表和右表中收集每一行:

df_assets_1 = df2021_assets.merge(
    df2020_assets, how='outer', on='ITEM').merge(
    df2019_assets, how='outer', on='ITEM').merge(
    df2018_assets, how='outer', on='ITEM').merge(
    df2017_assets, how='outer', on='ITEM').merge(
    df2016_assets, how='outer', on='ITEM').merge(
    df2015_assets, how='outer', on='ITEM').merge(
    df2014_assets, how='outer', on='ITEM').merge(
    df2013_assets, how='outer', on='ITEM')

我试图通过

  • 确保项目描述被剥离并且全部小写
  • 在axis = 1上使用.concat而不是.merge
  • 使用左连接,但这会从右表中删除位置(如预期的那样)
  • 在没有 on='ITEM' 的情况下尝试它,但这没有任何区别

非常感谢您提供任何帮助 - 非常感谢您宝贵的时间!

我在 PowerQuery 中使用 M 检查了加入的过程,几乎每个步骤之后都需要清理数据,因为由于财务结果的性质(可能有一个总体的“标题行”(股东)例如价值)和另一行具有相同名称的持有股东价值的总和)。

因此,在每次外部合并之后,必须在进行下一次外部合并之前删除可能的重复项。 这导致了重复。

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM