
Pandas merge returning an empty dataframe

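I believe there is an issue with my merge function: when I try to combine the three datasets, I get an empty dataframe with only the columns 2016_visitation, 2017_visitation, 2018_visitation and 2019_visitation in the header. I also get an assertion error telling me it can't find the column 'state', so I'm wondering whether it is pulling information from the original, unedited data, or whether I need to include another import function. Maybe another column rename? My merge function could use some tweaking overall, but I'm not sure which approach works best.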

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime

def load_data():

    # importing datasets
    df_2017=pd.read_excel('assets/US_States_Visited_2017.xlsx', skiprows=6,skipfooter=13)
    df_2018=pd.read_excel('assets/US_States_Visited_2018.xlsx', skiprows=7,skipfooter=7)
    df_2019=pd.read_excel('assets/US_States_Visited_2019.xlsx', skiprows=6,skipfooter=8)
    
    # renaming columns
    df_2017.columns = ['2017_rank','state','2016_market_share','2016_visitation','2017_market_share','2017_visitation','volume_change']
    df_2018.columns = ['2018_rank','state','2018_market_share','2018_visitation','volume_change','2017_market_share','2017_visitation']
    df_2019.columns = ['2019_rank','state','2019_market_share','2019_visitation','volume_change','2018_market_share','2018_visitation']
    
    # dropping all columns except for relevant state and visitation columns
    df_2017.drop(df_2017.columns[[0,2,4,6]], axis=1,inplace=True)
    df_2018.drop(df_2018.columns[[0,2,4,5,6]], axis=1,inplace=True)
    df_2019.drop(df_2019.columns[[0,2,4,5,6]], axis=1,inplace=True) 
    
    # multiplying visitation by 1000 to get accurate value
    df_2017['2016_visitation'] = df_2017['2016_visitation']*1000
    df_2017['2017_visitation'] = df_2017['2017_visitation']*1000
    df_2018['2018_visitation'] = df_2018['2018_visitation']*1000
    df_2019['2019_visitation'] = df_2019['2019_visitation']*1000
    
    # starting output at state column
    df_2017=df_2017.set_index('state')
    df_2018=df_2018.set_index('state')
    df_2019=df_2019.set_index('state')
    
    # merging all datasets by state variable
    merge = pd.merge(df_2017,df_2018,on="state")
    merged_US_states_visitation = pd.merge(merge,df_2019,on='state')
    
    # sorting alphabetically
    merged_US_states_visitation = merged_US_states_visitation.sort_values(by=['state'])
    
    return merged_US_states_visitation

load_data().head(25)

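The problem was that the values in the 'state' column did not match across the dataframes, so I added df_201x['state'] = df_201x['state'].str.strip() for each one, which fixed the formatting issue and let the data merge.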
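
To illustrate why stray whitespace produces an empty result, here is a minimal sketch. The two small dataframes and their values are made up for the example and only stand in for the Excel sheets. It compares the merge before and after stripping, and uses an outer merge with indicator=True as one possible way to see which keys fail to match.

```python
import pandas as pd

# Hypothetical stand-ins for two of the yearly sheets (values are invented).
df_2017 = pd.DataFrame({"state": ["New York ", " Florida"],
                        "2017_visitation": [10.0, 9.0]})
df_2018 = pd.DataFrame({"state": ["New York", "Florida"],
                        "2018_visitation": [11.0, 9.5]})

# Stray whitespace makes every key unequal, so the default inner merge is empty.
before = pd.merge(df_2017, df_2018, on="state")

# An outer merge with indicator=True labels each row by the side it came from,
# which makes unmatched keys easy to spot.
diagnosis = pd.merge(df_2017, df_2018, on="state", how="outer", indicator=True)

# Strip the key column in every frame before merging.
for df in (df_2017, df_2018):
    df["state"] = df["state"].str.strip()

after = pd.merge(df_2017, df_2018, on="state")

print(len(before), len(after))
print(diagnosis[["state", "_merge"]])
```

In load_data() the strip would need to run while 'state' is still a regular column, that is, before the set_index('state') calls.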
