Merging and visualization of datasets using pandas

For example, I have the following data frames in pandas, cleaned and ready for merging.

DataFrame1 for average Income per year

Country | Year 1  | Year 2  | Year 3
  A     |   50    |   30    |   20
  B     |   70    |   20    |   90
  C     |   10    |   20    |   30

DataFrame2 for fertility rate

Country | Year 1 | Year 2 | Year 3
   A    |   1.5  |   2    |  2.5
   B    |   2    |   2    |   3
   C    |   1    |   1    |   4 

Basically, I'm trying to show the relationship between DataFrame1 and DataFrame2 over the years with matplotlib, but I can't seem to merge them because they share the same year column headings. In addition, I can't find a suitable graph for comparing these data in matplotlib with the years on the x-axis. Any advice would be great. I'm using the values above as a sample because the real datasets are huge. Could it be that there is too much data?

Consider generating a separate plot per country with a secondary axis, since you are tracking two metrics on different scales: Income and Fertility. For this setup, you will need to reshape your data from wide format to long with pandas.melt(). Then, iterate through the distinct countries to filter the data frames.

Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Year 1': [50, 70, 10],
                    'Year 2': [30, 20, 20],
                    'Year 3': [20, 90, 30]})

df1 = df1.melt(id_vars='Country', value_name='Income', var_name='Year')

df2 = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Year 1': [1.5, 2, 1],
                    'Year 2': [2.0, 2, 1],
                    'Year 3': [2.5, 3, 4]})

df2 = df2.melt(id_vars='Country', value_name='Fertility', var_name='Year')
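
Once both frames are in long format, the clash between the year headings no longer applies, because `Year` becomes an ordinary key column. As a minimal sketch of the merge the question asks about (the names `income_wide`, `fertility_wide`, `income_long`, `fertility_long`, and `merged` are illustrative and not from the original post), the two frames can be joined on `Country` and `Year`:

```python
import pandas as pd

# Sample data copied from the question (wide format: one column per year).
income_wide = pd.DataFrame({'Country': ['A', 'B', 'C'],
                            'Year 1': [50, 70, 10],
                            'Year 2': [30, 20, 20],
                            'Year 3': [20, 90, 30]})
fertility_wide = pd.DataFrame({'Country': ['A', 'B', 'C'],
                               'Year 1': [1.5, 2, 1],
                               'Year 2': [2.0, 2, 1],
                               'Year 3': [2.5, 3, 4]})

# Reshape each frame to long format: one row per (Country, Year) pair.
income_long = income_wide.melt(id_vars='Country', var_name='Year',
                               value_name='Income')
fertility_long = fertility_wide.melt(id_vars='Country', var_name='Year',
                                     value_name='Fertility')

# Join on both keys so each row carries the two metrics side by side.
merged = income_long.merge(fertility_long, on=['Country', 'Year'], how='inner')
print(merged)
```

If the wide layout has to be kept, merging on `Country` alone with the `suffixes` argument of `merge` (for example `suffixes=('_income', '_fertility')`) would also avoid the name clash, though the long layout is usually easier to plot.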

Plot

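# One figure per country: income on the left axis, fertility on the right.
for c in df1['Country'].unique():
    fig, ax1 = plt.subplots(figsize=(10, 4))

    # Secondary y-axis that shares the x-axis with ax1.
    ax2 = ax1.twinx()
    df1[df1['Country'] == c].plot(kind='line', x='Year', y='Income', ax=ax1, color='g', legend=False)
    df2[df2['Country'] == c].plot(kind='line', x='Year', y='Fertility', ax=ax2, color='b', legend=False)

    plt.title('Country ' + c)
    ax1.set_xlabel('Years')
    ax1.set_ylabel('Average Income Per Year')
    ax2.set_ylabel('Fertility Rate')

    # Combine the lines from both axes into a single legend.
    lines = ax1.get_lines() + ax2.get_lines()
    ax1.legend(lines, [l.get_label() for l in lines], loc='upper left')

    # Label the x ticks with the year names instead of a hard-coded count.
    years = df1['Year'].unique()
    ax1.set_xticks(np.arange(len(years)))
    ax1.set_xticklabels(years)

    plt.show()
    # Release the figure so memory does not grow with many countries.
    plt.close(fig)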

Plot output
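
Because the real datasets are described as huge, one plot per country may be too many to inspect by eye. As a rough complement to the plots, and not part of the original answer, the sketch below computes a per-country Pearson correlation between income and fertility. The names `income`, `fertility`, `merged`, and `corr_by_country` are illustrative, and with only three years per country the numbers are not statistically meaningful.

```python
import pandas as pd

# Sample data from the question, reshaped to long format as in the answer.
income = pd.DataFrame({'Country': ['A', 'B', 'C'],
                       'Year 1': [50, 70, 10],
                       'Year 2': [30, 20, 20],
                       'Year 3': [20, 90, 30]}
                      ).melt(id_vars='Country', var_name='Year',
                             value_name='Income')
fertility = pd.DataFrame({'Country': ['A', 'B', 'C'],
                          'Year 1': [1.5, 2, 1],
                          'Year 2': [2.0, 2, 1],
                          'Year 3': [2.5, 3, 4]}
                         ).melt(id_vars='Country', var_name='Year',
                                value_name='Fertility')

# One row per (Country, Year) with both metrics.
merged = income.merge(fertility, on=['Country', 'Year'])

# Pearson correlation between the two metrics, computed per country.
corr_by_country = {
    country: group['Income'].corr(group['Fertility'])
    for country, group in merged.groupby('Country')
}

for country, value in corr_by_country.items():
    print(f'Country {country}: correlation = {value:.3f}')
```

Sorting the resulting values could help pick out which countries deserve an individual dual-axis plot.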
