Merging and visualizing datasets with Pandas

Question

For example, I have the following data frames in pandas, cleaned and ready for merging.

DataFrame1 for average Income per year

Country | Year 1  | Year 2  | Year 3
  A     |   50    |   30    |   20
  B     |   70    |   20    |   90
  C     |   10    |   20    |   30

DataFrame2 for Fertility rate

Country | Year 1 | Year 2 | Year 3
   A    |   1.5  |   2    |  2.5
   B    |   2    |   2    |   3
   C    |   1    |   1    |   4

Basically, I'm trying to show the relationship between DataFrame1 and DataFrame2 over the years in matplotlib. But I can't seem to merge them, since both use the same year columns as headings. In addition, I can't find a chart type in matplotlib that lets me compare these data with the years on the X axis. Any advice would be great. I'm using the values above as an example because the real datasets are huge. Could it be that there is too much data?

Answer 1

Consider generating a separate plot per country with a secondary axis, since you are tracking two metrics on different scales: Income and Fertility. For this setup, you will need to reshape the data from wide to long format with pandas.melt(). Then iterate through the distinct countries and filter the data frames for each one.

Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Year 1': [50, 70, 10],
                    'Year 2': [30, 20, 20],
                    'Year 3': [20, 90, 30]})

df1 = df1.melt(id_vars='Country', value_name='Income', var_name='Year')

df2 = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Year 1': [1.5, 2, 1],
                    'Year 2': [2.0, 2, 1],
                    'Year 3': [2.5, 3, 4]})

df2 = df2.melt(id_vars='Country', value_name='Fertility', var_name='Year')
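
The question also asks about merging. Once both frames are in long format, their value columns no longer share names, so they can be joined on the Country and Year keys. A minimal sketch, assuming the same toy data as above; the names income_wide, fertility_wide, income, fertility and merged are illustrative and not part of the original answer:

```python
import pandas as pd

# Same toy data as above, rebuilt so this snippet runs on its own
income_wide = pd.DataFrame({'Country': ['A', 'B', 'C'],
                            'Year 1': [50, 70, 10],
                            'Year 2': [30, 20, 20],
                            'Year 3': [20, 90, 30]})
fertility_wide = pd.DataFrame({'Country': ['A', 'B', 'C'],
                               'Year 1': [1.5, 2, 1],
                               'Year 2': [2.0, 2, 1],
                               'Year 3': [2.5, 3, 4]})

# Wide -> long: one row per (Country, Year) pair
income = income_wide.melt(id_vars='Country', var_name='Year', value_name='Income')
fertility = fertility_wide.melt(id_vars='Country', var_name='Year', value_name='Fertility')

# Join on both keys so each row carries both metrics
merged = income.merge(fertility, on=['Country', 'Year'], how='inner')
print(merged.sort_values(['Country', 'Year']).to_string(index=False))
```

If the frames have to stay in wide format, passing suffixes to merge, for example income_wide.merge(fertility_wide, on='Country', suffixes=('_income', '_fertility')), should keep the identically named year columns apart.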

Plot

for c in df1['Country'].unique():
    fig, ax1 = plt.subplots(figsize=(10,4))

    ax2 = ax1.twinx()
    df1[df1['Country']==c].plot(kind='line', x='Year', y='Income', ax=ax1, color='g', legend=False)
    df2[df2['Country']==c].plot(kind='line', x='Year', y='Fertility', ax=ax2, color='b', legend=False)

    plt.title('Country ' + c)
    ax1.set_xlabel('Years')
    ax1.set_ylabel('Average Income Per Year')
    ax2.set_ylabel('Fertility Rate')

    lines = ax1.get_lines() + ax2.get_lines()
    ax1.legend(lines, [l.get_label() for l in lines], loc='upper left')

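    # One tick per year; derive the count from the data instead of hard-coding 3
    years = df1['Year'].unique()
    ax1.set_xticks(np.arange(len(years)))
    ax1.set_xticklabels(years)

    plt.show()
    plt.close(fig)  # release this country's figure before the next iteration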
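
With many countries, one window per country can become unwieldy. As a variation on the loop above, the same twin-axis idea can be drawn as one panel per country inside a single figure. This is only a sketch under the same toy data; the layout (one row of panels), the figure size and the variable names are assumptions rather than part of the original answer:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same toy data as above, already melted to long format
df1 = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Year 1': [50, 70, 10],
                    'Year 2': [30, 20, 20],
                    'Year 3': [20, 90, 30]}
                   ).melt(id_vars='Country', value_name='Income', var_name='Year')
df2 = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Year 1': [1.5, 2, 1],
                    'Year 2': [2.0, 2, 1],
                    'Year 3': [2.5, 3, 4]}
                   ).melt(id_vars='Country', value_name='Fertility', var_name='Year')

countries = df1['Country'].unique()
years = list(df1['Year'].unique())
positions = list(range(len(years)))  # numeric x positions, labelled with the years

# squeeze=False keeps `axes` two-dimensional even when there is a single country
fig, axes = plt.subplots(1, len(countries),
                         figsize=(5 * len(countries), 4), squeeze=False)

for ax_income, c in zip(axes[0], countries):
    ax_fert = ax_income.twinx()  # secondary y-axis sharing the same x-axis
    inc = df1[df1['Country'] == c]
    fer = df2[df2['Country'] == c]

    ax_income.plot(positions, inc['Income'].to_numpy(), color='g', label='Income')
    ax_fert.plot(positions, fer['Fertility'].to_numpy(), color='b', label='Fertility')

    ax_income.set_title('Country ' + c)
    ax_income.set_xlabel('Years')
    ax_income.set_ylabel('Average Income Per Year')
    ax_fert.set_ylabel('Fertility Rate')
    ax_income.set_xticks(positions)
    ax_income.set_xticklabels(years)

    # Combine the lines of both axes into one legend
    lines = ax_income.get_lines() + ax_fert.get_lines()
    ax_income.legend(lines, [l.get_label() for l in lines], loc='upper left')

fig.tight_layout()
```

Calling plt.show() afterwards displays the figure, and fig.savefig() with a file name of your choice writes it to disk. For a large number of countries a multi-row grid would probably be needed, which this sketch does not attempt.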

Solution 1
0 | Accepted | 2018-09-09 17:51:59
