matplotlib / pyplot: plot average curve for multiple graphs

I'm a beginner in matplotlib. I've already created some graphs from a dataframe. Now I want to create the average graph from multiple graphs.

Can anyone help me with this?

My code looks like this:

alb = df[df.location == 'Albania']
bel = df[df.location == 'Belarus']
bih = df[df.location == 'Bosnia and Herzegovina']
blg = df[df.location == 'Bulgaria']
cro = df[df.location == 'Croatia']
hun = df[df.location == 'Hungary']
kos = df[df.location == 'Kosovo']
mld = df[df.location == 'Moldova']
mng = df[df.location == 'Montenegro']
nmc = df[df.location == 'North Macedonia']
pol = df[df.location == 'Poland']
rom = df[df.location == 'Romania']
rus = df[df.location == 'Russia']
srb = df[df.location == 'Serbia']
tur = df[df.location == 'Turkey']
ukr = df[df.location == 'Ukraine']

# Set the figure size and resolution before plotting;
# rcParams changed after plt.plot() do not affect the figure already created.
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 200

plt.plot(alb.date, alb.new_deaths_per_million)
plt.plot(bel.date, bel.new_deaths_per_million)
plt.plot(bih.date, bih.new_deaths_per_million)
plt.plot(blg.date, blg.new_deaths_per_million)
plt.plot(cro.date, cro.new_deaths_per_million)
plt.plot(hun.date, hun.new_deaths_per_million)
plt.plot(kos.date, kos.new_deaths_per_million)
plt.plot(mld.date, mld.new_deaths_per_million)
plt.plot(mng.date, mng.new_deaths_per_million)
plt.plot(nmc.date, nmc.new_deaths_per_million)
plt.plot(pol.date, pol.new_deaths_per_million)
plt.plot(rom.date, rom.new_deaths_per_million)
plt.plot(rus.date, rus.new_deaths_per_million)
plt.plot(srb.date, srb.new_deaths_per_million)
plt.plot(tur.date, tur.new_deaths_per_million)
plt.plot(ukr.date, ukr.new_deaths_per_million)
plt.legend(['Albania', 'Belarus', 'Bosnia and Herzegovina', 'Bulgaria', 'Croatia', 'Hungary', 'Kosovo', 'Moldova', 'Montenegro', 'North Macedonia', 'Poland', 'Romania', 'Russia', 'Serbia', 'Turkey', 'Ukraine'])
plt.show()

The graphs look like this:

[image: enter image description here]

This is the dataset I use:

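import pandas as pd

url = 'https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv'
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv(url, on_bad_lines='skip')
pd.options.display.max_columns = None  # show every column when displaying
display(df)  # display() is available in Jupyter/IPython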

Now I would like to create the average curve from all these graphs. Can someone help me with the code? What is the easiest way to do this?

Thanks in advance for the help!

The easiest way is to take the mean of new_deaths_per_million across all countries for each date and plot that. You can do this by creating a new DataFrame:

frames = [alb, bel, bih, blg, cro, hun, kos, mld, mng, nmc, pol, rom, rus, srb, tur, ukr]
result = pd.concat(frames)[["date", "new_deaths_per_million"]] # Only select the relevant columns for now

result = result.groupby(["date"], as_index=False).mean() # Take the mean of all the values on the same date
print(result)

Output:

date        new_deaths_per_million                  
2020-01-31                     NaN
2020-02-01                     NaN
2020-02-02                     NaN
2020-02-03                     NaN
2020-02-04                     NaN
...                            ...
2021-12-15                7.698063
2021-12-16                6.557625
2021-12-17                7.023750
2021-12-18                4.249562
2021-12-19                3.559250

You can then plot this result the same way you plotted the individual countries. Note that some data is missing, which shows up as NaN (not a number) values: mean() skips NaN by default, so a date only comes out as NaN when none of the countries has a value for it. You might want to clean the data a bit first.
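To make the last two steps concrete, here is a minimal sketch of one way to drop the all-NaN dates and draw the average on top of the country curves. The helper names average_curve and plot_with_average are made up for this example, and the small sample DataFrame is invented data that only mimics the column names of the OWID CSV. Selecting countries with isin() instead of sixteen separate variables is a swapped-in shortcut, not something the approach above requires.

```python
import pandas as pd
import matplotlib.pyplot as plt


def average_curve(df, countries, column="new_deaths_per_million"):
    """Return the per-date mean of `column` over the given countries."""
    subset = df[df["location"].isin(countries)]
    # mean() skips NaN by default, so a date is NaN only when every
    # selected country is missing a value on that date.
    mean_by_date = subset.groupby("date")[column].mean()
    # Drop the dates for which no country reported a value.
    return mean_by_date.dropna()


def plot_with_average(df, countries, column="new_deaths_per_million"):
    """Plot one thin line per country and the average as a thick black line."""
    fig, ax = plt.subplots(figsize=(12, 8))
    for country in countries:
        rows = df[df["location"] == country]
        ax.plot(pd.to_datetime(rows["date"]), rows[column],
                linewidth=0.8, label=country)
    avg = average_curve(df, countries, column)
    ax.plot(pd.to_datetime(avg.index), avg.values,
            color="black", linewidth=2.5, label="Average")
    ax.legend()
    return ax


# Invented sample data (NOT real figures), same column names as the CSV.
nan = float("nan")
sample = pd.DataFrame({
    "location": ["Albania"] * 3 + ["Belarus"] * 3,
    "date": ["2021-12-17", "2021-12-18", "2021-12-19"] * 2,
    "new_deaths_per_million": [nan, 2.0, 4.0, nan, 4.0, 8.0],
})
avg = average_curve(sample, ["Albania", "Belarus"])
print(avg)
```

With the real data, calling plot_with_average(df, countries) with your list of country names and then plt.show() should draw all the country curves plus the average in one figure. If the daily values are too noisy, a rolling mean such as avg.rolling(7).mean() is one possible way to smooth the curve.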
