
Graph with multiple boxplots using Python

I am currently coding in Python in Google Colab. I am working with underwater glider data that I load by URL from NOAA's ERDDAP site.

url = 'https://gliders.ioos.us/erddap/tabledap/ru28-20150917T1300.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2015-09-18T00%3A00%3A00Z&time%3C=2015-10-06T00%3A00%3A00Z'

url2 = 'https://gliders.ioos.us/erddap/tabledap/ru28-20140815T1405.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2014-08-16T00%3A00%3A00Z&time%3C=2014-09-04T00%3A00%3A00Z'

url3 = 'https://gliders.ioos.us/erddap/tabledap/ru28-20130813T1436.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2013-08-14T00%3A00%3A00Z&time%3C=2013-08-26T00%3A00%3A00Z'

url4 = 'https://gliders.ioos.us/erddap/tabledap/blue-20200819T1433.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2020-08-19T00%3A00%3A00Z&time%3C=2020-08-25T00%3A00%3A00Z'

url5 = 'https://gliders.ioos.us/erddap/tabledap/blue-20190815T1711.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2019-08-16T00%3A00%3A00Z&time%3C=2019-09-24T00%3A00%3A00Z'

url6 = 'https://gliders.ioos.us/erddap/tabledap/blue-20180806T1400.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2018-08-07T00%3A00%3A00Z&time%3C=2018-10-31T00%3A00%3A00Z'

url7 = 'https://gliders.ioos.us/erddap/tabledap/blue-20170831T1436.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2017-09-01T00%3A00%3A00Z&time%3C=2017-09-24T00%3A00%3A00Z'

I then loaded the datasets:

data1 = pd.read_csv(url, skiprows=[1], parse_dates=['time'], index_col='time')
data2 = pd.read_csv(url2, skiprows=[1], parse_dates=['time'], index_col='time')
data3 = pd.read_csv(url3, skiprows=[1], parse_dates=['time'], index_col='time')
data4 = pd.read_csv(url4, skiprows=[1], parse_dates=['time'], index_col='time')
data5 = pd.read_csv(url5, skiprows=[1], parse_dates=['time'], index_col='time')
data6 = pd.read_csv(url6, skiprows=[1], parse_dates=['time'], index_col='time')
data7 = pd.read_csv(url7, skiprows=[1], parse_dates=['time'], index_col='time')

And combined them into one dataframe:

combined_df = pd.concat([data1, data2, data3, data4, data5, data6, data7], axis = 0)
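The seven near-identical read_csv calls above could also be written as one loop that tags each row with the deployment it came from. The following is only a minimal sketch: the two in-memory CSV strings and the names load_glider_csv, sources and "deployment" are hypothetical stand-ins for the real ERDDAP downloads, which have a header row followed by a units row (hence skiprows=[1]).

```python
import io
import pandas as pd

# Hypothetical stand-ins for two ERDDAP downloads:
# header row, units row, then the data rows.
csv_a = """time,depth,temperature
UTC,m,Celsius
2015-09-18T00:02:41Z,20.09,14.0286
2015-09-18T00:04:36Z,21.05,13.8069
"""
csv_b = """time,depth,temperature
UTC,m,Celsius
2014-08-16T00:01:00Z,5.10,22.5
"""

def load_glider_csv(source):
    # skiprows=[1] drops the units row that follows the header
    return pd.read_csv(source, skiprows=[1], parse_dates=["time"], index_col="time")

# In the real notebook the values would be the URLs (url, url2, ...)
sources = {"ru28-2015": io.StringIO(csv_a), "ru28-2014": io.StringIO(csv_b)}
frames = {name: load_glider_csv(src) for name, src in sources.items()}

# Passing a dict makes its keys an outer index level; reset_index turns
# that level into an ordinary column and leaves time as the index.
combined = pd.concat(frames, names=["deployment"]).reset_index(level="deployment")
print(combined)
```

With a "deployment" column in place, each source file can be given its own box even when two deployments fall in the same year.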

Running combined_df.head() gives this preview of the data:


                           profile_id   latitude  longitude  depth  temperature   salinity    density
time
2015-09-18 00:02:41+00:00          81  40.350986 -73.871552  20.09      14.0286  32.678837  1024.4777
2015-09-18 00:02:41+00:00          81  40.350986 -73.871552  20.73      13.8871  32.658794  1024.4943
2015-09-18 00:02:41+00:00          81  40.350986 -73.871552  21.05      13.8069  32.680794  1024.5292
2015-09-18 00:04:36+00:00          82  40.350817 -73.871420  21.05      13.8069  32.680794  1024.5292
2015-09-18 00:16:07+00:00          83  40.349812 -73.870636  20.76      13.9284  32.670765  1024.4951

I need to make a graph with 7 individual boxplots with values from each dataset. I am focusing on temperature, salinity, and density. The x axis would be time. Any help would be greatly appreciated.

Since each file appears to contain the data of a single year, we can simplify the approach by using the year as the grouping variable, and seaborn is of great help here. To make the code more readable (read: because we are too lazy to type repetitive things), we put these tasks into loops and store the necessary variables in lists.

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

url = 'https://gliders.ioos.us/erddap/tabledap/ru28-20150917T1300.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2015-09-18T00%3A00%3A00Z&time%3C=2015-10-06T00%3A00%3A00Z'    
url2 = 'https://gliders.ioos.us/erddap/tabledap/ru28-20140815T1405.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2014-08-16T00%3A00%3A00Z&time%3C=2014-09-04T00%3A00%3A00Z'    
url3 = 'https://gliders.ioos.us/erddap/tabledap/ru28-20130813T1436.csv?profile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Ctemperature%2Csalinity%2Cdensity&time%3E=2013-08-14T00%3A00%3A00Z&time%3C=2013-08-26T00%3A00%3A00Z'

urls = [url, url2, url3]  #<---add the remaining urls, this example is just for three of them
#because the download takes a while, we can simulate this with already downloaded files
#urls=["ru28-20140815T1405_0c34_1256_e732.csv", "ru28-20150917T1300_cc34_de4b_4c02.csv", "ru28-20130813T1436_5a0d_6ca1_4df0.csv"]

print("started loading")
#load file 1 into a dataframe and extract year as its identifier
combined_df = pd.read_csv(urls[0], skiprows=[1], parse_dates=['time'], index_col='time')
combined_df["year"] = combined_df.index.year
#we could also add another identifier in case years overlap between files
#combined_df["data_ID"] = 1
print("data file 1 is ready")

#load one url after the other and append it to the combined dataframe
for i, curr_url in enumerate(urls[1:]):
    tmp_data = pd.read_csv(curr_url, skiprows=[1], parse_dates=['time'], index_col='time')
    tmp_data["year"] = tmp_data.index.year
    #tmp_data["data_ID"] = i+2
    combined_df = pd.concat([combined_df, tmp_data], axis = 0)
    print(f"data file {i+2} is ready")

print("finished downloads")
print("plotting now")

#set the theme before creating the figure so that the style applies to these axes
sns.set_theme(style="ticks", palette="pastel")

fig, axes = plt.subplots(3, figsize=(8, 10))

categ = ["temperature", "density", "salinity"]    
cat_color = ["grey", "tab:orange", "yellow"]

for i, curr_ax in enumerate(axes.flat):
    sns.boxplot(x="year", y=categ[i], data=combined_df, color=cat_color[i], ax=curr_ax)
    sns.despine(offset=10, trim=True, ax=curr_ax)

plt.tight_layout(h_pad=2)
plt.show()

Sample output: [image]
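Each box in such a plot spans the first to the third quartile of one year's values, with a line at the median, so the plot can be sanity-checked against a groupby summary. This is a rough sketch only: the miniature combined_df and its eight temperature values are invented for illustration and are not taken from the glider data.

```python
import pandas as pd

# Hypothetical miniature of combined_df: four readings in each of two years
combined_df = pd.DataFrame(
    {"temperature": [10.0, 12.0, 14.0, 16.0, 20.0, 22.0, 24.0, 26.0]},
    index=pd.to_datetime(["2014-08-16"] * 4 + ["2015-09-18"] * 4, utc=True),
)
combined_df["year"] = combined_df.index.year

# Quartiles per year: the lower edge, centre line and upper edge of each box
summary = (
    combined_df.groupby("year")["temperature"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["q1", "median", "q3"]
print(summary)
```

The same call works for salinity and density by swapping the column name.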
