I am using Python 3.5. I am also a beginner with Python (3 weeks of experience), and somehow I haven't given up on trying to analyze my data.
Data description: My data is in a CSV file (fev.csv). I've included the full data set here if you want to see it. It has 5 columns:
Task: I am trying to write a program that generates a bar graph of average FEVs, with error bars indicating the standard deviation. I want 2 side-by-side bars (smokers/non-smokers) for each of 4 age categories (11-12, 13-14, 15-16, 17 or older).
Code so far (please excuse all my #notes, they help me keep track of what I'm trying to do):
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('fev.csv')
nonsmokers = data[data.smoke==0]
smokers = data[data.smoke==1]
nonsmokers1 = nonsmokers[(nonsmokers.age==11) | (nonsmokers.age==12)]
nonsmokers2 = nonsmokers[(nonsmokers.age==13) | (nonsmokers.age==14)]
nonsmokers3 = nonsmokers[(nonsmokers.age==15) | (nonsmokers.age==16)]
nonsmokers4 = nonsmokers[(nonsmokers.age>=17)]
smokers1 = smokers[(smokers.age==11) | (smokers.age==12)]
smokers2 = smokers[(smokers.age==13) | (smokers.age==14)]
smokers3 = smokers[(smokers.age==15) | (smokers.age==16)]
smokers4 = smokers[(smokers.age>=17)]
nonsmMean = [nonsmokers1.fev.mean(), nonsmokers2.fev.mean(), nonsmokers3.fev.mean(), nonsmokers4.fev.mean()]
nonsmSd = [nonsmokers1.fev.std(), nonsmokers2.fev.std(), nonsmokers3.fev.std(), nonsmokers4.fev.std()]
smMean = [smokers1.fev.mean(), smokers2.fev.mean(), smokers3.fev.mean(), smokers4.fev.mean()]
smSd = [smokers1.fev.std(), smokers2.fev.std(), smokers3.fev.std(), smokers4.fev.std()]
# data to be plotted
nonsmoker = np.array(nonsmMean)
sdNonsmoker = np.array(nonsmSd)
smoker = np.array(smMean)
sdSmoker = np.array(smSd)
# parameters
bar_width = 0.35
x = np.arange(len(nonsmoker))
# plotting bars
plt.bar(x, nonsmoker, bar_width, yerr=sdNonsmoker, ecolor='k', color='b', label='Nonsmokers')
plt.bar(x+bar_width, smoker, bar_width, yerr=sdSmoker, ecolor='k', color='m', label='Smokers')
# formatting and labeling the axes and title
plt.xlabel('Age')
plt.ylabel('FEV')
plt.title('Mean FEV by Age and Smoking Status')
plt.xticks(x+0.35, ['11 to 12', '13 to 14', '15 to 16', '17+'])
# adding the legend
plt.legend()
plt.axis([-0.5,4.2,0,7])
plt.savefig('FEVgraph.png', dpi=300)
# and we are done!
plt.show()
Is there a more efficient way of doing this?
Thanks!
A possible solution is the following:
# pip install pandas
# pip install matplotlib
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# read csv file and create pandas dataframe
df = pd.read_csv('https://raw.githubusercontent.com/benkeser/halplus/master/inst/extdata/fev.csv')
# assign age bins to data
bins = [df['age'].min()-1, 10, 12, 14, 16, df['age'].max()]
bins_labels = ['<11', '11 to 12', '13 to 14', '15 to 16', '17+']
df['age_bins'] = pd.cut(df['age'], bins, labels = bins_labels)
# aggregate data
result = df.groupby(['smoke', 'age_bins'], as_index=False).agg({'fev':['mean','std']})
result.columns = ['_'.join(col).strip('_') for col in result.columns.values]
result = result.round(1)
# prepare data for plot
nonsmokers = result[result['smoke'] == 0]
smokers = result[result['smoke'] == 1]
x = np.arange(len(bins_labels))
width = 0.35
# set plot figure size
plt.rcParams["figure.figsize"] = [8,6]
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, nonsmokers['fev_mean'], width, yerr=nonsmokers['fev_std'], color='b', label='Nonsmokers')
rects2 = ax.bar(x + width/2, smokers['fev_mean'], width, yerr=smokers['fev_std'], color='m', label='Smokers')
ax.set_xlabel('Age')
ax.set_ylabel('FEV')
ax.set_title('Mean FEV by Age and Smoking Status')
# passing labels to set_xticks requires Matplotlib 3.5 or newer
ax.set_xticks(x, bins_labels)
ax.legend(loc=2)
fig.tight_layout()
plt.savefig('FEVgraph.png', dpi=300)
plt.show()
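Returns: [figure: grouped bar chart of mean FEV by age group and smoking status, with standard-deviation error bars]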
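If you want something shorter still, one option is to let pandas do both the reshaping and the plotting. The sketch below is only an illustration under a few assumptions: the helper names `summarize_fev` and `plot_fev` are made up for this example, the `toy` DataFrame is invented data (not the real fev.csv), and `smoke == 0` is taken to mean non-smoker as in the question. It bins ages with `pd.cut`, unstacks the grouped mean and standard deviation so that each smoking status becomes a column, and passes the standard deviations as `yerr` to `DataFrame.plot`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

LABELS = ['<11', '11 to 12', '13 to 14', '15 to 16', '17+']

def summarize_fev(df):
    """Return (mean, std) tables of fev: one row per age bin, one column per smoke value."""
    # pd.cut uses right-closed intervals by default, so (10, 12] holds ages 11 and 12
    bins = [-np.inf, 10, 12, 14, 16, np.inf]
    binned = df.assign(age_group=pd.cut(df['age'], bins=bins, labels=LABELS))
    grouped = binned.groupby(['age_group', 'smoke'], observed=False)['fev']
    # unstack moves the smoke level into the columns (0 and 1)
    return grouped.mean().unstack('smoke'), grouped.std().unstack('smoke')

def plot_fev(mean, std):
    """Draw grouped bars with error bars; the column names of mean and std must match."""
    names = {0: 'Nonsmokers', 1: 'Smokers'}  # assumption: 0 = non-smoker, 1 = smoker
    ax = mean.rename(columns=names).plot(
        kind='bar', yerr=std.rename(columns=names),
        color=['b', 'm'], capsize=3, rot=0)
    ax.set_xlabel('Age')
    ax.set_ylabel('FEV')
    ax.set_title('Mean FEV by Age and Smoking Status')
    return ax

# Invented toy data: two rows per (age bin, smoke) combination
ages = [9, 10, 11, 12, 13, 14, 15, 16, 17, 19]
toy = pd.DataFrame({
    'age': ages * 2,
    'smoke': [0] * 10 + [1] * 10,
    'fev': [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 3.5, 4.5, 4.0, 5.0,
            2.0, 2.0, 3.0, 3.0, 3.0, 3.5, 3.0, 4.0, 3.5, 4.5],
})
toy_mean, toy_std = summarize_fev(toy)
print(toy_mean)
```

With the real file you would call `plot_fev(*summarize_fev(pd.read_csv('fev.csv')))` followed by `plt.savefig(...)` or `plt.show()`. Using infinite outer bin edges avoids having to compute the minimum and maximum age first.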