Python: df.mean seems to give the wrong output, why?

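Background: I am analysing data from various experiments. The goal is to import an Excel file with several sheets, filter the noise out of the data, and find the average across all samples. Then I plot a graph and save it.

Progress and problem: I have been able to do all of the steps above. However, the final graph, which shows the individual samples against the average, looks wrong to me. I am not sure whether "df.mean" is the right way to find the average. I have attached the graph I get; I cannot see how the average can be that low. As the saved image shows, the legend is also cut off. How can I change that?

Improvements needed: This is my first question on Stack Overflow and I am still new to Python. The code seems very "fluffy", and I would appreciate any tips for shortening it.

My code: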

#IMPORT LIBRARIES
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#IMPORT DATA 
excel_df= pd.ExcelFile('data.xlsx',delimiter = ';') #import entire excel file
sheet1=pd.read_excel('data.xlsx',sheetname=0,names=['time','void1','pressure1'])
sheet2=pd.read_excel('data.xlsx',sheetname=1,names=['time','void2','pressure2'])
sheet3=pd.read_excel('data.xlsx',sheetname=2,names=['time','void3','pressure3']) 
sheet4=pd.read_excel('data.xlsx',sheetname=3,names=['time','void4','pressure4'])
sheet5=pd.read_excel('data.xlsx',sheetname=4,names=['time','void5','pressure5'])
sheet6=pd.read_excel('data.xlsx',sheetname=5,names=['time','void6','pressure6'])
sheet7=pd.read_excel('data.xlsx',sheetname=6,names=['time','void7','pressure7'])
sheet8=pd.read_excel('data.xlsx',sheetname=7,names=['time','void8','pressure8'])
sheet10=pd.read_excel('data.xlsx',sheetname=9,names=['time','void10','pressure10'])

#SORT VALUES TO FIND THE UNWANTED DATA
sheet1.sort_values('pressure1',ascending=False).head() #the pressure has noise so sort accordingly

#GET ONLY WANTED DATA WITHOUT NOISE
sheet1_new = sheet1[sheet1.pressure1 <=8] #exclude the noise above 8 bar
sheet2_new = sheet2[sheet2.pressure2 <=8] #exclude the noise above 8 bar
sheet3_new= sheet3[sheet3.pressure3 <=8] #exclude the noise above 8 bar
sheet4_new = sheet4[sheet4.pressure4 <=8] #exclude the noise above 8 bar
sheet5_new = sheet5[sheet5.pressure5 <=8] #exclude the noise above 8 bar
sheet6_new = sheet6[sheet6.pressure6 <=8] #exclude the noise above 8 bar
sheet7_new = sheet7[sheet7.pressure7 <=8] #exclude the noise above 8 bar
sheet8_new = sheet8[sheet8.pressure8 <=8] #exclude the noise above 8 bar
sheet10_new = sheet10[sheet10.pressure10 <=8] #exclude the noise above 8 bar

#MERGE THE DATASETS TO FIND AVERAGE OF ALL SAMPLES

#'MERGE' ONLY MERGES 2 DATAFRAMES AT A TIME
merge12_df = pd.merge(sheet1_new,sheet2_new, on='time')
merge34_df = pd.merge(sheet3_new,sheet4_new, on='time')
merge56_df = pd.merge(sheet5_new,sheet6_new, on='time')
merge78_df = pd.merge(sheet7_new,sheet8_new, on='time')

#MERGE ON FIRST OUTPUT
all_merged = merge12_df.merge(merge34_df, on='time').merge(merge56_df, on = 'time').merge(merge78_df, on = 'time').merge(sheet10_new, on = 'time')
#print(all_merged.head()) #check that all data is merged into one dataframe

#AVERAGE ALL PRESSURES
mean_all_pressures = all_merged[["pressure1", "pressure2","pressure3", "pressure4","pressure5", "pressure6","pressure7", "pressure8", "pressure10"]].mean(axis=1)

#PRINT AVERAGE VS ALL THE SAMPLES GRAPH 
plt.figure(1) 
plt.plot(all_merged.time,mean_all_pressures,'r.') #plot the average of all samples.
plt.plot(sheet1_new.time,sheet1_new.pressure1)
plt.plot(sheet2_new.time,sheet2_new.pressure2)
plt.plot(sheet3_new.time,sheet3_new.pressure3)
plt.plot(sheet4_new.time,sheet4_new.pressure4)
plt.plot(sheet5_new.time,sheet5_new.pressure5)
plt.plot(sheet6_new.time,sheet6_new.pressure6)
plt.plot(sheet7_new.time,sheet7_new.pressure7)
plt.plot(sheet8_new.time,sheet8_new.pressure8)
plt.plot(sheet10_new.time,sheet10_new.pressure10)
plt.legend(['Average','Sample 1','Sample 2','Sample 3','Sample 4','Sample 5','Sample 6','Sample 7','Sample 8','Sample 10'],bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xlabel('Time (s)'),plt.ylabel('Pressure (bar)') #Specify the plot details
plt.savefig('AllPressures_vs_Average.png') #Save the plot for later use
plt.show() #Display the plot
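
On the legend being cut off in the saved image: one common approach (not part of the original post) is to pass bbox_inches='tight' to savefig, which expands the saved area to include artists placed outside the axes. The sketch below uses made-up data, a hypothetical output file name, and the non-interactive Agg backend so it runs without a display; only the savefig argument is the point.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt

# Made-up data standing in for one sample and the average.
time = [0, 1, 2]
fig, ax = plt.subplots()
ax.plot(time, [1.5, 2.5, 3.5], "r.", label="Average")
ax.plot(time, [1.0, 2.0, 3.0], label="Sample 1")

# Same legend placement as in the question: anchored outside the axes.
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
ax.set_xlabel("Time (s)")
ax.set_ylabel("Pressure (bar)")

# Hypothetical output location; bbox_inches="tight" makes the saved
# bounding box include the legend that sits outside the axes.
out_path = os.path.join(tempfile.mkdtemp(), "legend_demo.png")
fig.savefig(out_path, bbox_inches="tight")
```

In the question's code, the equivalent change would be adding bbox_inches='tight' to the existing plt.savefig call.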

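Most of the repetition in your code comes from defining a separate variable for each sheet and then performing the same operations on each of them.

You can improve the current code by storing the contents of every sheet in a single dictionary instead of in separate variables.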

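From the documentation, you can see that specifying sheetname=None imports all sheets as a dictionary (newer pandas versions name this parameter sheet_name). Alternatively, you can pass a list of the sheets to read, in your case [0,1,2,...,11], since they are 0-indexed.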

sheets_dict = pd.read_excel('data.xlsx',sheetname=None,names=['time','void1','pressure1'])

You can take a quick look at what was loaded:

for name, sheet in sheets_dict.items():
    print(name, sheet.head())

You can access each sheet individually when you need it:

sheets_dict['sheet_1_name']

This avoids a lot of repetition. For example, the filtering becomes simply:

new_sheets_dict = {key: el[el.pressure1 <= 8] for key, el in sheets_dict.items()}
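
As a rough sketch of how the dictionary approach could carry through to the averaging step, the example below uses two small made-up DataFrames in place of the dictionary returned by read_excel. The sheet names, the values, and the concat/groupby averaging are assumptions for illustration, not part of the original answer. It also shows one thing worth checking in the question's code: an inner merge on 'time' keeps only the time stamps that survive the filter in every sheet.

```python
import pandas as pd

# Hypothetical stand-in for the dict of sheets; names and values are made up.
sheets_dict = {
    "Sheet1": pd.DataFrame({"time": [0, 1, 2],
                            "void1": [0.1, 0.2, 0.3],
                            "pressure1": [2.0, 9.5, 4.0]}),
    "Sheet2": pd.DataFrame({"time": [0, 1, 2],
                            "void1": [0.1, 0.2, 0.3],
                            "pressure1": [4.0, 6.0, 12.0]}),
}

# Exclude the noise above 8 bar in every sheet at once.
new_sheets_dict = {key: el[el.pressure1 <= 8] for key, el in sheets_dict.items()}

# Stack the filtered sheets; the dict keys become the outer index level.
stacked = pd.concat(new_sheets_dict, names=["sheet", "row"])

# Average the pressure at each time stamp over whichever samples remain.
mean_pressure = stacked.groupby("time")["pressure1"].mean()
print(mean_pressure)

# For comparison: an inner merge keeps only the time stamps that are
# present in both filtered sheets, so rows are dropped before averaging.
merged = pd.merge(new_sheets_dict["Sheet1"], new_sheets_dict["Sheet2"], on="time")
print(len(merged), "row(s) left after the inner merge")
```

Whether averaging over the remaining samples or only over complete rows is appropriate depends on the experiment; comparing the row counts before and after merging is a quick way to see how much data each approach uses.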

