简体   繁体   中英

How to store excel information into python fast and efficiently

I have an excel file that has over 5000 rows and 32 columns. The data represents 16 tests. It is stored like this: The first column is the x values that belong to the y values that are stored in the second column. The third column is the x values that belong to the y values that are stored in the fourth column, and so on... Since the values can be subdivided into 4 tests (4 tests of 8 columns is 32 columns) I would like to store them as follows:

  • column 1 = xtest11
  • column 2 = ytest11
  • column 3 = xtest12
  • column 4 = ytest12
  • column 5 = xtest13
  • column 6 = ytest13
  • column 7 = xtest14
  • column 8 = ytest14
  • column 9 = xtest21
  • column 10 = ytest22
  • column 11 = xtest23
  • column 12 = ytest24
  • ...
  • column 31 = xtest44
  • column 32 = ytest44

Right now I have this all written out by using many lines of code. I would like to know if it is possible to use a for loop or something else to make this code look better. I am new to python (spyder) and I would like to learn this things to make it easier for me along the way. Thank you very much.

F0=-33.5
df = pd.read_excel (r'D:\GEO_ENGINEERING\Thesis\Resultaten instrong\datazuiver.xlsx') #place "r" before the path string to address special character, such as '\'. Don't forget to put the file name at the end of the path + '.xlsx'
    
    

#%% Test 1

T11=df["Time_1_1"]
F11=df["F_1_1"]-F0
T12=df["Time_1_2"]
F12=df["F_1_2"]-F0
T13=df["Time_1_3"]
F13=df["F_1_3"]-F0
T14=df["Time_1_4"]
F14=df["F_1_4"]-F0
T11=T11[1:5000]
T12=T12[1:5000]
T13=T13[1:5000]
T14=T14[1:5000]
F11=F11[1:5000]
F12=F12[1:5000]
F13=F13[1:5000]
F14=F14[1:5000]
plt.figure()
plt.plot(T11,F11, label='Proef 1 test 1')
plt.plot(T12,F12, label='Proef 1 test 2')
plt.plot(T13,F13, label='Proef 1 test 3')
plt.plot(T14,F14, label='Proef 1 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 1')
plt.show()

#%%
df = pd.read_excel (r'D:\GEO_ENGINEERING\Thesis\Resultaten instrong\datazuiver.xlsx') #place "r" before the path string to address special character, such as '\'. Don't forget to put the file name at the end of the path + '.xlsx'
T21=df["Time_2_1"]
F21=df["F_2_1"]-F0
T22=df["Time_2_2"]
F22=df["F_2_2"]-F0
T23=df["Time_2_3"]
F23=df["F_2_3"]-F0
T24=df["Time_2_4"]
F24=df["F_2_4"]-F0
T21=T21[1:2366]
T22=T22[1:2366]
T23=T23[1:2366]
T24=T24[1:2366]
F21=F21[1:2366]
F22=F22[1:2366]
F23=F23[1:2366]
F24=F24[1:2366]
plt.figure()
plt.plot(T21,F21, label='Proef 2 test 1')
plt.plot(T22,F22, label='Proef 2 test 2')
plt.plot(T23,F23, label='Proef 2 test 3')
plt.plot(T24,F24, label='Proef 2 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 2')
plt.show()
#%% Proef 3


T31=df["Time_3_1"]
F31=df["F_3_1"]-F0
T32=df["Time_3_2"]
F32=df["F_3_2"]-F0
T33=df["Time_3_3"]
F33=df["F_3_3"]-F0
T34=df["Time_3_4"]
F34=df["F_3_4"]-F0
T31=T31[1:2366]
T32=T32[1:2344]
T33=T33[1:2366]
T34=T34[1:2366]
F31=F31[1:2366]
F32=F32[1:2344]
F33=F33[1:2366]
F34=F34[1:2366]
plt.figure()
plt.plot(T31,F31, label='Proef 3 test 1')
plt.plot(T32,F32, label='Proef 3 test 2')
plt.plot(T33,F33, label='Proef 3 test 3')
plt.plot(T34,F34, label='Proef 3 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 3 (no drillgrout)')
plt.legend()
plt.show()

#%% Proef 4

T41=df["Time_4_1"]
F41=df["F_4_1"]-F0
T42=df["Time_4_2"]
F42=df["F_4_2"]-F0
T43=df["Time_4_3"]
F43=df["F_4_3"]-F0
T44=df["Time_4_4"]
F44=df["F_4_4"]-F0
T41=T41[1:2366]
T42=T42[1:2366]
T43=T43[1:2366]
T44=T44[1:2366]
F41=F41[1:2366]
F42=F42[1:2366]
F43=F43[1:2366]
F44=F44[1:2366]
plt.figure()
plt.plot(T41,F41, label='Proef 4 test 1')
plt.plot(T42,F42, label='Proef 4 test 2')
plt.plot(T43,F43, label='Proef 4 test 3')
plt.plot(T44,F44, label='Proef 4 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 4')
plt.legend()
plt.show()
#%% Plot all the tests
plt.figure()
plt.plot(T11,F11, label='Proef 1 test 1')
plt.plot(T12,F12, label='Proef 1 test 2')
plt.plot(T13,F13, label='Proef 1 test 3')
plt.plot(T14,F14, label='Proef 1 test 4')
plt.plot(T21,F21, label='Proef 2 test 1')
plt.plot(T22,F22, label='Proef 2 test 2')
plt.plot(T23,F23, label='Proef 2 test 3')
plt.plot(T24,F24, label='Proef 2 test 4')
plt.plot(T41,F41, label='Proef 4 test 1')
plt.plot(T42,F42, label='Proef 4 test 2')
plt.plot(T43,F43, label='Proef 4 test 3')
plt.plot(T44,F44, label='Proef 4 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 1, 2, 3 and 4')
plt.legend()
plt.show()
#%% Plot of test 2 and 4mat
plt.figure()
plt.plot(T21,F21, label='Proef 2 test 1')
plt.plot(T22,F22, label='Proef 2 test 2')
plt.plot(T23,F23, label='Proef 2 test 3')
plt.plot(T24,F24, label='Proef 2 test 4')
plt.plot(T41,F41, label='Proef 4 test 1')
plt.plot(T42,F42, label='Proef 4 test 2')
plt.plot(T43,F43, label='Proef 4 test 3')
plt.plot(T44,F44, label='Proef 4 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 2 and 4')
plt.legend()
plt.show()

This is assuming that you only have one dataframe. It looks like you import the same one twice: once in the start and once in the middle.

If you want to try to reduce the number of lines, you could probably write a simple function that takes whatever "Proef" numbers, the title you want to have in the plot, and the dataframe, and plot those. For example,

import pandas as pd
import matplotlib.pyplot as plt

F0=-33.5
DATA_END_VALUES = [[5000]*4] + [[2366]*4] + [[2366, 2344]*2] + [[2366]*4]
# First index of DATA_END_VALUES is the proef number while the second is 
# the test number

def plot_results(df, proef_numbers, title):
    for proef  in proef_numbers:
        for test in range(1,5):
            time_data = df[f"Time_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]] 
            f_data = df[f"F_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]] - F0 
            plt.plot(time_data, f_data, label=f"Proef {proef} test {test}" )
    plt.xlabel('Time [s]')
    plt.ylabel('Force [N]')
    plt.title(title)
    plt.legend()
    plt.show()

df = pd.read_excel ("data.xlsx")
proef_numbers = [1,2]
title = "Tests 1 and 2"
plot_results(df, proef_numbers ,title)

I made some dummy data, and it seems to behave as expected. You can check with your real data to see if it works for it. Keep in mind for plots with more data, the legend might take up a large part of the plot. You will need to tweak it later.

You also mentioned that you wanted to rename your column names to i test jk where i is whether the x or y variable, j is the "proef" number, and k is the "test" number. If you make those changes, you will just need to change the time_data line and the f_data to the following:


time_data = df[f"xtest{proef}{test}"][1:DATA_END_VALUES[proef-1][test-1]]
f_data = df[f"ytest{proef}{test}"][1:DATA_END_VALUES[proef-1][test-1]] - F0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM