For loop to iterate over the columns of a CSV

Question

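I am very new to Python and to programming in general (this is my first programming language; I started about a month ago).

I have a CSV file with the data arranged as follows (CSV file data at the bottom). There are 31 columns of data. The first column (wavelength) must be read in as the independent variable (x), and for the first iteration the second column (i.e. the first column labelled "observation") must be read in as the dependent variable (y). I then fit a Gaussian + line model to the data and extract the mean of the Gaussian (mu), which should be stored in an array for further analysis. This process should be repeated for each set of observations, while the x values read in must stay the same (i.e. come from the wavelength column).

Here is the code I currently use to read in the data: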

import numpy as np #importing necessary packages
import matplotlib.pyplot as plt
import pandas as pd
import scipy as sp
from scipy.optimize import curve_fit
e=np.exp
spectral_data=np.loadtxt(r'C:/Users/Sidharth/Documents/Computing Labs/Project 1/Halpha_spectral_data.csv', delimiter=',', skiprows=2) #importing data file
print(spectral_data)
x=spectral_data[:,0] #selecting column 0 to be x-axis data
y=spectral_data[:,1] #selecting column 1 to be y-axis data

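So I need to automate this process, so that instead of manually changing y=spectral_data[:,1] to y=spectral_data[:,2], and so on up to y=spectral_data[:,30], for every iteration, it simply happens automatically.

My code for producing the Gaussian fit is as follows: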

plt.scatter(x,y) #produce scatter plot
plt.title('Observation 1')
plt.ylabel('Intensity (arbitrary units)')
plt.xlabel('Wavelength (m)')
plt.plot(x,y,'*')
plt.plot(x,c+m*x,'-') #plots the fit

print('The slope and intercept of the regression is,', m,c)
m_best=m
c_best=c
def fit_gauss(x,a,mu,sig,m,c):
    gaus = a*np.exp(-(x-mu)**2/(2*sig**2)) # np.exp, since recent SciPy releases no longer expose sp.exp
    line = m*x+c
    return gaus + line

initial_guess=[160,7.1*10**-7,0.2*10**-7,m_best,c_best]
po,po_cov=sp.optimize.curve_fit(fit_gauss,x,y,initial_guess)

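The Gaussian appears to fit well (as shown in the figure), so the mean of this Gaussian (i.e. the x-coordinate of its peak) is the value I need to extract. The value of the mean is printed in the console (denoted mu):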

The slope and intercept of the regression is, -731442221.6844947 616.0099144830941
The signal parameters are
 Gaussian amplitude = 19.7 +/- 0.8
 mu = 7.1e-07 +/- 2.1e-10
 Gaussian width (sigma) = -0.0 +/- 0.0
and the background estimate is
 m = 132654859.04 +/- 6439349.49
 c = 40 +/- 5

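So my question is: how can I iterate the process of reading data from the CSV so that I don't have to manually change the column that y is taken from, and how can I then store the mu value from each iteration so that I can use that mean for further analysis/calculations later?

My thinking is that I should use a for loop, but I don't know how to go about it.

The orange line shown in the figure is the result of some code I tried earlier. I think it is irrelevant, which is why it is not in the main part of the question, but in case it is needed, here it is in full.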

x=spectral_data[:,0] #selecting column 0 to be x-axis data
y=spectral_data[:,1] #selecting column 1 to be y-axis data
plt.scatter(x,y) #produce scatter plot
plt.title('Observation 1')
plt.ylabel('Intensity (arbitrary units)')
plt.xlabel('Wavelength (m)')
plt.plot(x,y,'*')
plt.plot(x,c+m*x,'-') #plots the fit

Answer 1

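In general, when you run into a problem like this, try to break it down into what must stay the same (in your example, the x data and the analysis code) and what must change (the y data, or more specifically the index that tells the rest of the code which column holds the y data), as well as how to keep the values you wish to store.
Once you have figured that out, we need to formalize the right loop and how to store the values we want. For the latter, a simple approach is to store them in a list: we start with an empty list and append the value to it at the end of each loop iteration.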

mu_list = [] # will store our mu's in this list
for i in range(1, 31): # each iteration i gets a different value, starting with 1 and ends with 30 (and not 31)
    x = spectral_data[:, 0]
    y = spectral_data[:, i]
    # Your analysis and plot code here #
    mu = po[1] # Not sure po[1] is the right place where your mu is, please change it appropriately...
    mu_list.append(mu) # store mu at the end of our growing mu_list

You will end up with a list of 30 mu values in mu_list.
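On the question of whether po[1] is the right element: curve_fit returns the fitted parameters in the same order as the model function's arguments after x, so with fit_gauss(x,a,mu,sig,m,c) the mean is po[1]. The following is a minimal sketch of that on a single column. The data is synthetic and in arbitrary units (the true values, noise level and initial guess are all assumptions made up for the illustration, not the real spectrum), and mu_err is an illustrative name for the one-sigma uncertainty taken from the covariance matrix.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_gauss(x, a, mu, sig, m, c):
    """Gaussian peak on top of a straight-line background."""
    gaus = a * np.exp(-(x - mu) ** 2 / (2 * sig ** 2))
    line = m * x + c
    return gaus + line

# Synthetic stand-in for one observation column (assumed values, not the real data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
true_params = (20.0, 5.0, 0.5, 1.5, 40.0)  # a, mu, sig, m, c
y = fit_gauss(x, *true_params) + rng.normal(0.0, 0.5, x.size)

initial_guess = [15.0, 4.8, 0.7, 1.0, 35.0]  # same order as the function arguments
po, po_cov = curve_fit(fit_gauss, x, y, p0=initial_guess)

mu = po[1]                      # second fitted parameter, matching the signature order
mu_err = np.sqrt(po_cov[1, 1])  # one-sigma uncertainty from the covariance diagonal
print(f"mu = {mu:.3f} +/- {mu_err:.3f}")
```

If the uncertainty is also needed later, it can be appended to a second list alongside mu_list in the same loop.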

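Now, note that we don't have to do everything inside the loop. For example, x is the same regardless of i (loading x only once improves performance), and the analysis code is essentially the same apart from its input (the y data), so we can define a function for it (a good habit that makes larger code more readable). We can therefore take both out of the loop: write x = spectral_data[:, 0] before the loop, and define a function that analyzes the data and returns mu: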

def analyze(x, y):
    # Your analysis and plot code here #
    mu = po[1]
    return mu

x = spectral_data[:, 0]
mu_list = [] # will store our mu's in this list
for i in range(1, 31):
    y = spectral_data[:, i]
    mu_list.append(analyze(x,y)) # Will calculate mu using our function, and store it at the end of our growing mu_list
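To make the pattern above concrete, here is one possible filled-in version that runs end to end. It is only a sketch under stated assumptions: spectral_data is replaced by a small synthetic array (column 0 is x, columns 1 to 5 are fake observations in arbitrary units), analyze takes an extra initial_guess argument that is not in the original, and a single fixed initial guess is used for every column rather than one derived from a per-column regression.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_gauss(x, a, mu, sig, m, c):
    """Gaussian peak on top of a straight-line background."""
    return a * np.exp(-(x - mu) ** 2 / (2 * sig ** 2)) + m * x + c

def analyze(x, y, initial_guess):
    """Fit one observation column and return the fitted Gaussian mean."""
    po, po_cov = curve_fit(fit_gauss, x, y, p0=initial_guess)
    return po[1]  # mu is the second parameter after x in fit_gauss

# Synthetic stand-in for spectral_data (assumed values, not the real file).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
true_mus = np.array([4.6, 4.8, 5.0, 5.2, 5.4])
columns = [x]
for true_mu in true_mus:
    noise = rng.normal(0.0, 0.5, x.size)
    columns.append(fit_gauss(x, 20.0, true_mu, 0.5, 1.5, 40.0) + noise)
spectral_data = np.column_stack(columns)

initial_guess = [15.0, 5.0, 0.7, 1.0, 35.0]  # a, mu, sig, m, c
x = spectral_data[:, 0]
mu_list = []
# shape[1] is the number of columns, so the loop adapts to the file's width.
for i in range(1, spectral_data.shape[1]):
    y = spectral_data[:, i]
    mu_list.append(analyze(x, y, initial_guess))

mu_array = np.array(mu_list)  # convert to an array for further calculations
print(mu_array)
```

Using spectral_data.shape[1] instead of a hard-coded 31 means the same loop still works if the number of observation columns changes.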

Solution 1
0 Accepted 2020-11-21 19:44:06
