I'm not sure what is wrong with my code (linear/polynomial regression)
I have a data set (CSV file) with three separate columns. Column 0 is the signal time, column 1 is the frequency, and column 2 is the intensity. There is a lot of noise in the data, which can be sorted through by finding the variance of each signal frequency. If it is < 2332, then it is the right frequency. Hence, this is the data I want to calculate linear/polynomial regression on.

P.S. I have to calculate the linear regression manually :(. The nested for-loop decision structure I have isn't currently working. Any solutions would be helpful! Thanks.

data = csv.reader(file1)
sort = sorted(data, key=(operator.itemgetter(1)))  # sorted by the frequencies

for row in sort:
    x.append(float(row[0]))
    y.append(float(row[2]))
    frequencies.append(float(row[1]))

for i in range(499):
    freq_dict.update({frequencies[i]: [x[i], y[i]]})

for key in freq_dict.items():
    for row in sort:
        if key == float(row[1]):
            a.append(float(row[1]))
            b.append(float(row[2]))
            c.append(float(row[0]))
        else:
            num = np.var(a)
            if num < 2332.0:
                linearRegression(c, b, linear)
                print('yo')
                polyRegression(c, b, d, linear, py)
                mplot.plot(linear, py)
            else:
                a = []
                b = []
                c = []

I used a range of 499 because that is the length of my data set. Also, I tried to clear the lists (a, b, c) if the frequency wasn't correct.

There are several issues I see going on. I am unsure why you sort your data if you already know the exact values you are looking for. I am unsure why you split up the data into separate variables as well. The double "for" loop means that you are repeating everything in "sort" for every single key in freq_dict. Not sure if it was your intention to repeat all those values multiple times. Also, freq_dict.items() produces tuples (key, value pairs), so your "key" is a tuple; hence "key" will never equal a float.
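
To illustrate the point about freq_dict.items(), here is a minimal sketch using a made-up two-entry dictionary (the values are hypothetical and not taken from the asker's file):

```python
# Hypothetical miniature of freq_dict: frequency -> [time, intensity].
freq_dict = {1487.0: [0.1, 20.0], 1502.0: [0.2, 35.0]}

# Iterating over .items() yields (key, value) tuples, never bare floats.
items_seen = [key for key in freq_dict.items()]
matches_with_items = [key == 1487.0 for key in freq_dict.items()]

# Iterating over the dict itself (or .keys()) yields the float keys.
matches_with_keys = [key == 1487.0 for key in freq_dict]

print(items_seen[0])        # a tuple: (1487.0, [0.1, 20.0])
print(matches_with_items)   # [False, False]
print(matches_with_keys)    # [True, False]
```

So either loop over the dictionary directly, or unpack the pair with "for key, value in freq_dict.items()".
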
Anyway, here is an attempt to re-write some code.

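import csv
import numpy
import matplotlib.pyplot as plt
from scipy import stats

data = csv.reader(file1)  # Read file (file1 is the already-open CSV file).
# csv.reader yields strings, so convert each row to floats first.
rows = [tuple(float(v) for v in row[:3]) for row in data]
# Filter data to condition (Python 3 lambdas cannot unpack tuple arguments).
f_data = [(x, f, y) for (x, f, y) in rows if f < 2332.0]
x, _, y = zip(*f_data)  # Split data down column.

# Standard linear stats function.
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# Plot the data and the fit line.
plt.scatter(x, y)
plt.plot(x, numpy.array(x) * slope + intercept)
plt.show()
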
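
Since the question says the linear fit has to be computed by hand, the slope and intercept that stats.linregress returns can also be obtained from the closed-form least-squares formulas. The following is only a sketch under that assumption; manual_linregress is a hypothetical helper name and the sample points are made up:

```python
def manual_linregress(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, no libraries."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1.
slope, intercept = manual_linregress([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)  # 2.0 1.0
```
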
A solution closer to the original code was to use the corrcoef of the lists. In a similar style, it was as follows:

for key, value in freq_dict.items():  # 1487
    for row in sort:  # when row -> goes to a new freq it calculates corrcoef of an empty list.
        if key == float(row[1]):  # 1487
            a.append(float(row[2]))
            b.append(float(row[0]))
        elif key != float(row[1]):
            if a:
                num = np.corrcoef(b, a)[0, 1]
                if (num < somenumber).any():
                    pass  # do stuff
                a = []  # clear the lists and reset number
                b = []
                num = 0
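
One way to avoid rescanning every row for every key is to group the sorted rows once with itertools.groupby. The following is a sketch, not the poster's method: select_groups and keep are hypothetical names, the rows are made up, and it assumes that "variance" in the question means the population variance of the intensities (what np.var computes by default). A correlation test like the one above could be passed in as keep instead.

```python
from itertools import groupby
from operator import itemgetter
from statistics import pvariance


def select_groups(rows, keep):
    """Group (time, frequency, intensity) rows by frequency and return
    {frequency: (times, intensities)} for groups where keep(...) is true."""
    # Convert to floats before sorting so the order is numeric, not textual.
    parsed = sorted(
        ((float(t), float(f), float(i)) for t, f, i in rows),
        key=itemgetter(1),
    )
    selected = {}
    for freq, group in groupby(parsed, key=itemgetter(1)):
        group = list(group)
        times = [g[0] for g in group]
        intensities = [g[2] for g in group]
        if keep(times, intensities):
            selected[freq] = (times, intensities)
    return selected


# Made-up rows in the CSV's column order: time, frequency, intensity.
rows = [
    ("0.0", "1487", "10"), ("1.0", "1487", "12"), ("2.0", "1487", "14"),
    ("0.0", "1502", "0"), ("1.0", "1502", "200"), ("2.0", "1502", "5"),
]
# Assumption: keep a frequency when its intensity variance is below 2332.
quiet = select_groups(rows, lambda t, y: pvariance(y) < 2332.0)
print(sorted(quiet))  # [1487.0]
```

Each selected group's times and intensities can then be handed to the regression step.
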