How do I force two arrays to be equal for use in pyplot?

I'm trying to plot a simple moving average, but the resulting array is a few elements shorter than the full sample. How do I plot such a line alongside a standard line that spans the full sample size? The code below produces this error message:

ValueError: x and y must have same first dimension, but have shapes (96,) and (100,)

This uses standard matplotlib.pyplot. I've tried deleting X values using remove and del, switching all arrays to numpy arrays (since that's the output format of my moving average function), and adding an if condition to the append in the while loop, but none of these has worked.

import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

vX = np.array([])
vY = np.array([])

x = 0
val = 0
while x < sampleSize:
    val += (random.randint(min, max))
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1

plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()

The expected result is two lines on the same graph, one a simple moving average of the other.

Here is how you would pad a numpy array out to the desired length with NaNs (replace NaN with other values, or replace 'constant' with another mode, depending on the desired result): https://docs.scipy.org/doc/numpy/reference/generated/numpy.pad.html

import numpy as np
bob = np.asarray([1, 2, 3], dtype=float)  # NaN can only be stored in a float array
alice = np.pad(bob, (0, 100 - len(bob)), 'constant', constant_values=np.nan)

So in your code it would look something like this:

import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values,window):
    weights = np.repeat(1.0,window)/window
    smas = np.convolve(values,weights,'valid')
    # pad both ends with NaN so the result is as long as the input
    missing = len(values)-len(smas)
    left = missing//2
    smas = np.pad(smas,(left,missing-left),'constant',constant_values=np.nan)
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

vX = np.array([])
vY = np.array([])

x = 0
val = 0
while x < sampleSize:
    val += (random.randint(min,max))
    vY = np.append(vY,val)
    vX = np.append(vX,x)
    x += 1
plt.plot(vX,vY)
plt.plot(vX,(movingaverage(vY,window)))
plt.show()

Just change this line to the following:

smas = np.convolve(values, weights,'same')

The 'valid' option only computes the convolution where the window completely overlaps the values array, so the result has len(values) - window + 1 elements. What you want is 'same', which returns an array as long as values (as long as values is at least as long as the window).
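To see how the mode affects the output length, here is a minimal sketch comparing the three modes of np.convolve on 100 samples with a window of 5. The variable names are illustrative and not taken from the question.

```python
import numpy as np

values = np.arange(100, dtype=float)   # stand-in data, 100 samples
window = 5
weights = np.ones(window) / window

# Output length for each convolution mode
lengths = {mode: len(np.convolve(values, weights, mode))
           for mode in ('valid', 'same', 'full')}
print(lengths)
```

Only 'same' matches the 100-element x array, which is why plt.plot stops complaining about mismatched shapes.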


Edit: This comes with its own issue, however: wherever the window does not fully sit on top of the data, 'same' acts as if there were extra data points with value 0. You can choose to ignore this, as this solution does, but another approach is to pad the array with values of your own choosing instead (see Mike Sperry's answer).
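The edge effect is easy to see on constant data. The sketch below (illustrative names, not from the question) averages an array of ones; a true moving average would be 1 everywhere, but the implicit zeros drag the ends down.

```python
import numpy as np

values = np.ones(10)
window = 5
weights = np.ones(window) / window

smoothed = np.convolve(values, weights, 'same')
print(smoothed)  # the first two and last two values fall below 1
```

With a window of 5, two points at each end are affected; in general it is roughly window // 2 points per side.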

To answer your basic question, the key is to take a slice of the x-axis appropriate to the data of the moving average. Since you are convolving 100 data elements with a window of size 5, the result is valid for the last 96 elements. You would plot it like this:

plt.plot(vX[window - 1:], movingaverage(vY, window))
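As a quick check of that slice, the sketch below uses stand-in data (a simple ramp, which is an assumption for illustration) to confirm that the sliced x array and the 'valid' moving average have the same shape, and that each average is placed at the x position of the last sample in its window.

```python
import numpy as np

window = 5
v_x = np.arange(100)
v_y = np.arange(100, dtype=float)  # stand-in data for illustration

sma = np.convolve(v_y, np.ones(window) / window, 'valid')
x_trailing = v_x[window - 1:]      # drop the first window - 1 positions

# The first average covers v_y[0:5] and is plotted at x = 4
print(x_trailing.shape, sma.shape, x_trailing[0], sma[0])
```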

That being said, your code could stand some optimization. For example, numpy arrays are stored in fixed-size buffers. Any time you append to or delete from them, the entire array gets reallocated, unlike Python lists, which have amortized growth built in. It is better to preallocate if you know the array size ahead of time (which you do).
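A minimal sketch of preallocation, keeping the explicit loop from the question for comparison. The names sample_size and total are illustrative, not from the original code.

```python
import random
import numpy as np

sample_size = 100
v_y = np.empty(sample_size)   # allocated once, filled in place

total = 0
for i in range(sample_size):
    total += random.randint(-10, 10)
    v_y[i] = total            # no reallocation, unlike np.append
```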

Secondly, running an explicit loop is rarely necessary. You are generally better off using the under-the-hood loops implemented at the lowest level in the numpy functions instead. This is called vectorization. Random number generation, cumulative sums and incremental arrays are all fully vectorized in numpy. In a more general sense, it's usually not very effective to mix Python and numpy computational functions, including random.
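To illustrate, the sketch below builds the same random walk twice from one set of steps: once with a Python loop and once with np.cumsum. It assumes a NumPy version that provides np.random.default_rng; the seed and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
steps = rng.integers(-10, 10, size=100, endpoint=True)  # inclusive bounds

# Loop version: accumulate one step at a time
walk_loop = []
total = 0
for s in steps:
    total += int(s)
    walk_loop.append(total)

# Vectorized version: one call does the same accumulation
walk_vec = np.cumsum(steps)
```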

Finally, you may want to consider a different convolution method. I would suggest something based on numpy.lib.stride_tricks.as_strided. This is a somewhat arcane, but very effective way to implement a sliding window with numpy arrays. I will show it here as an alternative to the convolution method you used, but feel free to ignore this part.

All in all:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

def movingaverage(values, window):
    # this step creates a view into the same buffer
    values = np.lib.stride_tricks.as_strided(values, shape=(window, values.size - window + 1), strides=values.strides * 2)
    # true division returns floats; an in-place /= fails on integer input
    smas = values.sum(axis=0) / window
    return smas

sampleSize = 100
min = -10
max = 10
window = 5

v_x = np.arange(sampleSize)
# np.random.random_integers is deprecated; randint's upper bound is exclusive, hence max + 1
v_y = np.cumsum(np.random.randint(min, max + 1, sampleSize))

plt.plot(v_x, v_y)
plt.plot(v_x[window - 1:], movingaverage(v_y, window))
plt.show()
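If your NumPy is version 1.20 or newer (an assumption about your environment), numpy.lib.stride_tricks.sliding_window_view offers a safer way to get the same kind of windowed view without computing strides by hand. The sketch below is one possible variant, not a replacement for the code above; the function name moving_average and the test data are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_average(values, window):
    # Read-only view of shape (len(values) - window + 1, window)
    windows = sliding_window_view(values, window)
    return windows.mean(axis=1)

v_y = np.cumsum(np.arange(10))
sma = moving_average(v_y, 3)

# Cross-check against the convolution approach
reference = np.convolve(v_y, np.ones(3) / 3, 'valid')
```

Because mean returns floats, this version also works on integer input.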

A note on names: in Python, variable and function names are conventionally name_with_underscore. CamelCase is reserved for class names. np.random.random_integers uses inclusive bounds just like random.randint, but allows you to specify the number of samples to generate; note that it is deprecated. Confusingly, np.random.randint has an exclusive upper bound, more like random.randrange, so pass max + 1 if you want max to be included.
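The difference in bounds can be sketched as follows. This assumes a NumPy version with np.random.default_rng, whose integers method takes an endpoint flag; the seed and variable names are arbitrary.

```python
import numpy as np

low, high = -10, 10
rng = np.random.default_rng(0)

# endpoint=True makes the upper bound inclusive, like random.randint
inclusive = rng.integers(low, high, size=10_000, endpoint=True)

# np.random.randint never returns the upper bound, like random.randrange
exclusive = np.random.randint(low, high, size=10_000)

print(inclusive.max(), exclusive.max())
```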
