Python: Conditionally plotting data from many columns from a Dataframe in a loop

I have about 200 pairs of columns in a dataframe that I would like to plot in a single plot. Each pair of columns can be thought of as related "x" and "y" variables. Some of the "y variables" are 0 at certain points in the data. I don't want to plot those. I would rather they show up as a discontinuity in the plot. I am not able to figure out an efficient way to exclude those points. There is also a "date" variable that I don't need in the plot, but I am keeping it in the sample data to mirror the real data.

Here is a sample data set and what I have done with it. I created the sample in a hurry; in the original data, every pair of columns has a unique "y value" for a given "x value".

import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import randint

data1y = [n**3 -n**2+n for n in range(12)]
data1x = [randint(0, 100) for n in range(12)]
data1x.sort()
data2y = [n**3 for n in range(12)]
data2x = [randint(0, 100) for n in range(12)]
data2x.sort()
data3y = [n**3 - n**2 for n in range(12)]
data3x = [randint(0, 100) for n in range(12)]
data3x.sort()
data1y = [0 if x%7==0 else x for x in data1y]
data2y = [0 if x%7==0 else x for x in data2y]
data3y = [0 if x%7==0 else x for x in data3y]

date = ['Jan','Feb','Mar','Apr','May', 'Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df = pd.DataFrame({'Date':date,'Var1':data1y, 'Var1x':data1x, 'Vartwo':data2y, 'Vartwox':data2x,'datatree':data3y, 'datatreex':data3x})

print(df)

# Create one figure and one set of axes shared by every pair
fig, ax = plt.subplots()
for k in ['Var1','Vartwo','datatree']:
    df.plot(x=k+'x', y=k, kind = 'line',ax=ax)

The output I get is this: [image]

I would like to see a discontinuity wherever the "y variables" are zero.

I have tried:

import numpy as np
df2 = df.copy()
df2[df2.Var1 < 0.5] = np.nan

But this makes an entire row NaN when I only want it to apply to a particular variable.
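The whole row becomes NaN because a boolean mask used on its own inside `df2[...]` selects rows, so the assignment touches every column of those rows. A minimal sketch of a column-scoped alternative using `Series.mask`, assuming a small stand-in frame (the values are made up for illustration and are not the question's data):

```python
import pandas as pd

# Small stand-in frame; the values are illustrative only.
df = pd.DataFrame({
    "Date": ["Jan", "Feb", "Mar", "Apr"],
    "Var1": [0, 1, 6, 0],
    "Var1x": [3, 10, 25, 40],
})

df2 = df.copy()

# Series.mask replaces entries where the condition is True with NaN.
# Assigning the result back to one column leaves Date and Var1x untouched.
df2["Var1"] = df2["Var1"].mask(df2["Var1"] < 0.5)

print(df2)
```

Only `Var1` changes here; the column becomes floating point so that it can hold NaN.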

I'm trying this, but it isn't working.

ax = plt.gca()
fig = plt.figure()
for k in ['Var1','Vartwo','datatree']:
    filter = df.k.values > 0
    x = df.k+'x'
    y = df.k
    plot(x[filter], y[filter], kind = 'line',ax=ax)

This works for a single variable, but I don't know how to loop it across 200 variables, and it also doesn't show the discontinuities.

import matplotlib.pyplot as plt
ax = plt.gca()
fig = plt.figure()
for k in ['Var1','Vartwo','datatree']:
    filter = df.Var1.values > 0
    x = df.Var1x[filter]
    y = df.Var1[filter]
    plt.plot(x, y)
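
Two things seem to go wrong in the looped attempts above. `df.k` looks for a column literally named `k`, so a loop variable needs bracket access: `df[k]` and `df[k + 'x']`. And filtering rows out removes the points altogether, so matplotlib joins the remaining neighbours with a straight line; replacing the unwanted y values with NaN instead leaves a gap. A minimal sketch, assuming a made-up two-pair frame that follows the question's naming pattern:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data only; two pairs named like the question's columns.
df = pd.DataFrame({
    "Var1": [1, 0, 9, 16],
    "Var1x": [2, 5, 7, 11],
    "Vartwo": [0, 8, 27, 64],
    "Vartwox": [1, 4, 6, 13],
})

fig, ax = plt.subplots()
for k in ["Var1", "Vartwo"]:
    x = df[k + "x"]              # bracket access works with a loop variable
    y = df[k].where(df[k] > 0)   # keep positive values, NaN everywhere else
    ax.plot(x, y, label=k)       # matplotlib breaks the line at NaN
ax.legend()
```

Call `plt.show()` afterwards to display the figure in an interactive session.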

You're looking for .replace():

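df2 = df.copy()
# Replace zeros with NaN in the y columns only; the x columns stay untouched
cols_to_replace = ['Var1','Vartwo','datatree']
df2[cols_to_replace] = df2[cols_to_replace].replace({0:np.nan})

# Draw every pair on the same axes; NaN values leave gaps in the lines
fig, ax = plt.subplots()
for k in ['Var1','Vartwo','datatree']:
    df2.plot(x=k+'x', y=k, kind = 'line',ax=ax)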

Result: [image]
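
With roughly 200 pairs, typing the column names by hand gets impractical. A minimal sketch of one way to scale the approach, assuming every x column is named after its y column plus a trailing `x`, as in the sample data. The helper name `find_pairs` and the small frame are hypothetical and only serve the illustration:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def find_pairs(frame):
    """Return the y column names that have a matching '<name>x' column."""
    cols = set(frame.columns)
    return [c for c in frame.columns if c + "x" in cols]


# Illustrative data only, following the question's naming pattern.
df = pd.DataFrame({
    "Date": ["Jan", "Feb", "Mar", "Apr"],
    "Var1": [1, 0, 9, 16],
    "Var1x": [2, 5, 7, 11],
    "datatree": [0, 4, 0, 48],
    "datatreex": [0, 3, 8, 12],
})

y_cols = find_pairs(df)

# Turn zeros into NaN in the y columns only, so zeros in x columns survive.
df2 = df.copy()
df2[y_cols] = df2[y_cols].replace(0, np.nan)

# One shared set of axes for every pair.
fig, ax = plt.subplots()
for k in y_cols:
    df2.plot(x=k + "x", y=k, kind="line", ax=ax)
```

If the real column names follow a different convention, only `find_pairs` would need to change. With 200 lines on one plot, passing `legend=False` to `df2.plot` may also keep the figure readable.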
