
Pandas groupby results on the same plot

I am dealing with the following data frame (only for illustration, actual df is quite large):

   seq          x1         y1
0  2           0.7725      0.2105
1  2           0.8098      0.3456
2  2           0.7457      0.5436
3  2           0.4168      0.7610
4  2           0.3181      0.8790
5  3           0.2092      0.5498
6  3           0.0591      0.6357
7  5           0.9937      0.5364
8  5           0.3756      0.7635
9  5           0.1661      0.8364

I am trying to plot a multi-line graph of the above coordinates (x is "x1", y is "y1").

Rows with the same "seq" form one path and have to be plotted as one separate line: for example, all the x, y coordinates with seq = 2 belong to one line, and so on.

I am able to plot them, but each path ends up on a separate graph. I want all the lines on the same graph. I tried using subplots, but I am not getting it right.

import matplotlib as mpl
import matplotlib.pyplot as plt

%matplotlib notebook

df.groupby("seq").plot(kind = "line", x = "x1", y = "y1")

This creates hundreds of graphs (one per unique seq). Please suggest a way to get all the lines on the same graph.

**UPDATE**

To resolve the above problem, I implemented the following code:

     fig, ax = plt.subplots(figsize=(12,8))
     df.groupby('seq').plot(kind='line', x = "x1", y = "y1", ax = ax)
     plt.title("abc")
     plt.show()

Now I want to plot the lines in specific colors. I am grouping the paths with seq = 2 and seq = 5 into cluster 1, and the path with seq = 3 into another cluster.

So there are two lines in cluster 1, which I want in red, and one line in cluster 2, which can be green.

How should I proceed with this?
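
One possible way to get per-cluster colors, as a minimal sketch: loop over the groups and pass an explicit color to ax.plot for each path. This assumes the cluster assignment is available as a plain dict; the names cluster_of and color_of are hypothetical and not part of the original code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question.
df = pd.DataFrame({
    "seq": [2, 2, 2, 2, 2, 3, 3, 5, 5, 5],
    "x1": [0.7725, 0.8098, 0.7457, 0.4168, 0.3181,
           0.2092, 0.0591, 0.9937, 0.3756, 0.1661],
    "y1": [0.2105, 0.3456, 0.5436, 0.7610, 0.8790,
           0.5498, 0.6357, 0.5364, 0.7635, 0.8364],
})

# Assumed inputs (hypothetical names): seq -> cluster, cluster -> color.
cluster_of = {2: 1, 5: 1, 3: 2}
color_of = {1: "red", 2: "green"}

fig, ax = plt.subplots(figsize=(12, 8))

# One line per seq, colored by the cluster that seq belongs to.
for seq, path in df.groupby("seq"):
    cluster = cluster_of[seq]
    ax.plot(path["x1"], path["y1"],
            color=color_of[cluster],
            label=f"seq {seq} (cluster {cluster})")

ax.set_title("abc")
ax.legend()
# plt.show()  # uncomment when running outside a notebook
```

If the cluster labels already live in a DataFrame column, the dict lookup could be replaced by reading the first value of that column inside the loop.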

You need to create the axes before plotting and pass them to plot through the ax argument, as in this example:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# random df
df = pd.DataFrame(np.random.randint(0,10,size=(25, 3)), columns=['ProjID','Xcoord','Ycoord'])

# plot groupby results on the same canvas 
fig, ax = plt.subplots(figsize=(8,6))
df.groupby('ProjID').plot(kind='line', x = "Xcoord", y = "Ycoord", ax=ax)
plt.show()


Consider the dataframe df

import numpy as np
import pandas as pd

# 10 groups of 10 random points each
df = pd.DataFrame(dict(
        ProjID=np.repeat(range(10), 10),
        Xcoord=np.random.rand(100),
        Ycoord=np.random.rand(100),
    ))

Then we create abstract art like this

df.set_index('Xcoord').groupby('ProjID').Ycoord.plot()


Another way:

for k,g in df.groupby('ProjID'):
  plt.plot(g['Xcoord'],g['Ycoord'])

plt.show()
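
A small variation on the loop above, as a sketch: passing label to each plot call lets matplotlib build the legend directly from the group keys. The random data and the "ProjID n" label format are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data: 3 groups of 5 random points each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ProjID": np.repeat(np.arange(3), 5),
    "Xcoord": rng.random(15),
    "Ycoord": rng.random(15),
})

fig, ax = plt.subplots()

# One labeled line per group, all on the same axes.
for key, group in df.groupby("ProjID"):
    ax.plot(group["Xcoord"], group["Ycoord"], label=f"ProjID {key}")

ax.legend()
# plt.show()  # uncomment when running outside a notebook
```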

Here is a working example including the ability to adjust legend names.

import matplotlib.pyplot as plt

grp = df.groupby('groupCol')

legendNames = grp.apply(lambda x: x.name)  # Get group names using the name attribute.
#legendNames = list(grp.groups.keys())  # Alternative way to get group names; I have not compared the speed of the two.

fig, ax = plt.subplots()  # the axes must exist before they are passed to plot
plots = grp.plot('x1', 'y1', legend=True, ax=ax)

# Replace each legend entry's text with the matching group name.
for txt, name in zip(ax.legend_.texts, legendNames):
    txt.set_text(str(name))

Explanation: The legend is stored in the attribute ax.legend_, which in turn holds a list of Text objects, one per group. The Text class is part of the matplotlib.text API. To change the text of one of these objects, use its setter method set_text(s).

As a side note, the Text class has a number of set_X() methods that allow you to change the font sizes, fonts, colors, etc. I haven't used those, so I don't know for sure they work, but can't see why not.
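
For what it's worth, here is a minimal sketch of those other setters applied to legend entries. The tiny DataFrame and the chosen font size, color, and style are illustrative assumptions; set_fontsize, set_color, and set_fontstyle are existing methods of matplotlib's Text class.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data with two groups.
df = pd.DataFrame({
    "groupCol": ["a", "a", "b", "b"],
    "x1": [0, 1, 0, 1],
    "y1": [0, 1, 1, 0],
})

fig, ax = plt.subplots()
for name, group in df.groupby("groupCol"):
    ax.plot(group["x1"], group["y1"], label=name)

legend = ax.legend()

# Each legend entry is a matplotlib.text.Text object with its own setters.
for txt in legend.get_texts():
    txt.set_fontsize(12)
    txt.set_color("navy")
    txt.set_fontstyle("italic")
# plt.show()  # uncomment when running outside a notebook
```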

Based on Serenity's answer, I made the legend better.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# random df
df = pd.DataFrame(np.random.randint(0,10,size=(25, 3)), columns=['ProjID','Xcoord','Ycoord'])

# plot groupby results on the same canvas 
grouped = df.groupby('ProjID')
fig, ax = plt.subplots(figsize=(8,6))
grouped.plot(kind='line', x = "Xcoord", y = "Ycoord", ax=ax)
ax.legend(labels=[str(k) for k in grouped.groups])  # better legend: one label per group key
plt.show()

You can also do it like this:

grouped = df.groupby('ProjID')
fig, ax = plt.subplots(figsize=(8,6))
g_plot = lambda x:x.plot(x = "Xcoord", y = "Ycoord", ax=ax, label=x.name)
grouped.apply(g_plot)
plt.show()



 