Selecting distinct pandas data frame based on combination of multiple columns value

Question

How can I split a pandas DataFrame into separate DataFrames, one for each distinct combination of values in several columns?

I have data like this:

    Time            locIP       remIp       locPort  remPort  numReads  numWrites
0   20180529235221  127.0.0.1   127.0.0.1   22       565      36736     36751
1   20180529235221  127.0.0.1   127.0.0.1   22       566      36736     74690
2   20180529235221  127.0.0.1   127.0.0.1   12       567      36736     36749
3   20180529235221  10.8.21.41  10.8.21.34  22       565      36744     36738
4   20180529235221  10.8.21.41  10.8.21.34  22       566      36744     36738
5   20180529235225  127.0.0.1   127.0.0.1   22       565      36788     36751
6   20180529235225  127.0.0.1   127.0.0.1   22       566      36788     74700
7   20180529235225  127.0.0.1   127.0.0.1   12       567      36788     36800
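To make the example reproducible, the sample above can be built directly as a DataFrame. This is a sketch under an assumption: the ports and counters are stored as integers and the IPs as strings. The filter attempt further down compares ports to strings such as '22', so in the real data the ports may be strings; adjust the types if so.

```python
import pandas as pd

# Sample rows from the question (assumption: ports and counters are ints).
rows = [
    (20180529235221, "127.0.0.1", "127.0.0.1", 22, 565, 36736, 36751),
    (20180529235221, "127.0.0.1", "127.0.0.1", 22, 566, 36736, 74690),
    (20180529235221, "127.0.0.1", "127.0.0.1", 12, 567, 36736, 36749),
    (20180529235221, "10.8.21.41", "10.8.21.34", 22, 565, 36744, 36738),
    (20180529235221, "10.8.21.41", "10.8.21.34", 22, 566, 36744, 36738),
    (20180529235225, "127.0.0.1", "127.0.0.1", 22, 565, 36788, 36751),
    (20180529235225, "127.0.0.1", "127.0.0.1", 22, 566, 36788, 74700),
    (20180529235225, "127.0.0.1", "127.0.0.1", 12, 567, 36788, 36800),
]
columns = ["Time", "locIP", "remIp", "locPort", "remPort", "numReads", "numWrites"]
df = pd.DataFrame(rows, columns=columns)
print(df)
```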

I want to plot a time series of numReads for each combination of (locIP, remIp, locPort, remPort).

For this, I want to split the data into smaller DataFrames, one per combination, like this one:

    Time            locIP       remIp       locPort  remPort  numReads  numWrites
0   20180529235221  127.0.0.1   127.0.0.1   22       565      36736     36751
5   20180529235225  127.0.0.1   127.0.0.1   22       565      36788     36751

Another one:

    Time            locIP       remIp       locPort  remPort  numReads  numWrites
1   20180529235221  127.0.0.1   127.0.0.1   22       566      36736     74690
6   20180529235225  127.0.0.1   127.0.0.1   22       566      36788     74700

I tried filtering with a condition on multiple columns:

df1 = df[(df['locIP'] == '127.0.0.1') & (df['remIp'] == '127.0.0.1') & (df['locPort'] == '22') & (df['remPort'] == '565')]

But this way I have to write out every combination in the condition by hand. I am looking for a better way.
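One common alternative, swapped in here and not taken from the answer below, is DataFrame.groupby on the four key columns. Iterating over a groupby yields (key tuple, sub-DataFrame) pairs, one for each combination that actually occurs, so nothing has to be spelled out by hand. A minimal sketch, assuming integer ports as in the sample data; the names keys and frames are illustrative only.

```python
import pandas as pd

rows = [
    (20180529235221, "127.0.0.1", "127.0.0.1", 22, 565, 36736, 36751),
    (20180529235221, "127.0.0.1", "127.0.0.1", 22, 566, 36736, 74690),
    (20180529235221, "127.0.0.1", "127.0.0.1", 12, 567, 36736, 36749),
    (20180529235221, "10.8.21.41", "10.8.21.34", 22, 565, 36744, 36738),
    (20180529235221, "10.8.21.41", "10.8.21.34", 22, 566, 36744, 36738),
    (20180529235225, "127.0.0.1", "127.0.0.1", 22, 565, 36788, 36751),
    (20180529235225, "127.0.0.1", "127.0.0.1", 22, 566, 36788, 74700),
    (20180529235225, "127.0.0.1", "127.0.0.1", 12, 567, 36788, 36800),
]
df = pd.DataFrame(rows, columns=["Time", "locIP", "remIp", "locPort",
                                 "remPort", "numReads", "numWrites"])

# The columns whose combination defines one sub-DataFrame.
keys = ["locIP", "remIp", "locPort", "remPort"]

# One entry per combination present in the data, keyed by the value tuple.
frames = {key: group for key, group in df.groupby(keys)}

# Look up one sub-DataFrame by its combination.
print(frames[("127.0.0.1", "127.0.0.1", 22, 565)])
```

Each value in frames keeps the original row index, so the lookup above returns rows 0 and 5, matching the first desired DataFrame in the question.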

Answer 1

This might work for you.

import itertools

# Collect the unique values of every column in a dictionary.
d = {}
for col in df.columns:
    d[col] = list(df[col].unique())

# Create all possible combinations of the four key columns.
c = list(itertools.product(d['locIP'], d['locPort'], d['remIp'], d['remPort']))

# List to populate with the selected (non-empty) dataframes.
NonEmpdf = []
for loc_ip, loc_port, rem_ip, rem_port in c:
    # Inside query(), @name refers to a local Python variable,
    # so strings and integers need no manual quoting.
    selectTxt = ('locIP == @loc_ip & locPort == @loc_port & '
                 'remIp == @rem_ip & remPort == @rem_port')
    dfSel = df.query(selectTxt)
    if dfSel.empty:
        print('Empty:', loc_ip, loc_port, rem_ip, rem_port)
    else:
        NonEmpdf.append(dfSel)

# NonEmpdf is now a collection of all non-empty dataframes you can iterate through and plot.
NonEmpdf
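Since the end goal is one numReads time series per combination, a different route, swapped in here rather than taken from the answer, is pivot_table: it puts one combination in each column and the timestamps on the index. A sketch under two assumptions: Time encodes yyyymmddHHMMSS as an integer, and the ports are integers. The variable name wide is illustrative.

```python
import pandas as pd

rows = [
    (20180529235221, "127.0.0.1", "127.0.0.1", 22, 565, 36736, 36751),
    (20180529235221, "127.0.0.1", "127.0.0.1", 22, 566, 36736, 74690),
    (20180529235221, "127.0.0.1", "127.0.0.1", 12, 567, 36736, 36749),
    (20180529235221, "10.8.21.41", "10.8.21.34", 22, 565, 36744, 36738),
    (20180529235221, "10.8.21.41", "10.8.21.34", 22, 566, 36744, 36738),
    (20180529235225, "127.0.0.1", "127.0.0.1", 22, 565, 36788, 36751),
    (20180529235225, "127.0.0.1", "127.0.0.1", 22, 566, 36788, 74700),
    (20180529235225, "127.0.0.1", "127.0.0.1", 12, 567, 36788, 36800),
]
df = pd.DataFrame(rows, columns=["Time", "locIP", "remIp", "locPort",
                                 "remPort", "numReads", "numWrites"])

keys = ["locIP", "remIp", "locPort", "remPort"]

# Parse the integer timestamps (assumed format yyyymmddHHMMSS).
df = df.assign(Time=pd.to_datetime(df["Time"].astype(str), format="%Y%m%d%H%M%S"))

# Rows: timestamps. Columns: one per (locIP, remIp, locPort, remPort).
# pivot_table averages duplicates (default aggfunc="mean"), so values are floats.
wide = df.pivot_table(index="Time", columns=keys, values="numReads")
print(wide)
```

With matplotlib installed, wide.plot() draws one line per column. Combinations that have no sample at a timestamp (here the 10.8.21.41 rows at 23:52:25) get NaN in that cell.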

Also .any() might be of use to you.
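One way .any() could fit in, assuming the intent is to skip combinations with no matching rows: build a boolean mask per combination and call mask.any(), which is True only if at least one row matches. This replaces the query string and the dfSel.empty check with plain boolean indexing; the name selected is illustrative.

```python
import itertools

import pandas as pd

rows = [
    (20180529235221, "127.0.0.1", "127.0.0.1", 22, 565, 36736, 36751),
    (20180529235221, "127.0.0.1", "127.0.0.1", 22, 566, 36736, 74690),
    (20180529235221, "127.0.0.1", "127.0.0.1", 12, 567, 36736, 36749),
    (20180529235221, "10.8.21.41", "10.8.21.34", 22, 565, 36744, 36738),
    (20180529235221, "10.8.21.41", "10.8.21.34", 22, 566, 36744, 36738),
    (20180529235225, "127.0.0.1", "127.0.0.1", 22, 565, 36788, 36751),
    (20180529235225, "127.0.0.1", "127.0.0.1", 22, 566, 36788, 74700),
    (20180529235225, "127.0.0.1", "127.0.0.1", 12, 567, 36788, 36800),
]
df = pd.DataFrame(rows, columns=["Time", "locIP", "remIp", "locPort",
                                 "remPort", "numReads", "numWrites"])

selected = []
for loc_ip, loc_port, rem_ip, rem_port in itertools.product(
    df["locIP"].unique(), df["locPort"].unique(),
    df["remIp"].unique(), df["remPort"].unique(),
):
    # Boolean Series: True for rows matching this combination.
    mask = (
        (df["locIP"] == loc_ip)
        & (df["locPort"] == loc_port)
        & (df["remIp"] == rem_ip)
        & (df["remPort"] == rem_port)
    )
    # Only combinations that match at least one row are kept.
    if mask.any():
        selected.append(df[mask])
```

The product still tries every combination (24 for the sample data), so on large data the groupby approach scales better; mask.any() only avoids constructing empty DataFrames.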

solution1
0 2018-06-13 20:32:32
