简体   繁体   中英

Getting column values from multi index data frame pandas

I have a multi index data frame shown below:

    1                2

panning  sec        panning     sec

 None    5.0        None        0.0
 None    6.0        None        1.0
Panning  7.0        None        2.0 
 None    8.0        Panning     3.0
 None    9.0        None        4.0
 Panning  10.0      None        5.0

I am iterating over the rows and getting the index wherever there is a value 'panning' in the panning column by

 ide=[]
 for index,row in dfs.iterrows():
        if [row[:, 'Panning'][row[:, 'Panning'] == 'Panning']]:
               ide.append(row[:, 'Panning'][row[:, 'Panning'] == 'Panning'].index.tolist())

print ide

If I execute the above code I get the output

[[],[],[1],[2],[],[1]]

which represents the index where the value is panning

Now, I also want to get the corresponding sec value also like, for example for row 3 for value panning I would like to get sec value 7.0 along with index 1. I would like O\\P to be

[[],[],[1,7.0],[2,3.0],[],[1,10]]

Basically I need the O/P as combination of the index where the value is panning and the subsequent value in the seconds column.

consider the pd.DataFrame df in the setup reference below

method 1

  • xs for cross section
  • any(1) to check if any in row

df.loc[df.xs('Panning', axis=1, level=1).eq('Panning').any(1)]

在此输入图像描述

method 2

  • stack
  • query
  • unstack

df.stack(0).query('Panning == "Panning"').stack().unstack([-2, -1])

在此输入图像描述


To return just the sec columns

df.xs('sec', axis=1, level=1)[df.xs('Panning', axis=1, level=1).eq('Panning').any(1)]

在此输入图像描述

setup
Reference

from StringIO import StringIO
import pandas as pd

txt = """None    5.0        None        0.0
None    6.0        None        1.0
Panning  7.0        None        2.0 
None    8.0        Panning     3.0
None    9.0        None        4.0
Panning  10.0      None        5.0"""

df = pd.read_csv(StringIO(txt), delim_whitespace=True, header=None)

df.columns = pd.MultiIndex.from_product([[1, 2], ['Panning', 'sec']])
df

在此输入图像描述

You can use:

print (dfs)
         1              2     
   Panning   sec  Panning  sec
0     None   5.0     None  0.0
1     None   6.0     None  1.0
2  Panning   7.0     None  2.0
3     None   8.0  Panning  3.0
4     None   9.0     None  4.0
5  Panning  10.0     None  5.0

Looping solution :

ide=[]
for index,row in dfs.iterrows():
    if (row[:, 'Panning'] == 'Panning').any():
        idx1 = row[:, 'Panning'][row[:, 'Panning'] == 'Panning'].index.tolist()
        idx2 = row.loc[(idx1, 'sec')].values.tolist()[0]
        idx1.append(idx2)
        ide.append(idx1)
    else:
        ide.append([])

print (ide)
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]

Stacked solution :

stacked = dfs.stack(0).reset_index(level=1)
mask = stacked['Panning'] == 'Panning'
L = stacked[mask].reindex(dfs.index).drop('Panning', axis=1).fillna('').values.tolist()
print (L)
[['', ''], ['', ''], ['1', 7.0], ['2', 3.0], ['', ''], ['1', 10.0]]

print ([x if not x == ['', ''] else [] for x in L])
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]

Explanation :

#stacked top level of MultiIndex in column
#create column from 1. level of index values
stacked = dfs.stack(0).reset_index(level=1)
print (stacked)
  level_1  Panning   sec
0       1     None   5.0
0       2     None   0.0
1       1     None   6.0
1       2     None   1.0
2       1  Panning   7.0
2       2     None   2.0
3       1     None   8.0
3       2  Panning   3.0
4       1     None   9.0
4       2     None   4.0
5       1  Panning  10.0
5       2     None   5.0
#boolean indexing
#http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
mask = stacked['Panning'] == 'Panning'
print (mask)
0    False
0    False
1    False
1    False
2     True
2    False
3    False
3     True
4    False
4    False
5     True
5    False
Name: Panning, dtype: bool

print (stacked[mask])
  level_1  Panning   sec
2       1  Panning   7.0
3       2  Panning   3.0
5       1  Panning  10.0
#reindex by original index, remove column Panning
print (stacked[mask].reindex(dfs.index).drop('Panning', axis=1))
  level_1   sec
0     NaN   NaN
1     NaN   NaN
2       1   7.0
3       2   3.0
4     NaN   NaN
5       1  10.0

#replace NaN to '' and generate list of list
L = stacked[mask].reindex(dfs.index).drop('Panning', axis=1).fillna('').values.tolist()
print (L)
[['', ''], ['', ''], ['1', 7.0], ['2', 3.0], ['', ''], ['1', 10.0]]

#replace empty lists by empty list
print ([x if not x == ['', ''] else [] for x in L])
[[], [], ['1', 7.0], ['2', 3.0], [], ['1', 10.0]]

df.iterrows() return a Series , if you want the original index you need to call the name of that Series such has:

for index,row in df.iterrows():
    print row.name

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM