
turn pandas dataframe into series by iterating over columns

I have a dataframe like the one below, and I am trying to turn it into a Series of the form shown under GOAL:

      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00

GOAL:

col1Xcol1 1.0
col1Xcol2 0.2
col1Xcol3 0.7
col2Xcol1 0.2
...

My code so far:

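import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# Joining the transpose to the frame only places the columns side by side
print(pvals2.transpose().join(pvals2, how='outer', lsuffix='_left', rsuffix='_right'))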

OUTPUT (from my real data, whose labels are vote, ballot1 and ballot1_x):

          vote_left ballot1_left ballot1_x_left vote_right ballot1_right  \
vote              0       0.0923         0.0521          0        0.0923   
ballot1      0.0923            0         0.8213     0.0923             0   
ballot1_x    0.0521       0.8213              0     0.0521        0.8213   

          ballot1_x_right  
vote               0.0521  
ballot1            0.8213  
ballot1_x               0  

Using concat and then setting a new index works:

>>> ser = pd.concat([pvals2[col] for col in pvals2.columns])
>>> ser.index = [pvals2[col].name + 'X' + x for col in pvals2.columns
...              for x in pvals2[col].index]
>>> ser
col1Xcol1    1.00
col1Xcol2    0.20
col1Xcol3    0.70
col2Xcol1    0.20
col2Xcol2    1.00
col2Xcol3    0.01
col3Xcol1    0.70
col3Xcol2    0.01
col3Xcol3    1.00
dtype: float64
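
Note that the labels built this way are column first, then row. With the symmetric frame from the question that makes no visible difference, but on a non-symmetric frame it does. A minimal sketch, using a made-up 2x2 frame (the names `df`, `r1`, `r2`, `c1`, `c2` are hypothetical and only for illustration):

```python
import pandas as pd

# Hypothetical non-symmetric frame, only for illustration
df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4]}, index=['r1', 'r2'])

# Concatenate the columns one after another (column-major order)
ser = pd.concat([df[col] for col in df.columns])

# Label each value as <column>X<row>, in the same order as the concat
ser.index = [col + 'X' + row for col in df.columns for row in df.index]

print(ser)
```

If row-first labels are wanted instead, swapping the two parts of the label is not enough on its own; the traversal order has to change as well so that labels and values stay aligned.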

The following code:

import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])

values = []
ind = []
# Walk the frame row by row, visiting every column within each row
for row in pvals.index:
    for col in pvals.columns:
        values.append(pvals.loc[row, col])
        ind.append("%sX%s" % (row, col))

newpvals = pd.Series(values, index=ind)

gives:

>>> newpvals
row1Xcol1    1.00
row1Xcol2    0.20
row1Xcol3    0.70
row2Xcol1    0.20
row2Xcol2    1.00
row2Xcol3    0.01
row3Xcol1    0.70
row3Xcol2    0.01
row3Xcol3    1.00
dtype: float64
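
The same row-by-row traversal can probably be written without the explicit loops. A sketch, assuming every column holds the same numeric dtype so that `to_numpy()` returns a plain 2-D float array (`labels` and `flat` are names chosen here for illustration):

```python
import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])

# Labels in row-major order: every column for row1, then row2, and so on
labels = ["%sX%s" % (row, col) for row in pvals.index for col in pvals.columns]

# ravel() flattens the 2-D array row by row (C order), matching the labels
flat = pd.Series(pvals.to_numpy().ravel(), index=labels)

print(flat)
```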

Edit: I misread the question, so I changed the result into a Series.

Consider melt, build the new index in an extra column, then select the value column. This yields a Series because a single column of a pandas DataFrame is a pandas Series:

Data

from io import StringIO
import pandas as pd

txt = '''      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00'''

df = pd.read_table(StringIO(txt), sep=r"\s+")

Series build

mdf = pd.melt(df.reset_index(), id_vars='index')
mdf['s'] = mdf['index'] + 'X' + mdf['variable']

new_series = mdf.set_index('s').rename_axis(None)['value']

print(new_series)
# col1Xcol1    1.00
# col2Xcol1    0.20
# col3Xcol1    0.70
# col1Xcol2    0.20
# col2Xcol2    1.00
# col3Xcol2    0.01
# col1Xcol3    0.70
# col2Xcol3    0.01
# col3Xcol3    1.00
# Name: value, dtype: float64
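
melt walks the frame column by column, so the result above is in column-major order, while the GOAL in the question lists one row at a time. One possible way to get the row-by-row order is to melt the transpose instead. A sketch, building the frame directly rather than from text (`row_major` is a name chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, .2, .7],
                   'col2': [.2, 1, .01],
                   'col3': [.7, .01, 1]},
                  index=['col1', 'col2', 'col3'])

# Melting the transpose visits the original frame one row at a time
mdf = pd.melt(df.T.reset_index(), id_vars='index')

# 'variable' now holds the original row label, 'index' the original column label
mdf['s'] = mdf['variable'] + 'X' + mdf['index']

row_major = mdf.set_index('s').rename_axis(None)['value']
print(row_major)
```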

First stack the dataframe

st = pvals2.stack()

Create a new index, by adding together the multiindex

newdex = st.index.get_level_values(0) + 'X' + st.index.get_level_values(1)

Set newdex as the index for the series

st = st.set_axis(newdex)

All together

st = pvals2.stack()
st = st.set_axis(st.index.get_level_values(0) + 'X' + st.index.get_level_values(1))

col1Xcol1    1.00
col1Xcol2    0.20
col1Xcol3    0.70
col2Xcol1    0.20
col2Xcol2    1.00
col2Xcol3    0.01
col3Xcol1    0.70
col3Xcol2    0.01
col3Xcol3    1.00
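
Another way to collapse the two index levels after stacking, sketched here under the assumption that both levels hold strings, is to map `'X'.join` over the MultiIndex, since each entry is a (row, column) tuple:

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# stack() moves the columns into a second index level, row by row
st = pvals2.stack()

# Each MultiIndex entry is a (row, column) tuple of strings; join it with 'X'
st.index = st.index.map('X'.join)

print(st)
```

This fails if either level contains non-string labels, because `str.join` only accepts strings; converting the labels with `str` first would be needed in that case.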

