Convert a pandas DataFrame to a Series by iterating over its columns

Question

I have a dataframe of the following form, and I am trying to turn it into a Series:

      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00

GOAL:

col1Xcol1 1.0
col1Xcol2 0.2
col1Xcol3 0.7
col2Xcol1 0.2
...

My code so far:

import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

print(pvals2.transpose().join(pvals2, how='outer', lsuffix='_left', rsuffix='_right'))

OUTPUT:

          vote_left ballot1_left ballot1_x_left vote_right ballot1_right  \
vote              0       0.0923         0.0521          0        0.0923   
ballot1      0.0923            0         0.8213     0.0923             0   
ballot1_x    0.0521       0.8213              0     0.0521        0.8213   

          ballot1_x_right  
vote               0.0521  
ballot1            0.8213  
ballot1_x               0

Answer 1

Using concat and then setting a new index works:

>>> ser = pd.concat([pvals2[col] for col in pvals2.columns])
>>> ser.index = [pvals2[col].name + 'X' + x for col in pvals2.columns 
                 for x in pvals2[col].index]
>>> ser
col1Xcol1    1.00
col1Xcol2    0.20
col1Xcol3    0.70
col2Xcol1    0.20
col2Xcol2    1.00
col2Xcol3    0.01
col3Xcol1    0.70
col3Xcol2    0.01
col3Xcol3    1.00
dtype: float64
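A possible variation on the same idea, sketched under the assumption that pvals2 is the frame from the question: passing keys= to pd.concat records each column name as the outer level of a MultiIndex, so the new labels can be built from the resulting (column, row) tuples rather than from a second nested comprehension.

```python
import pandas as pd

# Frame from the question (row and column labels are the same).
pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# keys= adds the column name as the outer level of a MultiIndex.
ser = pd.concat([pvals2[col] for col in pvals2.columns], keys=pvals2.columns)

# Each index entry is a (column, row) tuple; join the two parts with 'X'.
ser.index = [f"{col}X{row}" for col, row in ser.index]
print(ser)
```

Like the answer above, this walks the frame column by column, so the first part of each label is the column name.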

Answer 2

The following code:

import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])

values = []
ind = []
for row in pvals.index:            # outer loop: rows
    for col in pvals.columns:      # inner loop: columns within that row
        values.append(pvals.at[row, col])
        ind.append("%sX%s" % (row, col))

newpvals = pd.Series(values, ind)

gives:

>>> newpvals
row1Xcol1    1.00
row1Xcol2    0.20
row1Xcol3    0.70
row2Xcol1    0.20
row2Xcol2    1.00
row2Xcol3    0.01
row3Xcol1    0.70
row3Xcol2    0.01
row3Xcol3    1.00
dtype: float64

Edit: I misread the question, so I changed the result into a Series.
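The nested loop can also be written without explicit Python loops over the values. This is a minimal sketch, assuming the same pvals frame: to_numpy().ravel() flattens the values row by row, and itertools.product yields (row, column) label pairs in that same row-major order.

```python
from itertools import product

import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])

# ravel() flattens in C (row-major) order, matching the loop above:
# outer loop over rows, inner loop over columns.
values = pvals.to_numpy().ravel()

# product(index, columns) produces (row, col) pairs in the same order.
labels = [f"{row}X{col}" for row, col in product(pvals.index, pvals.columns)]

newpvals = pd.Series(values, index=labels)
print(newpvals)
```

The result should match the loop version; the design choice is only to let NumPy do the flattening instead of appending one value at a time.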

Answer 3

Consider melt, then assign a new column to serve as the index, and finally select the value column, since a single pandas DataFrame column is a pandas Series:

Data

from io import StringIO
import pandas as pd

txt = '''      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00'''

df = pd.read_table(StringIO(txt), sep=r"\s+")

Series build

mdf = pd.melt(df.reset_index(), id_vars='index')
mdf['s'] = mdf['index'] + 'X' + mdf['variable']

new_series = mdf.set_index('s').rename_axis(None)['value']

print(new_series)
# col1Xcol1    1.00
# col2Xcol1    0.20
# col3Xcol1    0.70
# col1Xcol2    0.20
# col2Xcol2    1.00
# col3Xcol2    0.01
# col1Xcol3    0.70
# col2Xcol3    0.01
# col3Xcol3    1.00
# Name: value, dtype: float64
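A slightly more explicit variant of the melt approach, offered as a sketch rather than a replacement: naming the index with rename_axis and the melted label column with melt's var_name parameter gives readable column names ('row' and 'col' are illustrative choices, not part of the original), and building the Series directly avoids the rename_axis(None) step.

```python
from io import StringIO

import pandas as pd

txt = '''      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00'''

df = pd.read_csv(StringIO(txt), sep=r"\s+")

# Name the index 'row' so reset_index() yields a 'row' column, and call the
# melted column-label column 'col' instead of the default 'variable'.
long_df = df.rename_axis('row').reset_index().melt(id_vars='row', var_name='col')

# melt stacks one column after another, so the order is column-major.
new_series = pd.Series(long_df['value'].to_numpy(),
                       index=long_df['row'] + 'X' + long_df['col'])
print(new_series)
```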

Answer 4

First, stack the dataframe:

st = pvals2.stack()
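To make the next step easier to follow, here is a small sketch (assuming pvals2 is the frame from the question) of what stack() returns: a Series whose index has two levels, the row label and the column label, in row-major order.

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

st = pvals2.stack()

# Two index levels: level 0 is the row label, level 1 the column label.
print(st.index.nlevels)
print(st.index[:3].tolist())
```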

Create a new index by concatenating the two levels of the MultiIndex:

newdex = st.index.get_level_values(0) + 'X' + st.index.get_level_values(1)

Set newdex as the index of the Series:

st = st.set_axis(newdex)  # set_axis returns a new Series, so reassign it

All together:

st = pvals2.stack()
st = st.set_axis(st.index.get_level_values(0) + 'X' + st.index.get_level_values(1))
print(st)

col1Xcol1    1.00
col1Xcol2    0.20
col1Xcol3    0.70
col2Xcol1    0.20
col2Xcol2    1.00
col2Xcol3    0.01
col3Xcol1    0.70
col3Xcol2    0.01
col3Xcol3    1.00
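A more compact way to flatten the stacked index, sketched under the assumption that all labels are strings: MultiIndex.map passes each (row, column) tuple to the function, and 'X'.join(('col1', 'col2')) returns 'col1Xcol2'.

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

st = pvals2.stack()

# Each MultiIndex entry is a (row, column) tuple; join its parts with 'X'.
st.index = st.index.map('X'.join)
print(st)
```

This produces the same labels as the get_level_values version; str.join would fail if a label were not a string, so non-string labels would need converting first.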

Problem description

4 solutions

Solution 1
0 2018-03-02 22:05:51

Solution 2
0 2018-03-02 22:14:43

Solution 3
0 2018-03-02 22:15:39

Solution 4
0 Accepted 2018-03-02 22:42:56