Turn pandas dataframe into series by iterating over columns
I have a dataframe like the one below, and I am trying to get a Series of the form shown under GOAL:
      col1  col2  col3
col1   1.0  0.20  0.70
col2   0.2  1.00  0.01
col3   0.7  0.01  1.00
GOAL:
col1Xcol1 1.0
col1Xcol2 0.2
col1Xcol3 0.7
col2Xcol1 0.2
...
My code so far:
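import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])
# pvals (not shown) is the frame behind the output below; pvals2 is a small stand-in with the same layout
print(pvals.transpose().join(pvals, how='outer', lsuffix='_left', rsuffix='_right'))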
OUTPUT:
           vote_left  ballot1_left  ballot1_x_left  vote_right  ballot1_right  \
vote               0        0.0923          0.0521           0         0.0923
ballot1       0.0923             0          0.8213      0.0923              0
ballot1_x     0.0521        0.8213               0      0.0521         0.8213

           ballot1_x_right
vote                0.0521
ballot1             0.8213
ballot1_x                0
Using concat and then setting the new index works:
>>> ser = pd.concat([pvals2[col] for col in pvals2.columns])
>>> ser.index = [pvals2[col].name + 'X' + x for col in pvals2.columns
...              for x in pvals2[col].index]
>>> ser
col1Xcol1 1.00
col1Xcol2 0.20
col1Xcol3 0.70
col2Xcol1 0.20
col2Xcol2 1.00
col2Xcol3 0.01
col3Xcol1 0.70
col3Xcol2 0.01
col3Xcol3 1.00
dtype: float64
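For reference, here is the same approach as a self-contained script rather than an interactive session. It is a minimal sketch that assumes the pvals2 frame from the question; the loop variable names are chosen here for illustration.

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# Stack the columns end to end: all of col1, then col2, then col3
ser = pd.concat([pvals2[col] for col in pvals2.columns])

# Label each value as <column>X<row>, in the same order concat produced
ser.index = [col + 'X' + row for col in pvals2.columns for row in pvals2.index]

print(ser)
```

Note that the labels built this way read columnXrow. For a symmetric matrix like this one that gives the same values as rowXcolumn, but for a non-symmetric frame the two orders differ.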
The following code:
import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])

values = []
ind = []
# Walk the frame row by row, and within each row column by column
for row in pvals.index:
    for col in pvals.columns:
        values.append(pvals.loc[row, col])
        ind.append("%sX%s" % (row, col))
newpvals = pd.Series(values, index=ind)
gives:
>>> newpvals
row1Xcol1 1.00
row1Xcol2 0.20
row1Xcol3 0.70
row2Xcol1 0.20
row2Xcol2 1.00
row2Xcol3 0.01
row3Xcol1 0.70
row3Xcol2 0.01
row3Xcol3 1.00
dtype: float64
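The double loop visits the values row by row, so the same Series can also be sketched without explicit appends by flattening the underlying array. This is an alternative illustration rather than part of the answer above, and it assumes pandas 0.24 or later for `to_numpy`.

```python
import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])

# Row-major labels: every row paired with every column
labels = ["%sX%s" % (row, col) for row in pvals.index for col in pvals.columns]

# ravel() flattens the 2-D values row by row, matching the label order
newpvals = pd.Series(pvals.to_numpy().ravel(), index=labels)

print(newpvals)
```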
Edit: I misread the question, so I changed the result into a Series.
Consider melt with a column assignment for the new index, then select the value column, since a single pandas DataFrame column is a pandas Series:
Data
from io import StringIO
import pandas as pd
txt = ''' col1 col2 col3
col1 1.0 0.20 0.70
col2 0.2 1.00 0.01
col3 0.7 0.01 1.00'''
df = pd.read_table(StringIO(txt), sep=r"\s+")
Series build
mdf = pd.melt(df.reset_index(), id_vars='index')
mdf['s'] = mdf['index'] + 'X' + mdf['variable']
new_series = mdf.set_index('s').rename_axis(None)['value']
print(new_series)
# col1Xcol1 1.00
# col2Xcol1 0.20
# col3Xcol1 0.70
# col1Xcol2 0.20
# col2Xcol2 1.00
# col3Xcol2 0.01
# col1Xcol3 0.70
# col2Xcol3 0.01
# col3Xcol3 1.00
# Name: value, dtype: float64
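The melted result runs down each column in turn, so its order differs from the GOAL listing in the question, which runs across each row. If the row-by-row order matters, one option is to reindex the result. The sketch below is an addition to the answer, not part of it; it builds the frame directly instead of parsing text, and the names `wanted` and `row_major` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, .2, .7],
                   'col2': [.2, 1, .01],
                   'col3': [.7, .01, 1]},
                  index=['col1', 'col2', 'col3'])

# Same melt-based build as above
mdf = pd.melt(df.reset_index(), id_vars='index')
mdf['s'] = mdf['index'] + 'X' + mdf['variable']
new_series = mdf.set_index('s').rename_axis(None)['value']

# Build the labels in row-by-row order and reorder the Series to match
wanted = [row + 'X' + col for row in df.index for col in df.columns]
row_major = new_series.reindex(wanted)

print(row_major)
```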
First stack the dataframe
st = pvals2.stack()
Create a new index by joining the two levels of the MultiIndex
newdex = st.index.get_level_values(0) + 'X' + st.index.get_level_values(1)
Set newdex as the index for the series
st = st.set_axis(newdex, axis=0)
All together
st = pvals2.stack()
st = st.set_axis(st.index.get_level_values(0) + 'X' + st.index.get_level_values(1), axis=0)
print(st)
col1Xcol1 1.00
col1Xcol2 0.20
col1Xcol3 0.70
col2Xcol1 0.20
col2Xcol2 1.00
col2Xcol3 0.01
col3Xcol1 0.70
col3Xcol2 0.01
col3Xcol3 1.00
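A more compact variant of the stack approach joins each (row, column) pair of the MultiIndex in one step with `Index.map`. This is a sketch under the assumption that both index levels hold strings, as they do in pvals2; it is an alternative to the answer above, not the answerer's own code.

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# stack() moves the columns into a second index level, row by row
st = pvals2.stack()

# Each MultiIndex entry is a (row, column) tuple; join the pair with 'X'
st.index = st.index.map('X'.join)

print(st)
```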