Return a pandas.DataFrame when a slice has a one-row result
Consider the following:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(5, 2), index=[100, 101, 101, 102, 103])
>>> idx = set(df.index)
>>> for id_ in idx:
...     slice = df.loc[id_]
...     # stuff with slice
>>>
I need to do stuff with slice within the for loop, but that stuff is predicated on slice being a DataFrame. slice is a DataFrame when there is more than one matching record, but a Series otherwise. I know pandas.Series has the Series.to_frame method, but pandas.DataFrame does not (so I cannot just call df.loc[id_].to_frame()).
What is the best way to test for this and coerce slice into a DataFrame?
(Is it really as simple as testing isinstance(df.loc[id_], pd.Series)?)
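For reference, the isinstance test the question asks about does work. Below is a minimal sketch of testing and coercing in one place. The helper name as_frame is hypothetical (not a pandas API), the loop variable is renamed slice_ to avoid shadowing Python's built-in slice, and df.index.unique() is swapped in for set(df.index) so the labels come out in first-appearance order. The data is random, so only the shapes are meaningful.

```python
import numpy as np
import pandas as pd


def as_frame(obj):
    """Hypothetical helper: return obj as a DataFrame.

    df.loc[label] returns a Series when the label matches exactly one row,
    so that case is turned back into a one-row DataFrame.
    """
    if isinstance(obj, pd.Series):
        # to_frame() gives a single column named after the row label;
        # transposing makes it a single row with the original columns.
        return obj.to_frame().T
    return obj


df = pd.DataFrame(np.random.randn(5, 2), index=[100, 101, 101, 102, 103])

for id_ in df.index.unique():
    slice_ = as_frame(df.loc[id_])
    print(id_, type(slice_).__name__, slice_.shape)
```

Every iteration then sees a DataFrame: shape (1, 2) for the unique labels and (2, 2) for 101.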
You can loop over a groupby object, grouping by the index (level=0):
for i, df1 in df.groupby(level=0):
    print(df1)
0 1
100 -0.812375 -0.450793
0 1
101 1.070801 0.217421
101 -1.175859 -0.926117
0 1
102 -0.993948 0.586806
0 1
103 1.063813 0.237741
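The groupby approach avoids the Series/DataFrame question entirely, because each group yielded by iterating a groupby is always a DataFrame, even for a label that appears once. A minimal sketch that checks this on the same kind of random frame (the shapes dictionary is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), index=[100, 101, 101, 102, 103])

# groupby(level=0) groups rows by the values of the index; groups are
# visited in sorted key order by default (sort=True).
shapes = {}
for label, group in df.groupby(level=0):
    # Each group is a DataFrame regardless of how many rows it has.
    assert isinstance(group, pd.DataFrame)
    shapes[label] = group.shape

print(shapes)
```

This also makes a single pass over the data instead of one lookup per label.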
Alternatively, change your solution to select with double brackets (df.loc[[id_]]) so that a DataFrame is always returned:
idx = set(df.index)
for id_ in idx:
    df1 = df.loc[[id_]]
    print(df1)
0 1
100 -0.775057 -0.979104
0 1
101 -1.549363 -1.206828
101 0.445008 -0.173086
0 1
102 1.488947 -0.79252
0 1
103 1.838997 -0.439362
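The reason double brackets work: a scalar label drops the row dimension when it matches a single row, while a list of labels always keeps it. A small sketch contrasting the two. It also assumes df.index.unique() as a replacement for set(df.index), since a set makes no ordering guarantee.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), index=[100, 101, 101, 102, 103])

# Scalar label matching one row -> Series.
print(type(df.loc[100]).__name__)    # Series
# List of labels -> DataFrame, even for a single matching row.
print(type(df.loc[[100]]).__name__)  # DataFrame
print(df.loc[[101]].shape)           # (2, 2)

# df.index.unique() keeps first-appearance order, unlike set().
for id_ in df.index.unique():
    df1 = df.loc[[id_]]
    assert isinstance(df1, pd.DataFrame)
```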
Or use df[...] with a boolean condition on df.index:
...
for id_ in idx:
    slice = df[df.index == id_]
    print(slice)
Output:
0 1
100 2.751189 1.978744
0 1
101 0.154483 1.646657
101 1.381725 0.982819
0 1
102 0.26669 0.032702
0 1
103 0.186235 -0.481184
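A sketch of why the boolean-mask version always returns a DataFrame. df.index == id_ produces a boolean NumPy array with one entry per row, and indexing a frame with such a mask keeps the row dimension. The check against df.loc[[id_]] is only there to show that the two approaches select the same rows (slice_ is renamed to avoid shadowing the built-in slice).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), index=[100, 101, 101, 102, 103])

for id_ in df.index.unique():
    # One boolean per row: True where the index equals the current label.
    mask = df.index == id_
    slice_ = df[mask]
    # Boolean indexing returns the same rows as the double-bracket lookup.
    assert slice_.equals(df.loc[[id_]])
    print(id_, int(mask.sum()), slice_.shape)
```

Note that this scans the whole index once per label, so for many distinct labels the groupby approach is likely cheaper.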
You can force the variable slice to be a pandas DataFrame by using the pd.DataFrame constructor, as follows:
for id_ in idx:
    slice = pd.DataFrame(df.loc[id_])
    print(type(slice))
Output:
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
Then you can treat the variable as a DataFrame inside the loop.
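One caveat worth checking with the constructor approach: the type is always DataFrame, but the orientation is not consistent. Wrapping a Series in pd.DataFrame turns the Series index (the original column labels) into the row index, so a unique label gives a 2x1 column rather than a 1x2 row. A duplicated label already yields a DataFrame, which stays row-oriented. A minimal sketch showing the shapes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 2), index=[100, 101, 101, 102, 103])

# Unique label: df.loc[100] is a Series indexed by the columns 0 and 1,
# so the constructor produces one column (named 100) with two rows.
print(pd.DataFrame(df.loc[100]).shape)  # (2, 1)

# Duplicated label: df.loc[101] is already a DataFrame and is left as-is.
print(pd.DataFrame(df.loc[101]).shape)  # (2, 2)

# List selection keeps rows as rows for every label.
print(df.loc[[100]].shape)              # (1, 2)
```

If row orientation matters for the per-slice logic, df.loc[[id_]] or groupby gives a consistent shape.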