Pandas keep original DataFrame dtypes
I have a DataFrame similar to this one:
id pose score
437209 842134 1 -6.5
437210 842134 2 -6.3
437211 842134 3 -6.3
437212 842134 4 -6.1
437213 842134 5 -6.1
437214 842134 6 -5.5
437215 842134 7 -5.4
437216 842134 8 -5.2
437217 842134 9 -5.2
437218 842134 10 -5.1
19435 842135 1 -7.0
19436 842135 2 -6.8
I want to create another DataFrame from the top 1 score for each id. However, when I group the values by their id, their dtypes change, so my df2 looks like this:
df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')])
print(df2.head(2))
id pose score
437209 842134.0 1.0 -6.5
19435 842135.0 1.0 -7.0
Checking the dtype of each row:
for i, args in df[:20].groupby('id'):
    print(args.iloc[0])
id 842134.0
pose 1.0
score -6.5
Name: 437209, dtype: float64
id 842135.0
pose 1.0
score -7.0
Name: 19435, dtype: float64
You can see id and pose are not int anymore, which will compromise the rest of my code, since I will use these values for indexing. Here are a couple of things I've tried:
df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes)
TypeError: dtype 'id int64
pose int64
vina_score float64
dtype: object' not understood
df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes.to_dict())
ValueError: entry not a 2- or 3- tuple
df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes.tolist())
TypeError: data type not understood
Any help would be appreciated.
Edit: df is ordered by score for each id (the lower the score, the better); the first pose does not necessarily have the best score.
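This happens because each args.iloc[0] is a Series, and a Series holds a single dtype, so passing a list of them to the DataFrame constructor upcasts the int64 columns to float64. Instead, select one-row DataFrames with iloc[[0]] and combine them with concat:
pd.concat([args.iloc[[0]] for _,args in df.groupby('id')])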
id pose score
437209 842134 1 -6.5
19435 842135 1 -7.0
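A minimal sketch of the upcast and of the concat approach, assuming a small made-up frame that only mimics the column names and index values from the question. The double brackets in iloc[[0]] select a one-row DataFrame rather than a Series, which is what lets each column keep its own dtype.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question
df = pd.DataFrame(
    {
        "id": [842134, 842134, 842135, 842135],
        "pose": [1, 2, 1, 2],
        "score": [-6.5, -6.3, -7.0, -6.8],
    },
    index=[437209, 437210, 19435, 19436],
)

# A single row is a Series with one dtype, so the ints are upcast to float64
row = df.iloc[0]
print(row.dtype)

# Building a frame from such rows keeps the float64 upcast in every column
upcast = pd.DataFrame([g.iloc[0] for _, g in df.groupby("id")])
print(upcast.dtypes)

# iloc[[0]] yields a one-row DataFrame, so concat preserves per-column dtypes
df2 = pd.concat([g.iloc[[0]] for _, g in df.groupby("id")])
print(df2.dtypes)
```

Comparing the two dtypes printouts should make the difference between the constructor route and the concat route visible.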
Alternatively, head and drop_duplicates both return the first row for each id straight from the original frame, so the dtypes are unchanged:
df.groupby('id').head(1)
df.drop_duplicates('id')
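As a rough illustration of these two one-liners, the sketch below runs them on made-up data and checks the resulting dtypes. The sort_values step and the astype repair are additions that are not part of the original answer: the first is only needed if the rows were not already ordered by score, and the second assumes a frame that has already been upcast.

```python
import pandas as pd

# Hypothetical sample data, already ordered by score within each id
df = pd.DataFrame(
    {
        "id": [842134, 842134, 842135, 842135],
        "pose": [1, 2, 1, 2],
        "score": [-6.5, -6.3, -7.0, -6.8],
    },
    index=[437209, 437210, 19435, 19436],
)

# Both calls keep the first row per id and leave the dtypes untouched
top_head = df.groupby("id").head(1)
top_dedup = df.drop_duplicates("id")
print(top_head.dtypes)

# If the order were not guaranteed, sort first (lower score is better here)
top_sorted = df.sort_values("score").drop_duplicates("id")

# Repairing an already-upcast frame by casting back to the original dtypes
upcast = pd.DataFrame([g.iloc[0] for _, g in df.groupby("id")])
restored = upcast.astype(df.dtypes.to_dict())
print(restored.dtypes)
```

Note that astype takes a column-to-dtype mapping, which is why df.dtypes.to_dict() works there even though the dtype argument of the DataFrame constructor rejects it.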