Pandas: keep the original DataFrame dtypes

Question

I have a DataFrame similar to this one:

             id  pose       score
437209   842134     1        -6.5
437210   842134     2        -6.3
437211   842134     3        -6.3
437212   842134     4        -6.1
437213   842134     5        -6.1
437214   842134     6        -5.5
437215   842134     7        -5.4
437216   842134     8        -5.2
437217   842134     9        -5.2
437218   842134    10        -5.1
19435    842135     1        -7.0
19436    842135     2        -6.8

I want to create another DataFrame from the top-scoring row for each id. However, when I group the values by their id, their dtypes change, so my df2 looks like this:

df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')])
print(df2.head(2))

              id  pose       score
437209  842134.0   1.0        -6.5
19435   842135.0   1.0        -7.0

Checking the dtype of the selected rows:

for i, args in df[:20].groupby('id'):
    print(args.iloc[0])

id            842134.0
pose               1.0
score             -6.5
Name: 437209, dtype: float64
id            842135.0
pose               1.0
score             -7.0
Name: 19435, dtype: float64

You can see that id and pose are no longer int, which will break the rest of my code, since I use these values for indexing. Here are a couple of things I've tried:

df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes)

TypeError: dtype 'id              int64
pose            int64
vina_score    float64
dtype: object' not understood

df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes.to_dict())

ValueError: entry not a 2- or 3- tuple

df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes.tolist())

TypeError: data type not understood

Any help would be appreciated.

Edit: df is ordered by score for each id (the lower the score, the better); the first pose does not necessarily have the best score.

Answer 1

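This happens because args.iloc[0] returns each row as a Series, and a Series holds a single dtype, so the int columns are upcast to float64 before the DataFrame constructor ever sees them. Select each row as a one-row DataFrame with iloc[[0]] and combine the pieces with concat instead:

pd.concat([args.iloc[[0]] for _,args in df.groupby('id')])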
            id  pose  score
437209  842134     1   -6.5
19435   842135     1   -7.0
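To make the difference concrete, here is a minimal sketch on a toy frame. The values are illustrative and only mirror the question's column names; the last line is an extra option that does not come from the answer itself, restoring the dtypes on a frame that has already been upcast by using astype.

```python
import pandas as pd

# Toy frame mirroring the question's columns (illustrative values only).
df = pd.DataFrame(
    {"id": [842134, 842134, 842135],
     "pose": [1, 2, 1],
     "score": [-6.5, -6.3, -7.0]},
    index=[437209, 437210, 19435],
)

row_as_series = df.iloc[0]    # Series: one dtype for every value, so float64
row_as_frame = df.iloc[[0]]   # one-row DataFrame: per-column dtypes are kept

# Rebuilding from Series rows loses the integer dtypes ...
upcast = pd.DataFrame([g.iloc[0] for _, g in df.groupby("id")])

# ... while concatenating one-row frames keeps them.
kept = pd.concat([g.iloc[[0]] for _, g in df.groupby("id")])

# Assumption: if an upcast frame already exists, cast it back column by column.
restored = upcast.astype(df.dtypes.to_dict())

print(row_as_series.dtype)
print(kept.dtypes)
print(restored.dtypes)
```

Note that astype takes a column-to-dtype mapping, whereas the dtype argument of the DataFrame constructor accepts only a single dtype, which is why the attempts in the question raise errors.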

Alternatively, head and drop_duplicates select the first row per id without a Python loop and keep the original dtypes:

df.groupby('id').head(1)

df.drop_duplicates('id')
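Both one-liners return the first row of each id, so they rely on the frame already being sorted by score within each id, as the question's edit states. A short sketch on toy data follows; the explicit sort_values step is an assumption added here for the case where that ordering is not guaranteed.

```python
import pandas as pd

# Toy frame mirroring the question's columns (illustrative values only).
df = pd.DataFrame(
    {"id": [842134, 842134, 842135, 842135],
     "pose": [1, 2, 1, 2],
     "score": [-6.5, -6.3, -7.0, -6.8]},
    index=[437209, 437210, 19435, 19436],
)

# First row per id; dtypes and the original index are preserved.
top_head = df.groupby("id").head(1)
top_dedup = df.drop_duplicates("id")

# Assumption: if rows may be unsorted, order by score (lowest first) per id
# before keeping the first occurrence of each id.
by_sort = df.sort_values(["id", "score"]).drop_duplicates("id")

print(top_head)
print(top_head.dtypes)
```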

Accepted, 2020-05-16 23:22:03
