Pandas keep original DataFrame dtypes

I have a DataFrame similar to this one:

             id  pose       score
437209   842134     1        -6.5
437210   842134     2        -6.3
437211   842134     3        -6.3
437212   842134     4        -6.1
437213   842134     5        -6.1
437214   842134     6        -5.5
437215   842134     7        -5.4
437216   842134     8        -5.2
437217   842134     9        -5.2
437218   842134    10        -5.1
19435    842135     1        -7.0
19436    842135     2        -6.8

I want to create another DataFrame from the top 1 score for each id. However, when I group the values by their id and take the first row of each group, the dtypes change, so my df2 looks like this:

df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')])
print(df2.head(2))

              id  pose       score
437209  842134.0   1.0        -6.5
19435   842135.0   1.0        -7.0

Checking the dtype of each row:

for i, args in df[:20].groupby('id'):
    print(args.iloc[0])

id            842134.0
pose               1.0
score             -6.5
Name: 437209, dtype: float64
id            842135.0
pose               1.0
score             -7.0
Name: 19435, dtype: float64

You can see id and pose are not int anymore, which will break the rest of my code, since I use these values for indexing. Here are a couple of things I've tried:

df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes)

TypeError: dtype 'id              int64
pose            int64
vina_score    float64
dtype: object' not understood

df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes.to_dict())

ValueError: entry not a 2- or 3- tuple

df2 = pd.DataFrame([args.iloc[0] for _,args in df.groupby('id')], dtype=df.dtypes.tolist())

TypeError: data type not understood

Any help would be appreciated.

Edit: df is ordered by score within each id (the lower the score, the better), so the first row of each group is the best one. The first pose does not necessarily have the best score.

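This happens because args.iloc[0] returns each row as a Series, and a Series holds a single dtype, so the int columns are upcast to float64 before the DataFrame constructor ever sees them. Select a one-row DataFrame with iloc[[0]] instead and combine the pieces with concat:

pd.concat([args.iloc[[0]] for _,args in df.groupby('id')])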
            id  pose  score
437209  842134     1   -6.5
19435   842135     1   -7.0
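
To see where the float conversion comes from, it may help to compare a row taken as a Series with the same row taken as a one-row DataFrame. The following is a minimal sketch on made-up data that only mimics the question's columns; the names df2 and df3 are illustrative. It also shows a fallback, casting back with astype, for the case where the rows are already Series.

```python
import pandas as pd

# Made-up sample with the same columns as the question (an assumption).
df = pd.DataFrame(
    {
        "id": [842134, 842134, 842135, 842135],
        "pose": [1, 2, 1, 2],
        "score": [-6.5, -6.3, -7.0, -6.8],
    },
    index=[437209, 437210, 19435, 19436],
)

# A single row as a Series has one dtype, so the ints are upcast to float64.
row_as_series = df.iloc[0]
print(row_as_series.dtype)

# A one-row DataFrame keeps each column's own dtype.
row_as_frame = df.iloc[[0]]
print(row_as_frame.dtypes)

# Concatenating one-row DataFrames therefore preserves the dtypes.
df2 = pd.concat([group.iloc[[0]] for _, group in df.groupby("id")])
print(df2.dtypes)

# Fallback: build from Series rows, then cast back to the original dtypes.
df3 = pd.DataFrame([group.iloc[0] for _, group in df.groupby("id")])
df3 = df3.astype(df.dtypes.to_dict())
print(df3.dtypes)
```

Note that the astype fallback only works when the upcast values can be converted back without loss, which is the case for whole-number floats such as 842134.0.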

Also, since df is already sorted by score within each id, head or drop_duplicates returns the same rows without going through a Series, so the dtypes are kept:

df.groupby('id').head(1)

df.drop_duplicates('id')
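
As a rough check, the sketch below runs both calls on made-up data shaped like the question's and confirms the dtypes survive. It also adds a third variant, based on idxmin, that does not depend on the rows being pre-sorted; that variant is not from the original answer and assumes the index labels are unique. The names top_head, top_dedup and top_min are illustrative.

```python
import pandas as pd

# Made-up sample, sorted best (lowest) score first within each id.
df = pd.DataFrame(
    {
        "id": [842134, 842134, 842135, 842135],
        "pose": [1, 2, 1, 2],
        "score": [-6.5, -6.3, -7.0, -6.8],
    },
    index=[437209, 437210, 19435, 19436],
)

# First row of each group; relies on the existing sort order.
top_head = df.groupby("id").head(1)

# First occurrence of each id; also relies on the existing sort order.
top_dedup = df.drop_duplicates("id")

# Order-independent variant: look up the index label of the lowest score per id.
best_labels = df.groupby("id")["score"].idxmin().tolist()
top_min = df.loc[best_labels]

for result in (top_head, top_dedup, top_min):
    print(result.dtypes.to_dict())
```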
