python - how do I perform the below operation in dataframe

I have a dataframe, part of which is shown below:

import pandas as pd

df1 = pd.DataFrame({
    'id1': ['676PA','676RA','197PA','197RA','199PA','199RA','834PA','834RA','950PA','950RA','952PA','952RA','953PA','953RA','954PA','954RA','956PA','956RA'],
    'Category1': ['P-L','FL123','P-L','FL123','P-L','FL123','P-L','FL123','FL123','P-L','P-L','FL123','FL123','P-L','FL123','P-L','P-L','FL123'],
    'Val1': [-4.0,39.0,-8.0,45.0,-4.0,27.0,-46.0,271.0,-70.0,3.0,-34.0,192.0,-56.0,3.0,-56.0,3.0,-52.0,292.0]
})

Each id has two entries, "PA" and "RA", with corresponding Category1 and Val1 values.

The sequencing of Category1 is incorrect and I want it to be consistent for all ids: P-L should come first, followed by FL123, for each of the ids. I have shown the "current" state and how I want the output to look below. Any help is much appreciated.

[image: current vs. desired output]

Edit - keep historical order of IDs

Since you want to keep the order of your IDs (that is, maintain 676, 197, 199, etc.), you need to create a counter that assigns the same number to both rows of each ID (i.e. 676PA and 676RA both get 0, the PA and RA rows of the next ID both get 1, and so on).

The approach is very similar to the original answer further down; you just need a different temporary column:

  1. Create a new temp column using .groupby() and .cumcount(), grouping on Category1 so that the rows of each category are numbered in order of appearance.
  2. Sort on this new temp column (ascending) and the Category1 column (descending, so that P-L comes before FL123).
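(
    df1
    # number the rows within each category: 0, 1, 2, ... in order of appearance
    .assign(temp = df1.groupby('Category1').cumcount())
    # rows sharing a temp value form a pair; descending Category1 puts P-L first
    .sort_values(['temp','Category1'], ascending=[True, False])
)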
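To see why the counter pairs the rows up, here is a small sketch on a four-row subset of the question's data. The names `sample`, `temp`, and `paired` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Illustrative subset of the question's data (two IDs, one of them mis-ordered)
sample = pd.DataFrame({
    'id1': ['676PA', '676RA', '950PA', '950RA'],
    'Category1': ['P-L', 'FL123', 'FL123', 'P-L'],
    'Val1': [-4.0, 39.0, -70.0, 3.0],
})

# cumcount numbers the rows inside each Category1 group in order of appearance,
# so the first P-L row and the first FL123 row both receive 0, and so on.
temp = sample.groupby('Category1').cumcount()
print(temp.tolist())  # [0, 0, 1, 1]

paired = (
    sample
    .assign(temp=temp)
    .sort_values(['temp', 'Category1'], ascending=[True, False])
)
print(paired['id1'].tolist())  # ['676PA', '676RA', '950RA', '950PA']
```

The 950 pair is swapped so that its P-L row comes first, while 676 stays ahead of 950 as in the input.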

Note: This only works if every ID has exactly one row in each of the two categories and the IDs appear in the same order within both categories.
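If that assumption is a concern, one possible alternative (not part of the original answer) is to number the IDs themselves by first appearance instead of counting within categories. The sketch below assumes every id1 value ends in a two-character suffix such as 'PA' or 'RA'; the names `sample`, `base`, `order`, and `result` are hypothetical.

```python
import pandas as pd

# Illustrative subset of the question's data
sample = pd.DataFrame({
    'id1': ['676PA', '676RA', '197PA', '197RA', '950PA', '950RA'],
    'Category1': ['P-L', 'FL123', 'P-L', 'FL123', 'FL123', 'P-L'],
    'Val1': [-4.0, 39.0, -8.0, 45.0, -70.0, 3.0],
})

# Assumption: the last two characters of id1 are the PA/RA suffix
base = sample['id1'].str[:-2]

# factorize assigns 0 to the first distinct value seen, 1 to the next, ...
order = pd.factorize(base)[0]

result = (
    sample
    .assign(order=order)
    .sort_values(['order', 'Category1'], ascending=[True, False])
    .drop(columns='order')
    .reset_index(drop=True)
)
print(result['id1'].tolist())
```

Because the helper column is derived from the ID rather than from the row position within a category, rows are paired by ID even when an ID is missing one of its two categories.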

Original - If order retention isn't needed

As mentioned in the comments, if the order in which the IDs appear does not matter (i.e. 197PA and 197RA can come first in the list, as long as they stay together), then you can use a sort. However, since you need to sort on the digits only, you need to do this in two steps (with an optional third):

  1. Create a new column (say temp_id) holding the ID without its 'PA'/'RA' suffix (replacing 'PA' with 'RA', or vice versa, would also work).
  2. Sort by temp_id ascending and Category1 descending.
  3. Optional: Remove the temp_id column, as you no longer need it.
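(
    # strip the two-character PA/RA suffix so both rows of an ID share one key
    df1.assign(temp_id = df1['id1'].str[:-2])
    .sort_values(['temp_id','Category1'], ascending=[True, False])
    # .drop('temp_id', axis=1)  # optional: remove the helper column
)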
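One caveat: temp_id above is a string, so the sort is lexicographic. That is fine for the three-digit IDs in the question, but IDs of differing lengths would sort as text. The sketch below is a variant that casts the prefix to an integer; it assumes the part before the suffix is always numeric, and the names `sample` and `result` are illustrative.

```python
import pandas as pd

# Illustrative subset of the question's data, deliberately out of order
sample = pd.DataFrame({
    'id1': ['950PA', '950RA', '197PA', '197RA'],
    'Category1': ['FL123', 'P-L', 'P-L', 'FL123'],
    'Val1': [-70.0, 3.0, -8.0, 45.0],
})

result = (
    sample
    # Assumption: everything before the two-character suffix is a number
    .assign(temp_id=sample['id1'].str[:-2].astype(int))
    .sort_values(['temp_id', 'Category1'], ascending=[True, False])
    .drop(columns='temp_id')   # the helper column is no longer needed
    .reset_index(drop=True)
)
print(result['id1'].tolist())  # ['197PA', '197RA', '950RA', '950PA']
```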
