Python: Create a new DataFrame using rows from an existing df, based on given indices

Question

I have a DataFrame and need to create a new one in which, whenever a row has the same element in a certain column as an earlier row, the row with the second occurrence is moved directly under the row containing the first occurrence. This may be hard to explain, but hopefully the examples make it clearer.

I have a df such as this (the important column is 'Direction'):

    Node  |  Feature | Indicator | Value | Class | Direction
    --------------------------------------------------------
    1     |  WPS     |     <=    | 0.27  | 4     | 1 -> 2  
    --------------------------------------------------------
    2     |  ABC     |     <=    | 0.40  | 5     | 2 -> 3
    --------------------------------------------------------
    3     |  CXC     |     <=    | 0.45  | 2     | 3 -> 4
    --------------------------------------------------------
    4     |  WPS     |     <=    | 0.56  | 1     | 1 -> 5
    --------------------------------------------------------
    5     |  ABC     |     <=    | 0.30  | 3     | 2 -> 5
    --------------------------------------------------------
    6     |  CXC     |     <=    | 0.55  | 5     | 3 -> 1

When the first number in Direction occurs twice, as with nodes (1 & 4), (2 & 5) and (3 & 6), I would like the row with the second occurrence (nodes 4, 5 and 6) to be moved directly below the row with the first occurrence.

I need the result to look like this:

    Node  |  Feature | Indicator | Value | Class | Direction
    --------------------------------------------------------
    1     |  WPS     |     <=    | 0.27  | 4     | 1 -> 2  
    --------------------------------------------------------
    4     |  WPS     |     <=    | 0.56  | 1     | 1 -> 5
    --------------------------------------------------------
    2     |  ABC     |     <=    | 0.40  | 5     | 2 -> 3
    --------------------------------------------------------
    5     |  ABC     |     <=    | 0.30  | 3     | 2 -> 5
    --------------------------------------------------------
    3     |  CXC     |     <=    | 0.45  | 2     | 3 -> 4
    --------------------------------------------------------
    6     |  CXC     |     <=    | 0.55  | 5     | 3 -> 1

I have spent a long time trying to come up with a solution, so I would be very grateful if anyone is able to help.

What I am trying to do at the moment:

Create a list containing the first integers from the ['Direction'] column: first_Ints_ls = [1, 2, 3, 1, 2, 3]
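For reference, one way such a list could be built is sketched below. This is only an illustration, assuming pandas is available and that every Direction value has the form "a -> b" with whitespace around the arrow; only the Direction column from the example is reproduced.

```python
import pandas as pd

# Only the 'Direction' column of the example table is reproduced here.
df = pd.DataFrame(
    {"Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"]}
)

# Split each string on whitespace, keep the first token, and convert it to int.
first_Ints_ls = df["Direction"].str.split().str[0].astype(int).tolist()

print(first_Ints_ls)  # [1, 2, 3, 1, 2, 3]
```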

I then try to find the indices of the first and second occurrences within first_Ints_ls, which I hoped to use to access the rows of the DataFrame.

    first_ind_ls = []
    second_ind_ls = []

    for i in first_Ints_ls:
        # Find the indices of the first and second occurrence of i
        first_ind = first_Ints_ls.index(i, 0)
        second_ind = first_Ints_ls.index(i, first_ind + 1)
        first_ind_ls.append(first_ind)
        second_ind_ls.append(second_ind)

This produces:

    print(first_ind_ls)
    >> [1, 2, 3, 1, 2, 3]
    print(second_ind_ls)
    >> [4, 5, 6]

I remove any duplicates from first_ind_ls so that both lists are the same size.

    # Resulting lists:
    >> [1, 2, 3]
    >> [4, 5, 6]

Now I want to iterate through my DataFrame, take the row at the first index in first_ind_ls (which is 1) and add it to a new DataFrame, then take the row at the first index in second_ind_ls (which is 4) and add that to the new DataFrame, and continue until I end up with a DataFrame like the one above.

What I have already tried is not working at all, so I won't post the code unless requested.

I'm really having trouble figuring out how to loop through my df and access the rows while at the same time looping through both lists of indices, then adding the row at each index to a new df.
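The step described above (walking both index lists in parallel and collecting the rows) can probably be done without looping over the DataFrame itself. The sketch below is only an illustration: it assumes the positions are 0-based, as `iloc` expects, so the lists shown earlier would first have to be shifted down by one, and it reproduces only two columns of the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Node": [1, 2, 3, 4, 5, 6],
        "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
    }
)

# Assumption: 0-based row positions of the first and second occurrences.
first_ind_ls = [0, 1, 2]
second_ind_ls = [3, 4, 5]

# zip pairs the two lists up; the comprehension flattens the pairs
# into a single row order: [0, 3, 1, 4, 2, 5].
order = [pos for pair in zip(first_ind_ls, second_ind_ls) for pos in pair]

# Select the rows by position in that order and renumber the index.
new_df = df.iloc[order].reset_index(drop=True)

print(new_df["Node"].tolist())  # [1, 4, 2, 5, 3, 6]
```

Passing the whole list of positions to `iloc` at once avoids appending rows to a new DataFrame one at a time.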

I just don't know what else to try, so if anyone has any advice I'd be most appreciative. I am quite new to programming, so my way of looking at the problem may be wrong.

Answer 1

If I understand correctly, the only key for the sorting is the first element in the Direction column. I assume Direction is of type string. So see if this very simple, naive method works for you.

Create a key column (not strictly needed, but it makes things clearer):

    df['key'] = df['Direction'].apply(lambda x: x.split()[0])

Then sort the values on this key:

    # sort_values returns a new DataFrame, so assign the result
    df = df.sort_values('key')
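Putting the two steps together on the example data, a minimal sketch might look like the following. Two details here are my own assumptions rather than part of the steps above: the key is converted to `int` so that a key such as 10 would sort after 2 (string keys compare lexicographically), and `kind="mergesort"` is requested because it is a stable sort, which keeps the first occurrence above the second.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Node": [1, 2, 3, 4, 5, 6],
        "Feature": ["WPS", "ABC", "CXC", "WPS", "ABC", "CXC"],
        "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
    }
)

# Integer key taken from the first token of Direction.
df["key"] = df["Direction"].apply(lambda x: int(x.split()[0]))

# Stable sort on the key, then drop the helper column and renumber the index.
result = (
    df.sort_values("key", kind="mergesort")
    .drop(columns="key")
    .reset_index(drop=True)
)

print(result["Node"].tolist())  # [1, 4, 2, 5, 3, 6]
```

Note that this orders the groups by key value, which matches the desired output here because the keys first appear as 1, 2, 3.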

Does this work? Or am I missing something?

Solution 1
1 Accepted 2019-05-23 11:49:20
