
Python: Create a new data frame using rows from existing df depending on a given index

I have a DataFrame and need to create a new one in which, whenever a row has the same element in a certain column as an earlier row, the row with the second occurrence is moved directly under the row containing the first occurrence. I'm afraid this might be hard to explain, but hopefully the examples make it clearer.

I have a df such as this (the important column is 'Direction'):

    Node  |  Feature | Indicator | Value | Class | Direction
    --------------------------------------------------------
    1     |  WPS     |     <=    | 0.27  | 4     | 1 -> 2  
    --------------------------------------------------------
    2     |  ABC     |     <=    | 0.40  | 5     | 2 -> 3
    --------------------------------------------------------
    3     |  CXC     |     <=    | 0.45  | 2     | 3 -> 4
    --------------------------------------------------------
    4     |  WPS     |     <=    | 0.56  | 1     | 1 -> 5
    --------------------------------------------------------
    5     |  ABC     |     <=    | 0.30  | 3     | 2 -> 5
    --------------------------------------------------------
    6     |  CXC     |     <=    | 0.55  | 5     | 3 -> 1

When the first number in Direction occurs twice, as it does for nodes (1 & 4), (2 & 5) and (3 & 6), I would like the row with the second occurrence (nodes 4, 5 and 6) to be moved directly below the row with the first occurrence.

I need the result to look like this:

    Node  |  Feature | Indicator | Value | Class | Direction
    --------------------------------------------------------
    1     |  WPS     |     <=    | 0.27  | 4     | 1 -> 2  
    --------------------------------------------------------
    4     |  WPS     |     <=    | 0.56  | 1     | 1 -> 5
    --------------------------------------------------------
    2     |  ABC     |     <=    | 0.40  | 5     | 2 -> 3
    --------------------------------------------------------
    5     |  ABC     |     <=    | 0.30  | 3     | 2 -> 5
    --------------------------------------------------------
    3     |  CXC     |     <=    | 0.45  | 2     | 3 -> 4
    --------------------------------------------------------
    6     |  CXC     |     <=    | 0.55  | 5     | 3 -> 1

I have spent a long time trying to come up with a solution, so I would be very grateful if anyone is able to help.

What I am trying to do at the moment:

Create a list containing the first integers from the ['Direction'] column: firstInt_ls = [1, 2, 3, 1, 2, 3]
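One possible way to build that list directly from the column, as a minimal sketch: it assumes every 'Direction' value is a string of the form "a -> b" and that pandas is available. The one-column frame below is only a stand-in for the real data.

```python
import pandas as pd

# Stand-in frame: only the 'Direction' column from the question is needed here.
df = pd.DataFrame(
    {"Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"]}
)

# Take the text before '->', strip surrounding spaces, and convert to int.
firstInt_ls = (
    df["Direction"].str.split("->").str[0].str.strip().astype(int).tolist()
)

print(firstInt_ls)  # [1, 2, 3, 1, 2, 3]
```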

I then try to find the indices of the first and second occurrences within firstInt_ls, which I hoped to use to access the rows of the DataFrame.

    first_ind_ls = []
    second_ind_ls = []

    for i in firstInt_ls:
        # Find the indexes of the first and second occurrence
        first_ind = firstInt_ls.index(i, 0)
        second_ind = firstInt_ls.index(i, first_ind + 1)
        first_ind_ls.append(first_ind)
        second_ind_ls.append(second_ind)

This produces:

    print(first_ind_ls)
    >> [1, 2, 3, 1, 2, 3]
    print(second_ind_ls)
    >> [4, 5, 6]

I remove any duplicates from first_ind_ls so that both lists are the same size.

    # Resulting lists:
    >> [1, 2, 3]
    >> [4, 5, 6]
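For comparison, here is a sketch of collecting both lists in a single pass with no de-duplication step. Note that Python list positions are 0-based, so the positions come out as [0, 1, 2] and [3, 4, 5], which correspond to nodes 1 to 3 and 4 to 6. The `positions` dictionary is a helper introduced only for this illustration. It also sidesteps the ValueError that `list.index(i, first_ind + 1)` raises when a value occurs only once.

```python
firstInt_ls = [1, 2, 3, 1, 2, 3]

# Hypothetical helper: value -> list of 0-based positions where it occurs.
positions = {}
for pos, value in enumerate(firstInt_ls):
    positions.setdefault(value, []).append(pos)

# Keep only values that occur at least twice; dicts preserve insertion order.
first_ind_ls = [p[0] for p in positions.values() if len(p) > 1]
second_ind_ls = [p[1] for p in positions.values() if len(p) > 1]

print(first_ind_ls)   # [0, 1, 2]
print(second_ind_ls)  # [3, 4, 5]
```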

Now I want to iterate through my DataFrame: take the row at the first index in first_ind_ls (which is 1) and add it to a new data frame, then take the row at the first index in second_ind_ls (which is 4) and add that to the new data frame, and continue until I end up with the DataFrame shown above.
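The interleaving described here does not need an explicit loop over the DataFrame. A minimal sketch, assuming the two lists hold 0-based row positions (so [0, 1, 2] and [3, 4, 5] rather than the node numbers) and using a cut-down copy of the example data:

```python
import pandas as pd

df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Feature": ["WPS", "ABC", "CXC", "WPS", "ABC", "CXC"],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

first_ind_ls = [0, 1, 2]   # 0-based positions of the first occurrences
second_ind_ls = [3, 4, 5]  # 0-based positions of the second occurrences

# Interleave the two lists: first[0], second[0], first[1], second[1], ...
order = [pos for pair in zip(first_ind_ls, second_ind_ls) for pos in pair]

# iloc selects rows by position, in the order given.
new_df = df.iloc[order].reset_index(drop=True)
print(new_df["Node"].tolist())  # [1, 4, 2, 5, 3, 6]
```

Selecting all rows at once with `iloc` avoids appending to a DataFrame row by row, which tends to be slow and awkward.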

What I have already tried is not working at all, so I won't post that code unless requested.

I'm really having trouble figuring out how I can loop through my df and access the rows whilst at the same time looping through both lists containing the indices, then adding the rows at each index to a new df...

I just don't know what else to do, so if anyone has any advice I'd be most appreciative. I am quite new to programming, so my way of looking at the problem may be wrong.

If I understand correctly, the only sort key is the first element in the Direction column. I assume Direction is of type string. So see if this very simple, naive method works for you.

Create a key column (not strictly needed, but it makes things clearer):

    df['key'] = df['Direction'].apply(lambda x: x.split()[0])

Then sort the values on this key:

    df.sort_values('key')

Does this work? Or am I missing something?
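A self-contained sketch of this approach on the example data follows. Two details are additions for illustration rather than part of the suggestion itself: the key is converted to int, since string keys would sort "10" before "2", and `kind="mergesort"` is passed because the default sort algorithm of `sort_values` is not guaranteed to be stable, while a stable sort keeps the first occurrence above the second.

```python
import pandas as pd

df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Feature": ["WPS", "ABC", "CXC", "WPS", "ABC", "CXC"],
    "Indicator": ["<="] * 6,
    "Value": [0.27, 0.40, 0.45, 0.56, 0.30, 0.55],
    "Class": [4, 5, 2, 1, 3, 5],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

# Assumption: 'Direction' is a string whose first whitespace-separated
# token is an integer. Converting to int gives a numeric sort key.
df["key"] = df["Direction"].str.split().str[0].astype(int)

# sort_values returns a new frame; mergesort is stable, so rows with
# equal keys keep their original relative order.
result = (
    df.sort_values("key", kind="mergesort")
      .drop(columns="key")
      .reset_index(drop=True)
)

print(result["Node"].tolist())  # [1, 4, 2, 5, 3, 6]
```

Because `sort_values` does not modify `df` in place, the sorted frame has to be assigned to a name, as with `result` above.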


 