Pandas - Group a column based on another column and tag it as a new column

Question

I have a data frame that I want to group based on the value of another column in the same data frame.

For example:

The Parent_Id and ID columns are linked and define who is related to whom in a hierarchical tree.

The data frame looks like this (input from a CSV file):

No  Name    ID  Parent_Id
1   Tom     211 111
2   Galie   209 111
3   Remo    200 101
4   Carmen  212 121
5   Alfred  111 191
6   Marvela 101 111
7   Armin   234 101
8   Boris   454 109
9   Katya   109 323

I would like to group this data frame by ID and Parent_Id as in the grouping below, and generate CSV files from it based on the top-level parent, i.e. Alfred.csv, Carmen.csv (which will have only its own entry, i.e. line #4) and Katya.csv, using the to_csv() function.

Alfred
  |_ Galie
  |_ Tom
  |_ Marvela
       |_ Remo
       |_ Armin
Carmen
Katya
  |_ Boris

I also want to create a new column in the same data frame that holds a tag indicating the hierarchy, like this:

No  Name    ID  Parent_Id   Tag
1   Tom     211 111     Alfred
2   Galie   209 111     Alfred
3   Remo    200 101     Marvela, Alfred
4   Carmen  212 121 
5   Alfred  111 191 
6   Marvela 101 111     Alfred
7   Armin   234 101     Marvela, Alfred
8   Boris   454 109     Katya
9   Katya   109 323

Note that the names can repeat, but the ID will be unique.

Kindly let me know how to achieve this using pandas. I tried groupby(), but it seems a little complicated and I am not getting what I intend. There should be one file for each parent, containing that parent's child records. If a child has children of its own (like Marvela), it qualifies for its own CSV file.

The final output would be:

Alfred.csv - All records matching Galie, Tom, Marvela
Marvela.csv - All records matching Remo, Armin
Carmen.csv - Only the record matching Carmen (its own row)
Katya.csv - All records matching Katya, Boris

Answer 1

I am assuming your data frame can be built from a dictionary like this:

import pandas as pd

mydf = {"No": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "Name": ["Tom", "Galie", "Remo", "Carmen", "Alfred", "Marvela", "Armin", "Boris", "Katya"],
        "ID": [211, 209, 200, 212, 111, 101, 234, 454, 109],
        "Parent_Id": [111, 111, 101, 121, 191, 111, 101, 109, 323]}
df = pd.DataFrame(mydf)

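Then I take the Parent_Id of each row, look up the name of the row with that ID, and store the results in a new column. Rows whose parent is not in the data get an empty tag. Note that this records only the immediate parent, so Remo is tagged Marvela rather than Marvela, Alfred: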

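tag = []
for z in df['Parent_Id']:
    try:
        # Name of the single row whose ID equals this row's Parent_Id
        tag.append(df.query('ID==%s' % z)['Name'].item())
    except ValueError:
        # .item() raises ValueError when the lookup does not return exactly one row
        tag.append('')
df['Tag'] = tag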
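If the full chain of ancestors is wanted in the tag (as in the question's expected output, e.g. Marvela, Alfred), one possible approach is to walk up the Parent_Id links until an ID is reached that is not in the data. The following is only a sketch under that assumption; the helper `ancestor_names` and the lookup dictionaries `name_of` and `parent_of` are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "No": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Name": ["Tom", "Galie", "Remo", "Carmen", "Alfred",
             "Marvela", "Armin", "Boris", "Katya"],
    "ID": [211, 209, 200, 212, 111, 101, 234, 454, 109],
    "Parent_Id": [111, 111, 101, 121, 191, 111, 101, 109, 323],
})

# Lookup tables keyed by the unique ID column (hypothetical helper names).
name_of = dict(zip(df["ID"], df["Name"]))
parent_of = dict(zip(df["ID"], df["Parent_Id"]))


def ancestor_names(node_id):
    """Return ancestor names from nearest to farthest.

    The walk stops at the first parent ID that has no row of its own.
    """
    names = []
    seen = {node_id}  # guards against cycles in malformed data
    current = parent_of.get(node_id)
    while current in name_of and current not in seen:
        names.append(name_of[current])
        seen.add(current)
        current = parent_of.get(current)
    return names


df["Tag"] = [", ".join(ancestor_names(i)) for i in df["ID"]]
print(df)
```

With the sample data, Remo and Armin get the tag Marvela, Alfred, while Carmen, Alfred and Katya get an empty tag because their parents are not listed.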

To filter the data frame based on a value in column Tag, e.g. Alfred:

df[df['Tag'].str.match('Alfred')]

Then save the result to a CSV file with to_csv() and repeat for the other values. Alternatively, if column Tag contains a large number of names, use a for loop.
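The loop could look roughly like the sketch below, which groups on the immediate-parent tag and writes one file per tag value. It assumes that parent names are unique enough to serve as file names (the question says names can repeat, in which case the ID would be a safer key), and the output folder `csv_out` is a made-up name for illustration.

```python
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "No": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Name": ["Tom", "Galie", "Remo", "Carmen", "Alfred",
             "Marvela", "Armin", "Boris", "Katya"],
    "ID": [211, 209, 200, 212, 111, 101, 234, 454, 109],
    "Parent_Id": [111, 111, 101, 121, 191, 111, 101, 109, 323],
})

# Tag each row with its immediate parent's name ('' when the parent is not listed).
name_of = dict(zip(df["ID"], df["Name"]))
df["Tag"] = df["Parent_Id"].map(name_of).fillna("")

out_dir = Path("csv_out")  # hypothetical output folder
out_dir.mkdir(exist_ok=True)

written = []
# Skip rows without a tag, then write one CSV per distinct parent name.
for tag, group in df[df["Tag"] != ""].groupby("Tag"):
    path = out_dir / f"{tag}.csv"
    group.to_csv(path, index=False)
    written.append(path.name)

print(written)
```

With the sample data this writes Alfred.csv, Katya.csv and Marvela.csv, each holding the child rows of that parent. Rows with an empty tag (such as Carmen) get no file in this sketch, so a parent's own row, or a stand-alone entry, would have to be added separately if it is wanted.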
