简体   繁体   中英

Pandas - Grab value x from column a if column y contains b

Hi there stack overflow community,

I'm trying to get a tree-hierarchy from the following dataframe:

  SP             VP     
0 --           king   
1 king         knight 1     
2 king         knight 2
3 knight 1     knight 3     

In column 'SP' I've the superior and in 'VP' the underlying knight. I've filtered the column 'SP' so far that I got the top node (king). Now i want to get all the knights in column 'VP' underlying to the king in 'SP', than the selected knights as new superiors and their underlying knights and so on...

import pandas as pd 
import numpy as np

#define input
df1=pd.read_excel(r'path').fillna("@Null$tring").sort_values(by=['VP'])
#df1=pd.read_excel(r'path').fillna("@Null$tring").sort_values(by=['VP'])

SP1 = df1['SP'].tolist()
VP1 = df1['VP'].tolist()

VP2 = df1.reindex(['VP'])
#identifies the top node
root = []

#Grabs values in SP not present in VP.
root = pd.Index(df1["SP"]).difference(pd.Index(df1["VP"]))

#assigns the values in SP to the position in VP
df2 = df1[df1["SP"].isin(root)][["SP", "VP"]].reset_index(drop = True)

root2 = df2['SP'].tolist()

#dropping ALL duplicate values
sps = df1.drop_duplicates(subset=['SP'])

#all available SPs
spss =sps['SP']


#check output
spss.to_excel(r'C:\Users\mallk\OneDrive\Desktop\Koenigswege\ProgrammingDell\Tree Structure\Test.xlsx')
#print(SP1)

df3 = df2['VP'].isin(df1['SP'])

df3 = df1[df1['VP'].isin(df2)][['SP', 'VP']].reset_index(drop = True)

df3 = df1.loc['SP'] = [df2['VP']]

print(df2['VP'])
print(df3)

My last attemps to match the superior(king) with the underlying knights were the following:

df3 = df2['VP'].isin(df1['SP'])

df3 = df1[df1['VP'].isin(df2)][['SP', 'VP']].reset_index(drop = True)

df4 = df1.loc['SP'] = [df2['VP']]

But none of them work, I'm stucked :/ Maybe someone has some inspiration for me. Thanks a lot!

excuse my English, with subordinate do you mean that the SP and VP columns belong to the same index?

If you want only certain rows to belong to the same index and others not, this is not possible. It must be the totalities of the columns that belong to the index.

For what purpose do you want them to be subordinated? maybe there are other ways to do it.

Anyway, to have two columns as an index is as easy as df.set_index(['SP', 'VP'], inplace=True)

You can always have another auxiliary dataframe with the rows that interest you filtered and add the index there (because it will be applied to the two columns in its entirety -without excluding rows-).

I am not sure what you want your desired output to look like. But here is maybe what you are looking to achieve. Grouping VP by SP and getting the group values.

df.groupby('SP').get_group('king')['VP'].values

Output:

array(['knight 1', 'knight 2'], dtype=object)

This one get the subordinates of king for example. You can further automate this if this is your desired output.

EDIT

Here is how to automate it and add it to a new column in the dataframe

dict_groups = {}
for x in df['SP'].unique():
   values = df.groupby('SP').get_group(x)['VP'].values
   dict_groups[x] = values

df['subordinate'] = df.SP.map(dict_groups)

Output

    SP  VP  subordinate
0   __  king    [king]
1   king    knight 1    [knight 1, knight 2]
2   king    knight 2    [knight 1, knight 2]
3   knight 1    knight 3    [knight 3, soldier]
4   knight 1    soldier [knight 3, soldier]
5   knight 2    soldier [soldier]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM