Extracting specific rows from a data frame

I have a data frame df1 with two columns, 'ids' and 'names':

ids     names
fhj56   abc
ty67s   pqr
yu34o   xyz

I have another data frame, df2, which has (among others) the following columns:

user     values                       
1        ['fhj56','fg7uy8']
2        ['glao0','rt56yu','re23u']
3        ['fhj56','ty67s','hgjl09']

The result should contain the users from df2 whose values contain at least one of the ids from df1, and it should also show which ids (and their names) put each user into the result. The result should look like:

   user     values_responsible     names
   1        ['fhj56']              ['abc']
   3        ['fhj56','ty67s']      ['abc','pqr']

User 2 does not appear in the result because none of its values exist in df1['ids'].

I tried the following:

df2.query('values in @df1.ids')

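But this doesn't work: each cell of values holds a whole list, while `in` tests each cell as a single item against df1.ids, so it never checks the individual ids inside the lists.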
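A minimal sketch of a filter that does work, assuming values holds real Python lists (not strings that merely look like lists): test each list for overlap with the set of known ids. The sample frames are rebuilt from the tables above; known_ids, mask and filtered are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question
df1 = pd.DataFrame({'ids': ['fhj56', 'ty67s', 'yu34o'],
                    'names': ['abc', 'pqr', 'xyz']})
df2 = pd.DataFrame({'user': [1, 2, 3],
                    'values': [['fhj56', 'fg7uy8'],
                               ['glao0', 'rt56yu', 're23u'],
                               ['fhj56', 'ty67s', 'hgjl09']]})

known_ids = set(df1['ids'])
# True for each row whose list shares at least one element with known_ids
mask = df2['values'].apply(lambda vals: not known_ids.isdisjoint(vals))
filtered = df2[mask]
print(filtered['user'].tolist())  # [1, 3]
```

This only selects the users; the answers below also collect the responsible ids and names.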

You can iterate through the rows of df2 and use .loc together with isin to find the matching rows of df1. The matches are collected into lists, which are then used to build the result DataFrame from a dictionary:

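import pandas as pd

ids = []     # matching ids, one list per user
names = []   # the corresponding names, one list per user
users = []
for _, row in df2.iterrows():
    # rows of df1 whose id appears in this user's list
    result = df1.loc[df1['ids'].isin(row['values'])]
    if not result.empty:
        ids.append(result['ids'].tolist())
        names.append(result['names'].tolist())
        users.append(row['user'])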

>>> pd.DataFrame({'user': users, 'values_responsible': ids, 'names': names})[['user', 'values_responsible', 'names']]
   user values_responsible       names
0     1            [fhj56]       [abc]
1     3     [fhj56, ty67s]  [abc, pqr]

Or, for tidy (long-format) output with one row per matching id:

ids = []
names = []
users = []
for _, row in df2.iterrows():
    result = df1.loc[df1['ids'].isin(row['values'])]
    if not result.empty:
        ids.extend(result['ids'].tolist())
        names.extend(result['names'].tolist())
        users.extend([row['user']] * len(result['ids']))

>>> pd.DataFrame({'user': users, 'values_responsible': ids, 'names': names})[['user', 'values_responsible', 'names']]
   user values_responsible names
0     1              fhj56   abc
1     3              fhj56   abc
2     3              ty67s   pqr
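The same tidy result can likely be produced without iterrows by using DataFrame.explode (available since pandas 0.25), which turns each list element into its own row, followed by an inner merge with df1. A sketch, assuming values holds Python lists; the final sort is only there to make the row order deterministic.

```python
import pandas as pd

df1 = pd.DataFrame({'ids': ['fhj56', 'ty67s', 'yu34o'],
                    'names': ['abc', 'pqr', 'xyz']})
df2 = pd.DataFrame({'user': [1, 2, 3],
                    'values': [['fhj56', 'fg7uy8'],
                               ['glao0', 'rt56yu', 're23u'],
                               ['fhj56', 'ty67s', 'hgjl09']]})

tidy = (df2[['user', 'values']]
        .explode('values')  # one row per (user, id) pair
        # inner join: ids without a match in df1 are dropped
        .merge(df1, left_on='values', right_on='ids')[['user', 'ids', 'names']]
        .rename(columns={'ids': 'values_responsible'})
        .sort_values(['user', 'values_responsible'])
        .reset_index(drop=True))
print(tidy)
```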

Try this, using the idea of unnesting a list cell (one row per list element):

Temp_unnest = pd.DataFrame([[i, x]
              for i, y in df2['values'].apply(list).items()
                  for x in y], columns=list('IV'))

Temp_unnest['user'] = Temp_unnest.I.map(df2.user)
names_by_id = df1.set_index('ids')['names']  # id -> name lookup; leaves df1 unchanged
Temp_unnest.assign(names=Temp_unnest.V.map(names_by_id)).dropna().groupby('user')[['V', 'names']].agg([lambda x: list(x)])


Out[942]: 
                   V       names
            <lambda>    <lambda>
user                            
1            [fhj56]       [abc]
3     [fhj56, ty67s]  [abc, pqr]
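Since pandas 0.25, DataFrame.explode does this list-cell unnesting directly, so a sketch of the same grouped result without building Temp_unnest by hand might look like this. It assumes values holds Python lists; name_by_id and long are illustrative names. Passing the builtin list to agg, instead of a list containing a lambda, also avoids the extra <lambda> column level.

```python
import pandas as pd

df1 = pd.DataFrame({'ids': ['fhj56', 'ty67s', 'yu34o'],
                    'names': ['abc', 'pqr', 'xyz']})
df2 = pd.DataFrame({'user': [1, 2, 3],
                    'values': [['fhj56', 'fg7uy8'],
                               ['glao0', 'rt56yu', 're23u'],
                               ['fhj56', 'ty67s', 'hgjl09']]})

name_by_id = df1.set_index('ids')['names']  # lookup Series: id -> name

# One row per list element, with a fresh 0..n-1 index
long = df2[['user', 'values']].explode('values').reset_index(drop=True)
long['names'] = long['values'].map(name_by_id)  # NaN where the id is unknown

result = (long.dropna(subset=['names'])  # keep only ids found in df1
              .groupby('user')[['values', 'names']]
              .agg(list)  # collect the survivors back into lists per user
              .rename(columns={'values': 'values_responsible'})
              .reset_index())
print(result)
```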

I would restructure your second data frame into one row per id (essentially normalizing it, as you would a database table). Something like:

user     gid     id                       
1        1       'fhj56'
1        1       'fg7uy8'
2        1       'glao0'
2        1       'rt56yu'
2        1       're23u'
3        1       'fhj56'
3        1       'ty67s'
3        1       'hgjl09'
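If df2 currently holds one list per row, a normalized table like the one above could be built with explode. A sketch, assuming each row of df2 is one id group and numbering those groups per user with groupby().cumcount() to produce the gid column (this numbering scheme is an assumption, not something the data dictates):

```python
import pandas as pd

df2 = pd.DataFrame({'user': [1, 2, 3],
                    'values': [['fhj56', 'fg7uy8'],
                               ['glao0', 'rt56yu', 're23u'],
                               ['fhj56', 'ty67s', 'hgjl09']]})

# Number each list within a user: 1, 2, ... (all 1 here, one list per user)
norm = df2.assign(gid=df2.groupby('user').cumcount() + 1)
norm = (norm.explode('values')  # one row per id
            .rename(columns={'values': 'id'})[['user', 'gid', 'id']]
            .reset_index(drop=True))
print(norm)
```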

Then all you have to do is merge the two data frames on the id column:

r = df2.merge(df1, left_on='id', right_on='ids', how='left')

You can then exclude any gid for which some of the ids have no matching name:

r[~r['gid'].isin(r.loc[r['names'].isna(), 'gid'].unique())]

where r.loc[r['names'].isna(), 'gid'].unique() finds all the gids that contain an id with no name (a left merge fills missing names with NaN, which a comparison like == None does not detect), and r[~r['gid'].isin( ... )] keeps only the rows whose gid is not in that list.
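For the question's own criterion (keep a group if at least one of its ids has a name), a sketch on the normalized table is to drop the unmatched rows after the left merge and collect the remaining ids and names per (user, gid). The norm frame below is the normalized table from above typed in by hand; matched and result are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'ids': ['fhj56', 'ty67s', 'yu34o'],
                    'names': ['abc', 'pqr', 'xyz']})
norm = pd.DataFrame({
    'user': [1, 1, 2, 2, 2, 3, 3, 3],
    'gid':  [1, 1, 1, 1, 1, 1, 1, 1],
    'id':   ['fhj56', 'fg7uy8', 'glao0', 'rt56yu', 're23u',
             'fhj56', 'ty67s', 'hgjl09'],
})

r = norm.merge(df1, left_on='id', right_on='ids', how='left')
matched = r[r['names'].notna()]  # drop ids that have no name in df1

# Groups with no matched id at all simply disappear here
result = (matched.groupby(['user', 'gid'])[['id', 'names']]
                 .agg(list)
                 .reset_index())
print(result)
```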


If a user had more than one group of ids, the second table might look like:

user     gid     id                       
1        1       'fhj56'
1        1       'fg7uy8'
1        2       '1asdf3'
1        2       '7ada2a'
1        2       'asd341'
2        1       'glao0'
2        1       'rt56yu'
2        1       're23u'
3        1       'fhj56'
3        1       'ty67s'
3        1       'hgjl09'

which would be equivalent to

user     values                       
1        ['fhj56','fg7uy8']
1        ['1asdf3', '7ada2a', 'asd341']
2        ['glao0','rt56yu','re23u']
3        ['fhj56','ty67s','hgjl09']
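Going from the long table back to the nested form is a groupby over (user, gid) with agg(list). A sketch, with the long table typed in from above; long and nested are illustrative names.

```python
import pandas as pd

long = pd.DataFrame({
    'user': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'gid':  [1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1],
    'id':   ['fhj56', 'fg7uy8', '1asdf3', '7ada2a', 'asd341',
             'glao0', 'rt56yu', 're23u', 'fhj56', 'ty67s', 'hgjl09'],
})

# One list of ids per (user, gid) pair, in original row order
nested = (long.groupby(['user', 'gid'])['id']
              .agg(list)
              .rename('values')
              .reset_index()[['user', 'values']])
print(nested)
```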
