Extract specific rows from a data frame

Question

I have a data frame df1 with two columns, 'ids' and 'names':

ids     names
fhj56   abc
ty67s   pqr
yu34o   xyz

I have another data frame df2, which includes these columns among others:

user     values                       
1        ['fhj56','fg7uy8']
2        ['glao0','rt56yu','re23u']
3        ['fhj56','ty67s','hgjl09']

My result should give me the users from df2 whose values contain at least one of the ids from df1, and also tell me which ids are responsible for putting them into the resulting table. The result should look like:

   user     values_responsible     names
   1        ['fhj56']              ['abc']
   3        ['fhj56','ty67s']      ['abc','pqr']

User 2 doesn't appear in the resulting table because none of its values exist in df1.

I tried the following:

df2.query('values in @df1.ids')

But this doesn't seem to work well.
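For anyone who wants to reproduce this, the two frames can be built roughly as follows. This is only a sketch: it assumes the 'values' column holds real Python lists rather than their string representation, which the tables above don't make explicit.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ids": ["fhj56", "ty67s", "yu34o"],
    "names": ["abc", "pqr", "xyz"],
})

# Assumption: each cell of 'values' is a Python list of id strings.
df2 = pd.DataFrame({
    "user": [1, 2, 3],
    "values": [
        ["fhj56", "fg7uy8"],
        ["glao0", "rt56yu", "re23u"],
        ["fhj56", "ty67s", "hgjl09"],
    ],
})

# Note: df2.values is the underlying NumPy array, so the column
# has to be accessed as df2["values"], not as an attribute.
print(df2["values"].map(len).tolist())
```

If the cells are actually strings such as "['fhj56','fg7uy8']", they could be parsed first with ast.literal_eval from the standard library.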

Answer 1

You can iterate through the rows of df2 and use .loc together with isin to find the matching rows of df1. I collect the matches in lists and build the result data frame from a dictionary of those lists.

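import pandas as pd

ids = []
names = []
users = []
for _, row in df2.iterrows():
    # rows of df1 whose id appears in this user's list
    result = df1.loc[df1['ids'].isin(row['values'])]
    if not result.empty:
        # one list of ids and one list of names per matching user
        ids.append(result['ids'].tolist())
        names.append(result['names'].tolist())
        users.append(row['user'])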

>>> pd.DataFrame({'user': users, 'values_responsible': ids, 'names': names})[['user', 'values_responsible', 'names']]
   user values_responsible       names
0     1            [fhj56]       [abc]
1     3     [fhj56, ty67s]  [abc, pqr]

Or, for tidy data:

ids = []
names = []
users = []
for _, row in df2.iterrows():
    # rows of df1 whose id appears in this user's list
    result = df1.loc[df1['ids'].isin(row['values'])]
    if not result.empty:
        # one output row per matching id, with the user repeated
        ids.extend(result['ids'].tolist())
        names.extend(result['names'].tolist())
        users.extend([row['user']] * len(result['ids']))

>>> pd.DataFrame({'user': users, 'values_responsible': ids, 'names': names})[['user', 'values_responsible', 'names']]
   user values_responsible names
0     1              fhj56   abc
1     3              fhj56   abc
2     3              ty67s   pqr

Answer 2

Try this, using the idea of unnesting a list cell.

# One row per (df2 index, id) pair
Temp_unnest = pd.DataFrame([[i, x]
              for i, y in df2['values'].apply(list).items()
                  for x in y], columns=list('IV'))

# Map the df2 index back to its user, and each id to its name via df1
Temp_unnest['user'] = Temp_unnest.I.map(df2.user)
df1.index = df1.ids
Temp_unnest.assign(names=Temp_unnest.V.map(df1.names)).dropna().groupby('user')[['V', 'names']].agg(list)

Out[942]: 
                   V       names
user                            
1            [fhj56]       [abc]
3     [fhj56, ty67s]  [abc, pqr]
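On pandas 0.25 or later, the manual unnesting can probably be replaced by DataFrame.explode. A minimal sketch of that variant, assuming the 'values' column holds Python lists and the ids in df1 are unique; the names flat, matched and result are mine, not from the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ids": ["fhj56", "ty67s", "yu34o"],
    "names": ["abc", "pqr", "xyz"],
})
df2 = pd.DataFrame({
    "user": [1, 2, 3],
    "values": [
        ["fhj56", "fg7uy8"],
        ["glao0", "rt56yu", "re23u"],
        ["fhj56", "ty67s", "hgjl09"],
    ],
})

# explode() is the built-in unnest: one row per (user, id) pair.
flat = (
    df2.explode("values")
    .rename(columns={"values": "id"})
    .reset_index(drop=True)
)

# Look up each id's name; ids absent from df1 become NaN and are dropped.
flat["name"] = flat["id"].map(df1.set_index("ids")["names"])
matched = flat.dropna(subset=["name"])

# Collect the surviving ids and names back into lists per user.
result = (
    matched.groupby("user")
    .agg(values_responsible=("id", list), names=("name", list))
    .reset_index()
)
print(result)
```

Unlike assigning df1.index directly, set_index here works on a copy, so df1 itself is left unchanged.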

Answer 3

I would refactor your second data frame (essentially, normalizing your database). Something like:

user     gid     id                       
1        1       'fhj56'
1        1       'fg7uy8'
2        1       'glao0'
2        1       'rt56yu'
2        1       're23u'
3        1       'fhj56'
3        1       'ty67s'
3        1       'hgjl09'

Then all you have to do is merge the first and second data frames on the id column.

r = df2.merge(df1, left_on='id', right_on='ids', how='left')

You can exclude any gids for which some of the ids don't have a matching name.

r[~r['gid'].isin(r[r['names'].isna()]['gid'].unique())]

where r[r['names'].isna()]['gid'].unique() finds all the gids that contain an id with no name, and r[~r['gid'].isin( ... )] keeps only the entries whose gid is not in the list passed to isin.
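If the goal is the one stated in the question (keep every user with at least one matching id), an inner merge on the normalized table may already be enough. This is a different filter from the gid exclusion above, and only a sketch: df2_long and matches are hypothetical names, and the data is copied from the first normalized table.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ids": ["fhj56", "ty67s", "yu34o"],
    "names": ["abc", "pqr", "xyz"],
})

# Hypothetical normalized form of df2: one id per row.
df2_long = pd.DataFrame({
    "user": [1, 1, 2, 2, 2, 3, 3, 3],
    "gid": [1, 1, 1, 1, 1, 1, 1, 1],
    "id": ["fhj56", "fg7uy8", "glao0", "rt56yu", "re23u",
           "fhj56", "ty67s", "hgjl09"],
})

# how='inner' keeps only the rows whose id also appears in df1['ids'].
matches = df2_long.merge(df1, left_on="id", right_on="ids", how="inner")
matches = (
    matches[["user", "id", "names"]]
    .sort_values(["user", "id"])
    .reset_index(drop=True)
)
print(matches)
```

User 2 drops out because none of its ids exist in df1, which matches the tidy result shown in Answer 1.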

If you had more id groups, the second table might look like:

user     gid     id                       
1        1       'fhj56'
1        1       'fg7uy8'
1        2       '1asdf3'
1        2       '7ada2a'
1        2       'asd341'
2        1       'glao0'
2        1       'rt56yu'
2        1       're23u'
3        1       'fhj56'
3        1       'ty67s'
3        1       'hgjl09'

which would be equivalent to:

user     values                       
1        ['fhj56','fg7uy8']
1        ['1asdf3', '7ada2a', 'asd341']
2        ['glao0','rt56yu','re23u']
3        ['fhj56','ty67s','hgjl09']
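Going from the list form to the normalized form could look roughly like this. It is a sketch that assumes pandas 0.25 or later (for explode) and that gid simply numbers each user's lists in order of appearance; df2_long is a hypothetical name.

```python
import pandas as pd

df2 = pd.DataFrame({
    "user": [1, 1, 2, 3],
    "values": [
        ["fhj56", "fg7uy8"],
        ["1asdf3", "7ada2a", "asd341"],
        ["glao0", "rt56yu", "re23u"],
        ["fhj56", "ty67s", "hgjl09"],
    ],
})

# Number each list within its user: first list -> gid 1, second -> gid 2.
df2["gid"] = df2.groupby("user").cumcount() + 1

# explode() turns every list element into its own row.
df2_long = (
    df2.explode("values")
    .rename(columns={"values": "id"})
    .reset_index(drop=True)[["user", "gid", "id"]]
)
print(df2_long)
```

The resulting df2_long has the user, gid and id columns of the normalized table and can be merged with df1 on the id column.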

3 answers

Answer 1
2 accepted 2017-08-07 18:24:09

Answer 2
2 2017-08-07 19:25:30

Answer 3
1 2017-08-07 18:09:19
