Filtering a pandas DataFrame using a list

Question

I am trying to filter using a list of user_ids and a mask. Here is the input with two user_ids:

import numpy as np
import pandas as pd

data = np.array([['user_id','comment','label'],
            [100,'First comment',0],
            [101,'Buy viagra',1],
            [100,'Buy viagra two',1],
            [101,'Third comment',0],
            [100,'Third comment two',0],
            [101,'Buy drugs',1],
            [100,'Buy drugs two',1],
            [101,'Buy icecream',1],
            [100,'Buy icecream two',1],
            [101,'Buy something',1],
            [100,'Buy something two',1]])

df = pd.DataFrame(data[1:], columns=data[0])

The desired output is:

   user_id            comment label
0      100      First comment     0
1      101         Buy viagra     1
2      100     Buy viagra two     1
3      101      Third comment     0
4      100  Third comment two     0
5      101          Buy drugs     1
6      100      Buy drugs two     1
7      101       Buy icecream     1
8      100   Buy icecream two     1

When I pass a list of user_ids, I get incorrect output:

m = df.user_id.isin([100,101]) & df.label.eq('1')
i = df[m].head(3)
j = df[~m]
df = pd.concat([i, j]).sort_index()
print (df)

However, if I pass just one user_id, as below, I get the correct output. Can you suggest what's wrong? Thanks.

m = df.user_id.eq('101') & df.label.eq('1')

Answer 1

The problem is that the values in the user_id column are strings (np.array converts every element of a list that mixes integers and strings to a string), so you need ['100','101'] instead of [100, 101]:

df = pd.DataFrame(data[1:], columns=data[0])

m = df.user_id.isin(['100','101']) & df.label.eq('1')
i = df[m]
print (i)
   user_id            comment label
1      101         Buy viagra     1
2      100     Buy viagra two     1
5      101          Buy drugs     1
6      100      Buy drugs two     1
7      101       Buy icecream     1
8      100   Buy icecream two     1
9      101      Buy something     1
10     100  Buy something two     1
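With the string list, df[m] now selects all eight label 1 rows, but the question's df[m].head(3) would keep only the first three of them overall. The desired output keeps three label 1 rows per user, so, assuming that per-user limit is the intent, grouping by user_id before taking the head is one way to get it. This is a sketch under that assumption, not part of the accepted answer:

```python
import numpy as np
import pandas as pd

data = np.array([['user_id', 'comment', 'label'],
                 [100, 'First comment', 0],
                 [101, 'Buy viagra', 1],
                 [100, 'Buy viagra two', 1],
                 [101, 'Third comment', 0],
                 [100, 'Third comment two', 0],
                 [101, 'Buy drugs', 1],
                 [100, 'Buy drugs two', 1],
                 [101, 'Buy icecream', 1],
                 [100, 'Buy icecream two', 1],
                 [101, 'Buy something', 1],
                 [100, 'Buy something two', 1]])
df = pd.DataFrame(data[1:], columns=data[0])

# The cells are strings, so compare against strings.
m = df.user_id.isin(['100', '101']) & df.label.eq('1')

# Keep at most three matching rows per user (original order is preserved),
# plus every non-matching row, then restore the original row order.
i = df[m].groupby('user_id').head(3)
j = df[~m]
out = pd.concat([i, j]).sort_index()
print(out)
```

groupby(...).head(n) returns rows with their original index labels, which is why sort_index() can interleave them back with the non-matching rows.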

You can check the type of each value in one column with:

print (df.user_id.apply(type))

0     <class 'str'>
1     <class 'str'>
2     <class 'str'>
3     <class 'str'>
4     <class 'str'>
5     <class 'str'>
6     <class 'str'>
7     <class 'str'>
8     <class 'str'>
9     <class 'str'>
10    <class 'str'>
Name: user_id, dtype: object
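The strings come from the np.array call itself. A small sketch using the first four rows of the question's data (shortened only for illustration): when a nested list mixes integers and strings, NumPy picks a common Unicode string dtype, so 100 is stored as '100'. That is also why the single-user version in the question works: it compares against the string '101', not the integer 101.

```python
import numpy as np
import pandas as pd

# First four rows of the question's data, for illustration only.
data = np.array([['user_id', 'comment', 'label'],
                 [100, 'First comment', 0],
                 [101, 'Buy viagra', 1],
                 [100, 'Buy viagra two', 1],
                 [101, 'Third comment', 0]])

# Mixed ints and strings: NumPy falls back to a Unicode string dtype ('U'),
# so 100 becomes '100' and 0 becomes '0'.
print(data.dtype.kind)  # U

df = pd.DataFrame(data[1:], columns=data[0])

# Integers never equal strings, so the integer list matches nothing ...
print(df.user_id.isin([100, 101]).any())      # False
# ... while the string list and the single string '101' do match.
print(df.user_id.isin(['100', '101']).all())  # True
print(df.user_id.eq('101').any())             # True
```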

And to check all columns (in pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map):

print (df.applymap(type))

          user_id        comment          label
0   <class 'str'>  <class 'str'>  <class 'str'>
1   <class 'str'>  <class 'str'>  <class 'str'>
2   <class 'str'>  <class 'str'>  <class 'str'>
3   <class 'str'>  <class 'str'>  <class 'str'>
4   <class 'str'>  <class 'str'>  <class 'str'>
5   <class 'str'>  <class 'str'>  <class 'str'>
6   <class 'str'>  <class 'str'>  <class 'str'>
7   <class 'str'>  <class 'str'>  <class 'str'>
8   <class 'str'>  <class 'str'>  <class 'str'>
9   <class 'str'>  <class 'str'>  <class 'str'>
10  <class 'str'>  <class 'str'>  <class 'str'>
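Since every column is a string, another option (not part of the original answer) is to convert the numeric columns right after building the DataFrame, so the integer list and integer label from the question work as written. A minimal sketch, assuming every user_id and label value is a valid integer string; astype(int) raises ValueError otherwise, and pd.to_numeric(..., errors='coerce') is a more forgiving alternative in that case.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data, for illustration only.
data = np.array([['user_id', 'comment', 'label'],
                 [100, 'First comment', 0],
                 [101, 'Buy viagra', 1],
                 [100, 'Buy viagra two', 1]])

df = pd.DataFrame(data[1:], columns=data[0])
# Turn the string columns that really hold numbers back into integers.
df = df.astype({'user_id': int, 'label': int})
print(df.dtypes)

# Integer comparisons now match as expected.
m = df.user_id.isin([100, 101]) & df.label.eq(1)
print(df[m])
```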

Answer 1: score 4, accepted, 2018-01-12 07:01:49