简体   繁体   English

如何在特定条件下过滤pandas数据框中的列值?

[英]How to filter column values in the pandas dataframe with certain conditions?

I created a Pandas dataframe and would like to filter for some values. 我创建了一个Pandas数据框,并希望过滤一些值。 The data frame contains 4 columns namely currency port supplier_id value and I would like to have values that would entertain that conditions provided below, 数据框包含4列,即currency port supplier_id value ,我希望具有可以满足以下条件的值,

* port – expressed as a portcode, a 5-letter string uniquely identifying a port. Portcodes consist of 2-letter country code and 3-letter city code.
* supplier_id - integer, uniquely identifying the provider of the information
* currency - 3-letter string identifying the currency
* value - a floating-point number

df =  df[ (len(df['port']) == 5 & isinstance(df['port'], basestring)) & \
  isinstance(df['supplier_id'], int) & \
  (len(df['currency']) == 3 & isinstance(df['currency'], basestring))\
  isinstance(df['value'], float) ]

The snippet of the code should be obvious and trying to implement the conditions mentioned earlier but it doesn't work. 该代码段应该很明显,并且试图实现前面提到的条件,但是它不起作用。 The print from the df is provided below, 下面提供了df的打印件,

     currency   port  supplier_id   value
0         CNY  CNAQG         35.0   820.0
1         CNY  CNAQG         19.0   835.0
2         CNY  CNAQG         49.0   600.0
3         CNY  CNAQG         54.0   775.0
4         CNY  CNAQG        113.0   785.0
5         CNY  CNAQG          5.0   790.0
6         CNY  CNAQG         55.0   770.0
7         CNY  CNAQG         81.0   810.0
8         CNY  CNAQG          2.0   770.0
9         CNY  CNAQG         10.0   825.0


print df[df.supplier_id.isnull()] # prints below 
Empty DataFrame
Columns: [currency, port, supplier_id, value]
Index: []



df.info() # prints below     
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6661 entries, 0 to 6660
Data columns (total 4 columns):
currency       6661 non-null object
port           6661 non-null object
supplier_id    6661 non-null float64
value          6661 non-null float64
dtypes: float64(2), object(2)
memory usage: 208.2+ KB
None

How to write it properly? 如何正确编写?

You can use if have mixed values in one column - numeric with strings: 如果一列中包含混合值,则可以使用-带字符串的数字:

df = pd.DataFrame({'port':['aa789',2,3],
                   'supplier_id':[4,'s',6],
                   'currency':['USD',8,9],
                   'value':[1.7,3,5]})

print (df)
  currency   port supplier_id  value
0      USD  aa789           4    1.7
1        8      2           s    3.0
2        9      3           6    5.0

#for python 2 change str to basestring
m1 = (df.port.astype(str).str.len() == 5) & (df.port.apply(lambda x :isinstance(x, str)))
m2 = df.supplier_id.apply(lambda x : isinstance(x, int))
m3=(df.currency.astype(str).str.len() == 3)&(df.currency.apply(lambda x :isinstance(x, str)))
m4 = df.value.apply(lambda x : isinstance(x, float))
mask = m1 & m2 & m3 & m4
print (mask)
0     True
1    False
2    False
dtype: bool

print (df[mask])
  currency   port supplier_id  value
0      USD  aa789           4    1.7

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM