[英]How to filter column values in the pandas dataframe with certain conditions?
I created a Pandas dataframe and would like to filter for some values. 我创建了一个Pandas数据框,并希望过滤一些值。 The data frame contains 4 columns namely
currency port supplier_id value
and I would like to have values that would entertain that conditions provided below, 数据框包含4列,即
currency port supplier_id value
,我希望具有可以满足以下条件的值,
* port – expressed as a portcode, a 5-letter string uniquely identifying a port. Portcodes consist of 2-letter country code and 3-letter city code.
* supplier_id - integer, uniquely identifying the provider of the information
* currency - 3-letter string identifying the currency
* value - a floating-point number
df = df[ (len(df['port']) == 5 & isinstance(df['port'], basestring)) & \
isinstance(df['supplier_id'], int) & \
(len(df['currency']) == 3 & isinstance(df['currency'], basestring))\
isinstance(df['value'], float) ]
The snippet of the code should be obvious and trying to implement the conditions mentioned earlier but it doesn't work. 该代码段应该很明显,并且试图实现前面提到的条件,但是它不起作用。 The print from the
df
is provided below, 下面提供了
df
的打印件,
currency port supplier_id value
0 CNY CNAQG 35.0 820.0
1 CNY CNAQG 19.0 835.0
2 CNY CNAQG 49.0 600.0
3 CNY CNAQG 54.0 775.0
4 CNY CNAQG 113.0 785.0
5 CNY CNAQG 5.0 790.0
6 CNY CNAQG 55.0 770.0
7 CNY CNAQG 81.0 810.0
8 CNY CNAQG 2.0 770.0
9 CNY CNAQG 10.0 825.0
print df[df.supplier_id.isnull()] # prints below
Empty DataFrame
Columns: [currency, port, supplier_id, value]
Index: []
df.info() # prints below
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6661 entries, 0 to 6660
Data columns (total 4 columns):
currency 6661 non-null object
port 6661 non-null object
supplier_id 6661 non-null float64
value 6661 non-null float64
dtypes: float64(2), object(2)
memory usage: 208.2+ KB
None
How to write it properly? 如何正确编写?
You can use if have mixed values in one column - numeric with strings: 如果一列中包含混合值,则可以使用-带字符串的数字:
df = pd.DataFrame({'port':['aa789',2,3],
'supplier_id':[4,'s',6],
'currency':['USD',8,9],
'value':[1.7,3,5]})
print (df)
currency port supplier_id value
0 USD aa789 4 1.7
1 8 2 s 3.0
2 9 3 6 5.0
#for python 2 change str to basestring
m1 = (df.port.astype(str).str.len() == 5) & (df.port.apply(lambda x :isinstance(x, str)))
m2 = df.supplier_id.apply(lambda x : isinstance(x, int))
m3=(df.currency.astype(str).str.len() == 3)&(df.currency.apply(lambda x :isinstance(x, str)))
m4 = df.value.apply(lambda x : isinstance(x, float))
mask = m1 & m2 & m3 & m4
print (mask)
0 True
1 False
2 False
dtype: bool
print (df[mask])
currency port supplier_id value
0 USD aa789 4 1.7
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.