
Writing checks that all pandas DataFrame column values meet certain conditions?

I'm writing checks for a software package. The elements within a pandas DataFrame must meet certain conditions; if they don't, I will raise a ValueError exception in Python.

Here is an example pandas DataFrame:

import pandas as pd
import numpy as np

dict1 = {'file': ['filename2', 'filename2', 'filename3', 'filename4', 
         'filename4', 'filename3'], 'amount': [3, 4, 5, 1, 2, 1], 
         'front': [21889611, 36357723, 196312, 11, 42, 1992], 
         'back':[21973805, 36403870, 277500, 19, 120, 3210], 
         'type':['A', 'A', 'A', 'B', 'B', 'C']}

df1 = pd.DataFrame(dict1)
print(df1)

        file  amount     front      back type
0  filename2       3  21889611  21973805    A
1  filename2       4  36357723  36403870    A
2  filename3       5    196312    277500    A
3  filename4       1        11        19    B
4  filename4       2        42       120    B
5  filename3       1      1992      3210    C

The most efficient way I've seen to check for allowed values is to use sets. For example, if column type contains elements that are not A, B, or C, raise an error:

# The column's values must be a subset of the allowed values, not the reverse.
if not set(df1['type']).issubset({'A', 'B', 'C'}):
    raise ValueError('Pandas DataFrame contains improper values in "type" column')

Question:

How would I most efficiently check conditions on the values? For example, I would like to check that column amount contains only integers greater than 0. If there are any zeros, negative integers, or non-integers in this column, raise a ValueError().

You can filter on just that one column, take the length of the returned DataFrame, and use it in your if statement:

len(df1[df1['amount'] > 0])
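
To turn that length into an actual check, one option is to compare it against the total row count and raise when they differ. The sketch below does that and also rejects non-integer columns by looking at the dtype. The helper name check_positive_integers is made up for illustration, and it assumes the column is stored with an integer dtype (a float or object column is rejected outright, even if its values happen to be whole numbers):

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def check_positive_integers(df, column):
    """Raise ValueError unless every value in df[column] is an integer > 0.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = df[column]
    # Reject float, object, and other non-integer dtypes up front.
    if not is_integer_dtype(values):
        raise ValueError(
            f'Column "{column}" must have an integer dtype, got {values.dtype}'
        )
    # The filter keeps only rows with a positive value, so a shorter
    # result means at least one zero or negative value was present.
    if len(df[values > 0]) != len(df):
        raise ValueError(f'Column "{column}" contains zero or negative values')


df1 = pd.DataFrame({'amount': [3, 4, 5, 1, 2, 1]})
check_positive_integers(df1, 'amount')  # passes silently for this data
```

If you don't need the filtered rows, (df1['amount'] > 0).all() should give the same answer without building an intermediate DataFrame.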
