
Creating a list from a data frame of values greater than a specific value

How can I create a list of the values in a data frame column that are greater than a specific value?

       a.     b.     c.
1.    100     57     23   
2.     99     56     23
3.    100     56     22
4.    101     57     23
...
300.   99     50     23 
301.   99     51     29
302.  101     57     22

Create a list of all values where a > 100.

I can build the comparison, but the result is not the list I want, since all of its values are boolean:

Greater_100 = df['a']>100

How do I turn this into a list?

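import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 200, (10, 3)), columns=list('abc'))
list_a_more_than_hundred = df.loc[df.a > 100, 'a'].tolist()  # values of a above 100, as a plain list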

df[df['a'] > 100].loc[:, 'a'] is enough to get the matching values as a Series; append .tolist() to get a plain Python list.

Selecting the values of column a that are greater than 100:

>>> df[df['a'] > 100].loc[:, 'a']
4      101
302    101
Name: a, dtype: int64
>>>
>>> type(df[df['a'] > 100].loc[:, 'a'])
<class 'pandas.core.series.Series'>

Converting the above Series into a list:

>>> l = df[df['a'] > 100].loc[:, 'a'].tolist()
>>> l
[101, 101]
>>>
>>> type(l)
<class 'list'>
>>>
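
The same result can also be reached in a single indexing step by passing the boolean mask and the column label to .loc together, so no intermediate filtered frame is built. A minimal sketch, assuming sample data that mirrors the question (the variable name values is illustrative only):

```python
import pandas as pd

# Sample data mirroring the rows shown in the question.
df = pd.DataFrame(
    {
        "a": [100, 99, 100, 101, 99, 99, 101],
        "b": [57, 56, 56, 57, 50, 51, 57],
        "c": [23, 23, 22, 23, 23, 29, 22],
    },
    index=["1", "2", "3", "4", "300", "301", "302"],
)

# Row mask and column label in one .loc call, then convert to a list.
values = df.loc[df["a"] > 100, "a"].tolist()
print(values)  # [101, 101]
```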

Let's look at the above code in more detail.

>>> import numpy as np
>>> import pandas as pd
>>>
>>> arr = [[100, 57, 23], [99, 56, 23],
... [100, 56, 20], [101, 57, 23], [99, 50, 23],
... [99, 51, 29], [101, 57, 22]]
>>>
>>> columns = [ch for ch in 'abc']
>>> indices = [str(n) for n in [1, 2, 3, 4, 300, 301, 302]]
>>>
>>> df = pd.DataFrame(arr, index=indices, columns=columns)
>>> df
       a   b   c
1    100  57  23
2     99  56  23
3    100  56  20
4    101  57  23
300   99  50  23
301   99  51  29
302  101  57  22
>>>
>>> df['a'] > 100
1      False
2      False
3      False
4       True
300    False
301    False
302     True
Name: a, dtype: bool
>>>
>>> arr2 = df.loc[:,'a']
>>> arr2
1      100
2       99
3      100
4      101
300     99
301     99
302    101
Name: a, dtype: int64
>>>
>>> arr2 = df[df['a'] > 100]
>>> arr2
       a   b   c
4    101  57  23
302  101  57  22
>>>
>>> arr3 = df[df['a'] > 100].loc[:, 'a']
>>> arr3
4      101
302    101
Name: a, dtype: int64
>>>
>>> l = arr3.tolist()
>>> l
[101, 101]
>>>
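
The steps walked through above could be wrapped in a small helper for reuse. This is only a sketch: the function name values_above and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd


def values_above(frame, column, threshold):
    """Return the values of `column` strictly greater than `threshold` as a list.

    Hypothetical helper for illustration; not a pandas API.
    """
    mask = frame[column] > threshold          # boolean Series, one entry per row
    return frame.loc[mask, column].tolist()   # keep matching rows of that column


# Sample data mirroring the rows shown in the question.
df = pd.DataFrame(
    {
        "a": [100, 99, 100, 101, 99, 99, 101],
        "b": [57, 56, 56, 57, 50, 51, 57],
        "c": [23, 23, 22, 23, 23, 29, 22],
    },
    index=["1", "2", "3", "4", "300", "301", "302"],
)

result = values_above(df, "a", 100)
print(result)  # [101, 101]
```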

To filter your dataframe for rows where a > 100, you can use pd.DataFrame.query:

res_df = df.query('a > 100')

This also works for multiple conditions:

res_df = df.query('a > 100 & b < 57')
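
When the threshold lives in a Python variable rather than in the query string, query can refer to it with the @ prefix. A minimal sketch, assuming sample data that mirrors the question; the variable name threshold is illustrative only:

```python
import pandas as pd

# Sample data mirroring the rows shown in the question.
df = pd.DataFrame(
    {
        "a": [100, 99, 100, 101, 99, 99, 101],
        "b": [57, 56, 56, 57, 50, 51, 57],
        "c": [23, 23, 22, 23, 23, 29, 22],
    },
    index=["1", "2", "3", "4", "300", "301", "302"],
)

threshold = 100  # illustrative local variable, referenced as @threshold below

# Single condition using the local variable.
above = df.query("a > @threshold")

# Two conditions combined; with this sample data no row satisfies both,
# because every row with a > 100 has b == 57.
both = df.query("a > @threshold and b < 57")

print(above.index.tolist())  # ['4', '302']
print(both.empty)            # True
```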

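If you wish to extract a list of values from these rows, you can go through the underlying NumPy array, e.g. the following, which flattens every column of the matching rows into one list: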

res_lst = df.query('a > 100 & b < 57').values.ravel().tolist()
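
Note that flattening the whole array returns the values of all columns, row by row. If only column a is wanted, as in the question, selecting that column before converting is probably closer to the goal. A short sketch comparing the two, assuming sample data that mirrors the question (the names flat and only_a are illustrative):

```python
import pandas as pd

# Sample data mirroring the rows shown in the question.
df = pd.DataFrame(
    {
        "a": [100, 99, 100, 101, 99, 99, 101],
        "b": [57, 56, 56, 57, 50, 51, 57],
        "c": [23, 23, 22, 23, 23, 29, 22],
    },
    index=["1", "2", "3", "4", "300", "301", "302"],
)

rows = df.query("a > 100")

# Flattens every column of the matching rows into one list.
flat = rows.values.ravel().tolist()

# Keeps only the values of column a.
only_a = rows["a"].tolist()

print(flat)    # [101, 57, 23, 101, 57, 22]
print(only_a)  # [101, 101]
```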
