
How to compare values between two pandas DataFrames

I have a df1 like the one below, and I want to check whether each value in a certain column of df2 falls between the MAX and MIN values of a row in df1. If it does, I want to assign the value from the Name column of that row. If the df2 value does not fall within any of those ranges, I want to check whether it is larger than the overall df1 maximum or smaller than the overall df1 minimum.

import pandas as pd

data = {
    'Name': ['MN1', 'MN2', 'MN3', 'MN4', 'MN5', 'MN6', 'MN7-8', 'MN9', 'MN10', 'MN11', 'MN12', 'MN13', 'MN14', 'MN15', 'MN16', 'MN17', 'MQ18', 'MQ19'],
    'MAX': [23, 21.7, 19.5, 17.2, 16.4, 14.2, 12.85, 11.2, 9.9, 8.9, 7.6, 7.1, 5.3, 5, 3.55, 2.5, 1.9, 0.85],
    'MIN': [21.7, 19.5, 17.2, 16.4, 14.2, 12.85, 11.2, 9.9, 8.9, 7.6, 7.1, 5.3, 5, 3.55, 2.5, 1.9, 0.85, 0.01],
}
df1 = pd.DataFrame(data, columns=['Name', 'MAX', 'MIN'])

I tried this:


list = []

for i in df2['AVERAGE_AGE']:
    for index, row in df1.iterrows():
        if row['MAX'] >= i and row['MIN'] < i:
            list.append(row['Name'])
    
    if i > df1['MAX'].max():
        list.append("Postmn")
    elif i < df1['MIN'].min():
        list.append("Premn")
    
df2['MNname'] = list

This takes a long time, and the length of the resulting list doesn't match the length of df2.

You can try this:

(df2['AVERAGE_AGE'] < df1['MIN'].min()).value_counts()
(df2['AVERAGE_AGE'] > df1['MAX'].max()).value_counts()

This will tell you the number of rows that satisfy the conditions by giving the counts of True and False.
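As a rough sketch of how those two masks could be used, the example below counts the out-of-range rows and labels them. The df2 values are made up for illustration, df1 is cut down to the first and last rows of the question's table (enough to get the overall bounds), and the "Premn"/"Postmn" labels and the MNname column are taken from the question's attempt.

```python
import pandas as pd

# Only the first and last rows of the question's df1 are kept here,
# which is enough to get the overall minimum and maximum.
df1 = pd.DataFrame({'MAX': [23, 0.85], 'MIN': [21.7, 0.01]})

# Hypothetical sample values, not the asker's real data.
df2 = pd.DataFrame({'AVERAGE_AGE': [2.299367, 25.0, 0.005, 10.245027]})

# Boolean masks: True where the value lies outside every df1 range.
below = df2['AVERAGE_AGE'] < df1['MIN'].min()
above = df2['AVERAGE_AGE'] > df1['MAX'].max()

# value_counts() shows how many rows are True and how many are False.
print(below.value_counts())
print(above.value_counts())

# Summing a boolean mask counts only the True rows.
n_below = int(below.sum())
n_above = int(above.sum())

# The same masks can label the out-of-range rows directly.
df2.loc[below, 'MNname'] = 'Premn'
df2.loc[above, 'MNname'] = 'Postmn'
```

Rows that fall inside the overall range are left as NaN in MNname, so they still need a name from df1.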

You can loop over the first dataframe and set Names for the second using pandas.DataFrame.loc:

>>> df2 = pd.DataFrame([
...   2.299367, 20.688943, 10.245027, 1.412258, 22.541987,
...   2.588420, 5.578598, 11.703629, 12.529066, 17.769196,
...   ], columns=['AVERAGE_AGE'])
>>> for index, row in df1.iterrows():
...   df2.loc[(df2.AVERAGE_AGE>=row.MIN) & (df2.AVERAGE_AGE<row.MAX),'Name'] = row.Name
... 
>>> df2
   AVERAGE_AGE   Name
0     2.299367   MN17
1    20.688943    MN2
2    10.245027    MN9
3     1.412258   MQ18
4    22.541987    MN1
5     2.588420   MN16
6     5.578598   MN13
7    11.703629  MN7-8
8    12.529066  MN7-8
9    17.769196    MN3
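If the loop over df1 is too slow, one possible alternative is pandas.cut, which bins every value in a single vectorized call. The sketch below is not part of the answer above. It assumes that the df1 ranges are contiguous and non-overlapping (true for the question's data, where each MIN equals the next MAX), that the upper bound is inclusive as in the question's own condition (MIN < value <= MAX), and that a value exactly equal to the smallest MIN counts as "Premn". The df2 values 25.0 and 0.005 are made up to show the out-of-range labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['MN1', 'MN2', 'MN3', 'MN4', 'MN5', 'MN6', 'MN7-8', 'MN9', 'MN10',
             'MN11', 'MN12', 'MN13', 'MN14', 'MN15', 'MN16', 'MN17', 'MQ18', 'MQ19'],
    'MAX': [23, 21.7, 19.5, 17.2, 16.4, 14.2, 12.85, 11.2, 9.9,
            8.9, 7.6, 7.1, 5.3, 5, 3.55, 2.5, 1.9, 0.85],
    'MIN': [21.7, 19.5, 17.2, 16.4, 14.2, 12.85, 11.2, 9.9, 8.9,
            7.6, 7.1, 5.3, 5, 3.55, 2.5, 1.9, 0.85, 0.01],
})
# Hypothetical sample; 25.0 and 0.005 lie outside every df1 range.
df2 = pd.DataFrame({'AVERAGE_AGE': [2.299367, 20.688943, 25.0, 0.005, 11.703629]})

# pd.cut needs increasing bin edges, so sort the ranges from low to high.
ranges = df1.sort_values('MIN')

# Edges: -inf, every MIN, the largest MAX, +inf.
# The two infinite edges create the catch-all "Premn" and "Postmn" bins.
edges = [-np.inf, *ranges['MIN'], ranges['MAX'].iloc[-1], np.inf]
labels = ['Premn', *ranges['Name'], 'Postmn']

# right=True makes each bin (low, high], i.e. MIN < value <= MAX.
df2['MNname'] = pd.cut(df2['AVERAGE_AGE'], bins=edges, labels=labels, right=True)
print(df2)
```

Passing right=False instead would give MIN <= value < MAX, which matches the boundary handling in the loop above. Every non-missing value lands in exactly one bin, so the new column always has the same length as df2.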

Try this:

import numpy as np
import pandas as pd

arr = []
for i in range(df2.shape[0]):
    # Check if the value in COLUMN_1 is between the MIN and MAX values
    if (df2['COLUMN_1'][i] > df1['MIN'][i]) and (df2['COLUMN_1'][i] < df1['MAX'][i]):
        arr.append(df1['Name'][i])
    # Check if the value in COLUMN_1 is less than the MIN value
    elif df2['COLUMN_1'][i] < df1['MIN'][i]:
        arr.append(np.round(df2['COLUMN_1'][i] - df1['MIN'][i], 2))
    # Check if the value in COLUMN_1 is greater than the MAX value
    elif df2['COLUMN_1'][i] > df1['MAX'][i]:
        arr.append(np.round(df2['COLUMN_1'][i] - df1['MAX'][i], 2))

df2['Name'] = pd.Series(arr)

Since you did not mention the name of the df2 column to check, I have called it COLUMN_1. Each row of df2 is compared with the row of df1 at the same position. The conditions and resulting values are:

  1. If the value in COLUMN_1 is between MIN and MAX, take the corresponding value from df1['Name'].
  2. If the value in COLUMN_1 is less than MIN, compute (value in COLUMN_1 - MIN), which gives a negative number.
  3. If the value in COLUMN_1 is greater than MAX, compute (value in COLUMN_1 - MAX), which gives a positive number.
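The three rules can also be sketched as a small helper applied row by row. The function name label_row and the sample values are hypothetical. The sketch assumes that df1 and df2 have the same number of rows, and that values exactly equal to MIN or MAX count as inside the range, so that every row produces a result.

```python
import pandas as pd

def label_row(value, low, high, name):
    """Apply the three rules to one value and its [low, high] range."""
    if low <= value <= high:          # rule 1: inside the range (bounds included)
        return name
    if value < low:                   # rule 2: below MIN, negative offset
        return round(value - low, 2)
    return round(value - high, 2)     # rule 3: above MAX, positive offset

# Hypothetical data: the first three rows of the question's df1.
df1 = pd.DataFrame({'Name': ['MN1', 'MN2', 'MN3'],
                    'MAX': [23.0, 21.7, 19.5],
                    'MIN': [21.7, 19.5, 17.2]})
df2 = pd.DataFrame({'COLUMN_1': [22.0, 18.0, 20.0]})

# Pair row i of df2 with row i of df1; zip stops at the shorter input.
df2['Name'] = [
    label_row(value, low, high, name)
    for value, low, high, name in zip(df2['COLUMN_1'], df1['MIN'], df1['MAX'], df1['Name'])
]
print(df2)
```

Here 22.0 falls inside the MN1 range, 18.0 is 1.5 below the MN2 minimum, and 20.0 is 0.5 above the MN3 maximum.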

Hope this works!
