简体   繁体   中英

Compare values in multiple columns and add a new value in another column in Python

I have a house rent price data as follows:

import pandas as pd
import numpy as np
data = {
    "HouseName": ["A", "A", "B", "B", "B"],
    "Type": ["OneRoom", "TwoRooms", "OneRoom", "TwoRooms", "ThreeRooms"],
    "Jan_S": [1100, 1776, 1228, 1640, np.NaN],
    "Feb_S": [1000, 1805, 1231, 1425, 1800],
    "Mar_S": [1033, 1748, 1315, 1591, 2900],
    "Jan_L": [1005, np.NaN, 1300, np.NaN, 7000]
}
df = pd.DataFrame.from_dict(data)
print(df)

  HouseName        Type   Jan_S  Feb_S  Mar_S   Jan_L 
0         A     OneRoom  1100.0   1000   1033  1005.0 
1         A    TwoRooms  1776.0   1805   1748     NaN 
2         B     OneRoom  1228.0   1231   1315  1300.0 
3         B    TwoRooms  1640.0   1425   1591     NaN 
4         B  ThreeRooms     NaN   1800   2900  7000.0 

I need to realize two things: first, I want to find a reasonable rent price for January based on columns 'Jan_S', 'Feb_S', 'Mar_S', 'Jan_L'. Here S and L mean two different data sources, both of them may have outliers and nans but data from S will be taken as final price for January at priority. Second, For the same HouseName I need to check and make sure that the price of one room is lower than two rooms, and prices of two rooms is lower than three rooms. My final results will look like this:

HouseName        Type    Jan_S    Feb_S  Mar_S   Jan_L  
0         A     OneRoom  1100.0   1000   1033  1005.0     
1         A    TwoRooms  1776.0   1805   1748     NaN     
2         B     OneRoom  1228.0   1231   1315  1300.0   
3         B    TwoRooms  1640.0   1425   1591     NaN   
4         B  ThreeRooms     NaN   1800   2900  7000.0    

      Result(Jan)  
0         1100  
1         1776  
2         1228  
3         1640  
4         1800  

My idea is check if Jan_S is in range of 0.95 and 1.05 of Jan_L, if yes, take Jan_S as final result, otherwise, continue to check a value from Feb_S as Jan_S.

Please share any ideas that you might have to deal with this problem in Python. Thanks! Here are some references which may helps.

Find nearest value from multiple columns and add to a new column in Python

Compare values under multiple conditions of one column in Python

Check if values in one column is in interval values of another column in Python

You can use fillna for this.

If you want to have a conditional on selection of columns, then you need to figure the logic to filter the columns to pick the values from.

I'm showing the logic using the min() of all price columns

# filter out the price columns
price_cols = df.columns[~df.columns.isin(['HouseName','Type', 'Jan_S'])]

# then figure out the logic to filter the columns you need and use fillna
# here with the min of all columns as example
df['Jan_S'] = df['Jan_S'].fillna(df[price_cols].apply(min, axis=1))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM