简体   繁体   中英

Adding a column to a pandas dataframe based on other columns

Problem description

Introductory remark: For the code have a look below

Let's say we have a pandas dataframe consisting of 3 columns and 2 rows. I'd like to add a 4th column called 'Max_LF' that will consist of an array. The value of the cell is retrieved by having a look at the column 'Max_WD'. For the first row that would be 0.35 which will than be compared to the values in the column 'WD' where 0.35 can be found at the third position. Therefore, the third value of the column 'LF' should be written into the column 'Max_LF'. If the value of 'Max_WD' occures multiple times in 'WD', then all corresponding items of 'LF' should be written into 'Max_LF'.

Failed attempt

So far I had various attemps on first retrieving the index of the item in 'Max_WD' in 'WD'. After potentially retrieving the index the idea was to then get the items of 'LF' via their index: df4['temp_indices'] = [i for i, x in enumerate(df4['WD']) if x == df4['Max_WD']]

However, a ValueError occured: raise ValueError('Lengths must match to compare') ValueError: Lengths must match to compare

This is what the example dateframe looks like

df = pd.DataFrame(data={'LF': [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]], 'WD': [[0.28, 0.34, 0.35, 0.18], [0.42, 0.45, 0.45, 0.18], [0.31, 0.21, 0.41, 0.41]], 'Max_WD': [0.35, 0.45, 0.41]})

The expected outcome should look like

df=pd.DataFrame(data={'LF': [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]], 'WD': [[0.28, 0.34, 0.35, 0.18], [0.42, 0.45, 0.45, 0.18], [0.31, 0.21, 0.41, 0.41]], 'Max_WD': [0.35, 0.45, 0.41], 'Max_LF': [[3],[2,3], [3,4]]})

You could get it by simply using lambda as follows

df['Max_LF'] = df.apply(lambda x : [i + 1 for i, e in enumerate(x['WD']) if e == x['Max_WD']], axis=1)

output is

    LF  Max_WD  WD  Max_LF
0   [1, 2, 3]   0.35    [0.28, 0.34, 0.35, 0.18]    [3]
1   [1, 2, 3]   0.45    [0.42, 0.45, 0.45, 0.18]    [2, 3]
2   [1, 2, 3]   0.41    [0.31, 0.21, 0.41, 0.41]    [3, 4]

Thanks guys. With your help I was able to solve my problem. Like Prince Francis suggested I first did

df['temp'] = df.apply(lambda x: [i for i, e in enumerate(x['WD']) if e == x['Max_WD']], axis=1)

to get the indicees of the 'WD'-values in 'LF'. In a second stept I then could add the actual column 'Max_LF' by doing

df['LF_Max'] = df.apply(lambda x: [x['LF'][e] for e in (x['temp'])],axis=1)

Thanks a lot guys!

You can achieve it by applying a function over axis 1 .

For this, I recommend you to first convert the WD list into a pd.Series (or a numpy.ndarray) and then compare all the values at once.

Assuming that you want a list of all the values higher than the threshold, you could use this:

>>> def get_max_wd(x): 
...     wd = pd.Series(x.WD) 
...     return list(wd[wd >= x.Max_WD])
... 
>>> df.apply(get_max_wd, axis=1)
0          [0.35]
1    [0.45, 0.45]
2    [0.41, 0.41]
dtype: object

The result of the apply can then be assigned as a new column into the dataframe:

df['Max_LF'] = df.apply(get_max_wd, axis=1)

If what you are after is only the maximum value (see my comment above), you can use the max() method within the function.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM