
Pandas DataFrame: for a given row, assign a value in one column based on a lookup of a value in another column

For a given row i, I am trying to set i's value in the column 'Adj' based on i's value in another column, 'Local Max String'. Row i's value in 'Local Max String' needs to be looked up in another column of the DataFrame, 'Date String'; the row that contains that value, row q, then supplies its 'Adj Close' value as the value for row i's 'Adj' column.

Sorry if that is difficult to understand. The following for loop accomplishes what I want, but I think there should be a better way to do it in Pandas. I tried using apply and lambda functions, but I got an error saying assignment wasn't possible, and I'm not sure my approach was correct. The for loop also takes an extremely long time to complete.

Here's the code:

for x in range(0, len(df.index)):
    df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']

Here's a picture of the DF to get a better idea of what I mean. The value in the Adj column will look for the Adj Close value corresponding to the Date in Local Max String.

[Image: screenshot of the DataFrame]

import numpy as np
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import datetime
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)

# Dates for data
start_date = datetime.datetime(2017,11,1)
end_date = datetime.datetime(2018,11,1)

df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)

df.data = df['Adj Close']
df['Most Recent Local Max'] = np.nan
df['Date'] = df.index
local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)
local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))

def nearest(items, pivot):
    return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))

df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )

df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))

df['Date String'] = df['Date'].apply(lambda x: str(x))

df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])

df['Adj'] = np.nan

Thanks!

This solution still uses a for loop, but it reduces the number of iterations from df.shape[0] (one per row) to df['Local Max String'].nunique(), so it may be fast enough:

for a_local_max in df['Local Max String'].unique():
    # Rows that share this local max all receive the 'Adj Close' of the row dated a_local_max
    df.loc[df['Local Max String'] == a_local_max, 'Adj'] = df.loc[df['Date String'] == a_local_max, 'Adj Close'].iloc[0]
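The loop can probably be dropped altogether by treating 'Date String' as a lookup key. The following is a minimal sketch on made-up toy data, not the question's actual SPY data; it assumes the values in 'Date String' are unique (Series.map raises an error on a non-unique lookup index), and the name `lookup` is only illustrative.

```python
import pandas as pd

# Toy stand-in for the question's DataFrame (values are made up for illustration).
df = pd.DataFrame({
    'Date String': ['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04'],
    'Local Max String': ['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02'],
    'Adj Close': [10.0, 12.0, 11.0, 11.5],
})

# Lookup Series: index is 'Date String', values are 'Adj Close'.
# Assumption: every 'Date String' value is unique.
lookup = df.set_index('Date String')['Adj Close']

# For each row, fetch the 'Adj Close' of the row whose date equals 'Local Max String'.
# Values with no matching date would become NaN.
df['Adj'] = df['Local Max String'].map(lookup)
```

On the toy data, the first two rows pick up the 'Adj Close' of 2018-01-01 and the last two pick up that of 2018-01-02.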

Often you can skip the for loop by using an apply-like function in pandas. Below, I define a wrapper function that combines variables row by row, then apply it to the data frame to create the result variable. The key is to think at the row level inside the wrapper function and to tell apply to work row by row with the axis=1 argument.

import pandas as pd
import numpy as np

# Dummy data containing two columns with overlapping data
df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})

# Here you define the operation at the row level
def wrapper(row):
    # To apply the transformation to every row unconditionally,
    # replace the body with: return 2*row['date']
    # To test a condition first:
    if row['string'] == row['another_string']:
        return 2*row['date']
    else:
        return 0

# Finally you generate the new column using the operation defined above.
df['result'] = df.apply(wrapper, axis=1)

This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
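For a simple condition like this one, a column-wise expression is usually faster than a row-wise apply. The sketch below rebuilds the same kind of dummy data and uses numpy.where as a swapped-in alternative to the wrapper; the column name `result_vectorized` is only illustrative, and no timing is claimed here.

```python
import numpy as np
import pandas as pd

# Same shape of dummy data as in the apply example
df = pd.DataFrame({
    'date': 100 * np.random.sample(10000),
    'string': 2500 * ['hello', 'world', '!', 'mars'],
    'another_string': 10000 * ['hello'],
})

# Compare the two string columns element-wise, then pick 2*date or 0 per row
matches = df['string'] == df['another_string']
df['result_vectorized'] = np.where(matches, 2 * df['date'], 0)
```

This only works when the row logic can be written as whole-column operations; apply with axis=1 remains the more general tool.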
