Pandas DataFrame: for a given row, trying to assign value in a certain column based on a lookup of a value in another column

Basically, for a given row i, I am trying to set i's value in the column 'Adj' based on i's value in another column, 'Local Max String'. Row i's value in 'Local Max String' needs to be looked up in another column of the DataFrame, 'Date String'; the row that contains that value, row q, then supplies its 'Adj Close' value as the value for row i's 'Adj' column.

Sorry if that is difficult to understand. The following for loop accomplishes what I want, but I think there should be a better way to do it in Pandas. I tried using apply with lambda functions, but it said assignment wasn't possible, and I'm unsure whether my approach was correct. The for loop also takes an extremely long time to complete.

Here's the code:

for x in range(0, len(df.index)):
    df['Adj'][x] = df.loc[df['Date String'] == df['Local Max String'][x]]['Adj Close']

Here's a picture of the DF to give a better idea of what I mean. The value in the Adj column should be the Adj Close value corresponding to the date in Local Max String.

(Image: enter image description here)

import numpy as np
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
import datetime
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)

# Dates for data
start_date = datetime.datetime(2017,11,1)
end_date = datetime.datetime(2018,11,1)

df = pdr.get_data_yahoo('SPY', start=start_date, end=end_date)

df.data = df['Adj Close']
df['Most Recent Local Max'] = np.nan
df['Date'] = df.index
local_maxes = list(df[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)].index)
local_maxes.append(df['Date'][0] - datetime.timedelta(days=1))

def nearest(items, pivot):
    return min([d for d in items if d< pivot], key=lambda x: abs(x - pivot))

df['Most Recent Local Max'] = df['Date'].apply(lambda x: min([d for d in local_maxes if d < x], key=lambda y: abs(y - x)) )

df['Local Max String'] = df['Most Recent Local Max'].apply(lambda x: str(x))

df['Date String'] = df['Date'].apply(lambda x: str(x))

df.loc[df['Local Max String'] == str(df['Date'][0] - datetime.timedelta(days=1)), 'Local Max String'] = str(df['Date'][0])

df['Adj'] = np.nan

Thanks!

This solution still uses a for loop, but it reduces the number of iterations from df.shape[0] (one per row) to df['Local Max String'].nunique(), so it may be fast enough:

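for a_local_max in df['Local Max String'].unique():
    # Look up 'Adj Close' on the row whose date equals this local max,
    # then assign it to every row that refers to that local max.
    df.loc[df['Local Max String'] == a_local_max, 'Adj'] = df.loc[df['Date String'] == a_local_max, 'Adj Close'].iloc[0]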
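The lookup described in the question can probably also be written without any Python-level loop. The sketch below builds a Series indexed by 'Date String' and passes it to Series.map. The toy values and the name adj_close_by_date are made up for illustration, and the approach assumes each date string occurs only once in 'Date String'.

```python
import pandas as pd

# Toy data (hypothetical values) that reuses the question's column names.
df = pd.DataFrame({
    'Date String': ['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04'],
    'Adj Close': [10.0, 12.0, 11.0, 13.0],
    'Local Max String': ['2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02'],
})

# Lookup table: date string -> Adj Close.
# Assumption: every date string is unique, so the index has no duplicates.
adj_close_by_date = df.set_index('Date String')['Adj Close']

# For each row, fetch the Adj Close of the row whose date matches
# that row's 'Local Max String'.
df['Adj'] = df['Local Max String'].map(adj_close_by_date)

print(df)
```

Rows whose 'Local Max String' has no matching 'Date String' would end up as NaN in 'Adj', which may or may not be the behaviour you want.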

You can often skip the for loop by using an apply-like function in pandas. Below, I define a wrapper function that combines variables row-wise. This function is then applied to the data frame to create the result variable. The key is to think at the row level inside the wrapper function and to signal this to the apply function with the axis=1 argument.

import pandas as pd
import numpy as np

# Dummy data containing two columns with overlapping data
df = pd.DataFrame({'date': 100*np.random.sample(10000), 'string': 2500*['hello', 'world', '!', 'mars'], 'another_string': 10000*['hello']})

# Here you define the operation at the row level
def wrapper(row):
    # Uncomment if the transformation is to be applied to every row unconditionally:
    # return 2*row['date']
    # If you need to test some condition first:
    if row['string'] == row['another_string']:
        return 2*row['date']
    else:
        return 0

# Finally you generate the new column using the operation defined above.
df['result'] = df.apply(wrapper, axis=1)

This code completes in 195 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each).
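For a simple condition like the one in wrapper, a column-wise expression may be worth trying as well, since it avoids calling a Python function once per row. The following is a minimal sketch using numpy.where on the same kind of dummy data. The seeded generator is only there to make the example reproducible and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Dummy data shaped like the example above; the seed value is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'date': 100 * rng.random(10000),
    'string': 2500 * ['hello', 'world', '!', 'mars'],
    'another_string': 10000 * ['hello'],
})

# Column-wise equivalent of wrapper: where the two string columns match,
# take 2 * date, otherwise 0.
df['result'] = np.where(df['string'] == df['another_string'], 2 * df['date'], 0)
```

Whether this is faster on your data is something to measure, for example with %timeit, rather than assume.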


