What is the optimal way to assign a value to a pandas DataFrame column from a column in a different row?

I need to iterate over a DataFrame indexed by UNIX timestamp and, in one column, assign a value taken from another column in a different row, located at a specific index time in the future. This is what I'm currently doing:

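import pandas as pd

df = pd.DataFrame([
    [1523937600, 100.0, 0.0],
    [1523937660, 120.0, 0.0],
    [1523937720, 110.0, 0.0],
    [1523937780, 90.0, 0.0],
    [1523937840, 99.0, 0.0]],
    columns=['time', 'value', 'target'])
df.set_index('time', inplace=True)

skip = 2  # minutes to skip ahead
for i in range(len(df)):
    t = df.index[i] + (60*skip)  # index time of the row to read from
    try:
        # .at writes into df itself; df.iloc[i].target = ... is chained
        # assignment and may only modify a temporary copy
        df.at[df.index[i], 'target'] = df.at[t, 'value']
    except KeyError:
        # no row exists at the computed time
        df.at[df.index[i], 'target'] = 0.0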

Output:

            value  target
time                     
1523937600  100.0   110.0
1523937660  120.0    90.0
1523937720  110.0    99.0
1523937780   90.0     0.0
1523937840   99.0     0.0

This works, but I am dealing with datasets containing millions of rows and it takes an extremely long time. Is there a more efficient way to do this?

EDIT: Added example input/output. Note that it is important that I obtain the value from the row with the calculated index time rather than just looking ahead n rows, as there could be gaps in the times, or additional times in between.

In this case you should keep time as a column as well as the index. Hope this helps:

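import pandas as pd

df = pd.DataFrame([
    [1523937600, 100.0, 0.0],
    [1523937660, 120.0, 0.0],
    [1523937720, 110.0, 0.0],
    [1523937780, 90.0, 0.0],
    [1523937840, 99.0, 0.0]],
    columns=['time', 'value', 'target'])
df.index = df['time']  # keep 'time' as a column and also use it as the index

skip = 2  # minutes to skip ahead, as in the question

# For each time, read 'value' at time + skip minutes, or use 0.0 if no such row exists
df['target'] = df['time'].apply(lambda x: df.loc[x+(skip*60)].value if x+(skip*60) in df.index else 0.0)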
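
The apply above still runs a Python-level lookup for every row, so it may remain slow on millions of rows. One possible alternative, not part of the answer above, is to do the whole lookup in a single label-aligned step with Series.reindex. This is only a sketch: it assumes the timestamps in the index are unique (reindex raises an error on duplicate labels), and the name future_times is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1523937600, 100.0],
        [1523937660, 120.0],
        [1523937720, 110.0],
        [1523937780, 90.0],
        [1523937840, 99.0],
    ],
    columns=["time", "value"],
).set_index("time")

skip = 2  # minutes to skip ahead

# Index label of the row each target should be read from (hypothetical name).
future_times = df.index + 60 * skip

# reindex looks values up by label, not by position, so gaps or extra rows
# in the timestamps are respected; labels that do not exist come back as NaN
# and are replaced with 0.0.
# Assumption: the index contains no duplicate timestamps.
df["target"] = df["value"].reindex(future_times).fillna(0.0).to_numpy()

print(df)
```

Because the lookup is done by label, a missing timestamp simply produces 0.0 for the rows that point at it, which matches the try/except behaviour in the question rather than a fixed look-ahead of n rows.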
