Iterating over dataframe column and creating new column in python pandas

I have a DataFrame like this:

[Image: input DataFrame]

I want to create a new column (seq) that follows these rules:

  • df['seq']: the first item of every group (by id) is 0
  • df['seq']: keep incrementing until a date appears in the date column within the group (by id)
  • If a date is present, reset seq to 0 and then continue with the same increment process

This is my expected output:

[Image: expected output]
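Because the original screenshots are not available, here is a minimal sketch that spells out the three rules on a small, hypothetical input. The `ids` and `dates` lists and the `expected_seq` helper are invented for illustration, and `None` stands for an empty date cell. The counter resets at the start of every id group and on every row that carries a date, exactly as the rules above state.

```python
def expected_seq(ids, dates):
    """Apply the rules row by row: 0 at the start of each id group,
    0 again on any row with a date, otherwise previous value + 1."""
    seq = []
    prev_id = object()  # sentinel that never equals a real id
    counter = 0
    for row_id, date in zip(ids, dates):
        if row_id != prev_id or date is not None:
            counter = 0   # new group, or a date is present: reset
        else:
            counter += 1  # keep incrementing within the group
        seq.append(counter)
        prev_id = row_id
    return seq


# Hypothetical data: two id groups, with a date in some rows.
ids = [1, 1, 1, 1, 1, 2, 2, 2]
dates = [None, None, "2020-01-05", None, None, "2020-02-01", None, None]
print(expected_seq(ids, dates))  # [0, 1, 0, 1, 2, 0, 1, 2]
```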

Thanks

The easiest solution that comes to mind is to create an array and fill it with the value of a counter while you loop over your DataFrame. You then add the array as the new column afterwards.

For example:

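import numpy as np
import pandas as pd

seq = np.zeros(len(df), dtype=int)
current_date = None  # last date seen in the current group
current_id = None    # None so that the first row always starts a new group
counter = 0
for i in range(len(df)):
    test_date = df['Dates'].iloc[i]
    test_id = df['id'].iloc[i]
    # empty strings and NaN/None/NaT all count as "no date" (a bare NaN is truthy)
    has_date = bool(test_date) if isinstance(test_date, str) else pd.notna(test_date)
    if (has_date and test_date != current_date) or (test_id != current_id):
        # new date or new id detected: restart the counter
        current_date = test_date
        current_id = test_id
        counter = 0
    else:
        counter += 1
    seq[i] = counter

df['seq'] = seq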

There may be a more efficient way to do this, but in my experience performance is acceptable when you fill a NumPy array inside the loop and then assign it to the DataFrame as a column in one step.
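One commonly used vectorized alternative to the loop, sketched here under a few assumptions: the column names `id` and `Dates` are taken from the code above, empty date cells are NaN, and `add_seq` is a hypothetical helper name. The idea is to count dated rows cumulatively within each id, so each (id, block) pair covers one run between resets, and then number the rows of each run with `groupby(...).cumcount()`.

```python
import numpy as np
import pandas as pd


def add_seq(df, id_col="id", date_col="Dates"):
    """Return a copy of df with a 'seq' column that restarts at 0
    on each new id and on every row that has a date."""
    has_date = df[date_col].notna()
    # Running count of dated rows within each id: it increases on every
    # dated row, so (id, block) identifies one run between resets.
    block = has_date.astype(int).groupby(df[id_col]).cumsum()
    out = df.copy()
    # cumcount numbers the rows 0, 1, 2, ... inside each (id, block) run
    out["seq"] = df.groupby([df[id_col], block]).cumcount()
    return out


# Hypothetical sample data (NaN marks an empty date cell).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2],
    "Dates": [np.nan, np.nan, "2020-01-05", np.nan, np.nan,
              "2020-02-01", np.nan, np.nan],
})
print(add_seq(df)["seq"].tolist())  # [0, 1, 0, 1, 2, 0, 1, 2]
```

Two behavioral differences from the loop are worth checking against your data. This version resets on every dated row, whereas the loop only resets when the date differs from the previous one. Also, `groupby` merges all rows with the same id, so it assumes each id appears as one contiguous block (for example, after sorting by id).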
