
Conditionally insert rows in the middle of a dataframe using pandas

I have a dataset to which I need to add rows based on conditions. Rows can be added anywhere within the dataset, i.e., in the middle, at the top, or at the bottom.

I have 26 columns in the data but will only use a few to set conditions. I want my code to go through each row and check whether a column named "potveg" has the value 4, 8, or 9. If it does, add a row below it, set the 'col' and 'lat' column values to those of the last row, and set the 'icohort' and 'isrccohort' columns to those of the last row + 1. Then export the new dataframe to CSV. I have tried several implementations based on the logic in this question: "Pandas: Conditionally insert rows into DataFrame while iterating through rows in the middle". P.S. I am new to Python and pandas.

Here is the code I have so far:

for index, row in df.iterrows():
    last_row = df.iloc[index-1]
    next_row = df.iloc[index]

    new_row = {
        'col': last_row.col,
        'row': last_row.row,
        'tmpvarname': last_row.tmpvarname,
        'year': last_row.year,
        'icohort': next_row.icohort,
        'isrccohort': next_row.icohort,
        'standage': 3000,
        'chrtarea': 0,
        'potveg': 13,
        'currentveg': 13,
        'subtype': 13,
        'agstate': 0,
        'agprevstate': 0,
        'tillflag': 0,
        'fertflag': 0,
        'irrgflag': 0,
        'disturbflag': 0,
        'disturbmonth': 0,
        'FRI': 2000,
        'slashpar': 0,
        'vconvert': 0,
        'prod10par': 0,
        'prod100par': 0,
        'vrespar': 0,
        'sconvert': 0,
        'tmpregion': last_row.tmpregion
    }
    new_row = {k: v for k, v in new_row.items()}
    if (df.iloc[index]['potveg'] == 4):
        newdata = df.append(new_row, ignore_index=True)

Following the steps you suggested, you could write something like:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 4, 5], 'before': [1, 2, 4, 5], 'after': [1, 2, 4, 5]})
pieces = []  # collect one-row frames here and concatenate once at the end

for i, row in df.iterrows():
    pieces.append(row.to_frame().T)
    if row['id'] == 2:
        # add our new row: `before` comes from the row above it (the current row),
        # `after` from the row below it (assumes a following row exists)
        temp = pd.DataFrame({'id': [3], 'before': [df.loc[i, 'before']], 'after': [df.loc[i + 1, 'after']]})
        pieces.append(temp)

new_df = pd.concat(pieces, ignore_index=True)

Consider whether you can solve the problem without iterating over the dataframe, since row-by-row iteration can be quite slow on a large dataset. I'd suggest looking at the apply function.
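As a rough illustration of a loop-free approach, here is a minimal sketch that uses a boolean mask rather than apply. The toy data, the choice to copy each matching row as the template for the new row, and the potveg value of 13 (borrowed from the dictionary in the question) are all assumptions, so adjust them to your real 26 columns.

```python
import pandas as pd

# Toy data with only a few of the question's columns (values are made up).
df = pd.DataFrame({
    'col': [10, 10, 20, 20],
    'row': [1, 1, 2, 2],
    'icohort': [1, 2, 1, 2],
    'isrccohort': [1, 2, 1, 2],
    'potveg': [4, 13, 9, 2],
})

# Rows that should be followed by a new row.
mask = df['potveg'].isin([4, 8, 9])

# Assumption: each new row starts as a copy of the matching row,
# then the columns that should differ are overwritten.
new_rows = df[mask].copy()
new_rows['icohort'] += 1
new_rows['isrccohort'] += 1
new_rows['potveg'] = 13

# Stack originals and copies, then sort on the original index with a
# stable sort so each copy lands directly below the row it came from.
result = (
    pd.concat([df, new_rows])
    .sort_index(kind='mergesort')
    .reset_index(drop=True)
)

# With no path, to_csv returns the CSV text; pass a file name to write to disk.
csv_text = result.to_csv(index=False)
```

The stable sort matters here: the copies share index labels with their source rows, and a stable sort keeps the original ahead of its copy.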

Inserting rows at a specific position can be done this way:

import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 4, 5], 'col2': ['A', 'B', 'D', 'E']})

new_row = pd.DataFrame({'col1': [3], 'col2': ['C']})
idx_pos = 2

pd.concat([df.iloc[:idx_pos], new_row, df.iloc[idx_pos:]]).reset_index(drop=True)

Output:

   col1 col2
0     1    A
1     2    B
2     3    C
3     4    D
4     5    E
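If you need to insert at several positions with the same slicing idea, one option is to wrap it in a small helper and work from the bottom up so that earlier positions do not shift. The sketch below is only an illustration; `insert_row` is a hypothetical helper name, and the condition and the inserted values are made up.

```python
import pandas as pd

def insert_row(df, pos, values):
    """Return a new frame with one row (a dict of values) inserted at position pos."""
    new_row = pd.DataFrame([values])
    return pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]], ignore_index=True)

df = pd.DataFrame({'col1': [1, 2, 4, 5], 'col2': ['A', 'B', 'D', 'E']})

# Positions of the rows that should get a new row below them
# (hypothetical condition: col1 is 2 or 4).
positions = [i for i, v in enumerate(df['col1']) if v in (2, 4)]

# Walk from the bottom up so that inserting a row does not shift
# the positions that are still waiting to be processed.
out = df
for pos in reversed(positions):
    out = insert_row(out, pos + 1, {'col1': 0, 'col2': 'new'})
```

Each call copies the frame, so this is fine for a handful of insertions but will be slow if many rows match.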


 