
Iterate Over Rows in Pandas DataFrame Deleting All Values Within a Specified Number of Columns After a Specific String

As the title implies, I would like to iterate over the rows of the DataFrame shown below.

I have a specific string that occurs somewhere within every row of my DataFrame. In each row, I would like to delete every value that comes after that string, up until a specific column (in this case 'zz').

In every row, every value after the specific string ('a') should be deleted, up until 'zz'. I do not want to delete any values in column 'zz' or in any column after it, i.e. column 'aa'.

afterString = 'a'

df = {
    'bb': ['a', 'z', 'y'],
    'vv': ['b', 'a', 'z'],
    'ww': ['c', 'b', 'a'],
    'xx': ['d', 'c', 'b'],
    'yy': ['e', 'd', 'c'],
    'zz': ['f', 'e', 'd'],
    'aa': ['g', 'f', 'e']
}
output = {
    'bb': ['a', 'z', 'y'],
    'vv': ['', 'a', 'z'],
    'ww': ['', '', 'a'],
    'xx': ['', '', ''],
    'yy': ['', '', ''],
    'zz': ['f', 'e', 'd'],
    'aa': ['g', 'f', 'e']
}
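Since a solution may leave None or NaN in the deleted cells instead of '', it can help to compare results against the expected output with missing values treated as ''. A minimal sketch; `expected` and `matches_expected` are hypothetical names, not part of the question:

```python
import pandas as pd

# Input and expected output from the question
data = {
    'bb': ['a', 'z', 'y'],
    'vv': ['b', 'a', 'z'],
    'ww': ['c', 'b', 'a'],
    'xx': ['d', 'c', 'b'],
    'yy': ['e', 'd', 'c'],
    'zz': ['f', 'e', 'd'],
    'aa': ['g', 'f', 'e'],
}
expected = {
    'bb': ['a', 'z', 'y'],
    'vv': ['', 'a', 'z'],
    'ww': ['', '', 'a'],
    'xx': ['', '', ''],
    'yy': ['', '', ''],
    'zz': ['f', 'e', 'd'],
    'aa': ['g', 'f', 'e'],
}

def matches_expected(result: pd.DataFrame) -> bool:
    """Hypothetical helper: True if result equals the expected output.

    Missing values (None/NaN) are treated as '' before comparing.
    """
    same_columns = list(result.columns) == list(expected)
    return same_columns and result.fillna('').to_dict('list') == expected

df = pd.DataFrame(data)
print(matches_expected(df))                      # False: nothing deleted yet
print(matches_expected(pd.DataFrame(expected)))  # True
```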

Here is an iterative solution. It may not be very efficient on large DataFrames, but it does the job:

import pandas as pd

data = {
    'bb': ['a', 'z', 'y'],
    'vv': ['b', 'a', 'z'],
    'ww': ['c', 'b', 'a'],
    'xx': ['d', 'c', 'b'],
    'yy': ['e', 'd', 'c'],
    'zz': ['f', 'e', 'd'],
    'aa': ['g', 'f', 'e']
}
df = pd.DataFrame(data)

afterString = 'a'

def check_row(row):
    for index, value in row.items():  # loop over the columns of the row
        # set the value to None if afterString appears in a previous column;
        # row[:index] is a label slice, so it also includes the current column
        if afterString in row[:index].to_list() and value != afterString:
            row[index] = None
    return row

# apply the function to all columns except zz and aa
cols = df.columns[~df.columns.isin(['zz', 'aa'])]
df[cols] = df[cols].apply(check_row, axis=1)

Result (the empty cells are the values that were set to None):

  bb vv ww xx yy zz aa
0  a              f  g
1  z  a           e  f
2  y  z  a        d  e
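The `row[:index]` check in `check_row` relies on label slicing, which includes the end label. That is why the condition must also skip the value equal to the search string itself; otherwise the 'a' would be blanked too. A small illustration with a made-up row (the values are illustrative only):

```python
import pandas as pd

row = pd.Series(['z', 'a', 'b'], index=['bb', 'vv', 'ww'])

# Slicing a string-labelled Series by label includes the end label,
# so the slice up to 'vv' already contains the value in 'vv' itself
print(row[:'vv'].to_list())      # ['z', 'a']
print(row.loc[:'vv'].to_list())  # ['z', 'a']

# Positional slicing with .iloc excludes the end position
print(row.iloc[:1].to_list())    # ['z']
```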

Please see my answer below:

import pandas as pd
import numpy as np

d = {
    'bb': ['a', 'z', 'y'],
    'vv': ['b', 'a', 'z'],
    'ww': ['c', 'b', 'a'],
    'xx': ['d', 'c', 'b'],
    'yy': ['e', 'd', 'c'],
    'zz': ['f', 'e', 'd'],
    'aa': ['g', 'f', 'e']
}

df = pd.DataFrame(d)

def edit_rows(row, afterString):
    try:
        a_pos = row.to_list().index(afterString)
    except ValueError:  # in case afterString is not present in the analysed row at all
        return row
    # set every value after the first occurrence to NaN; .iloc makes the
    # assignment positional (row[index] with an integer is ambiguous on a
    # Series with string labels)
    row.iloc[a_pos + 1:] = np.nan
    return row

afterString = 'a'
zz_pos = df.columns.get_loc("zz")  # only the columns before 'zz' are edited
df.iloc[:, :zz_pos] = df.iloc[:, :zz_pos].apply(lambda row: edit_rows(row, afterString), axis=1)
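`edit_rows` leaves NaN where the question asks for empty strings. A minimal sketch of the same first-occurrence idea, assuming '' is the desired filler; `blank_after` is a hypothetical name, and the search string is passed through `DataFrame.apply`'s `args` parameter instead of a lambda:

```python
import pandas as pd

def blank_after(row: pd.Series, after: str, fill: str = '') -> pd.Series:
    """Hypothetical variant of edit_rows that fills with '' instead of NaN."""
    values = row.to_list()
    if after not in values:
        return row  # the search string is not in this row: leave it unchanged
    pos = values.index(after)  # position of the first occurrence
    row = row.copy()
    row.iloc[pos + 1:] = fill  # positional assignment, hence .iloc
    return row

data = {
    'bb': ['a', 'z', 'y'],
    'vv': ['b', 'a', 'z'],
    'ww': ['c', 'b', 'a'],
    'xx': ['d', 'c', 'b'],
    'yy': ['e', 'd', 'c'],
    'zz': ['f', 'e', 'd'],
    'aa': ['g', 'f', 'e'],
}
df = pd.DataFrame(data)

stop = df.columns.get_loc('zz')  # 'zz' and every column after it are kept
df.iloc[:, :stop] = df.iloc[:, :stop].apply(blank_after, axis=1, args=('a',))
print(df)
```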

Try this vectorized code.

import pandas as pd

afterString = 'a'
df = pd.DataFrame(df)  # df is the dict from the question
# flag for values after afterString
after_a = df.eq(afterString).shift(axis=1, fill_value=False).cummax(1)
# flag for columns before zz, by position, so a cell in a later column
# that happens to equal the 'zz' value cannot move the boundary
before_zz = df.columns.isin(df.columns[:df.columns.get_loc('zz')])
# mask the values between the two posts
output = df.mask(after_a & before_zz, '')
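The same mask can also be built from column positions with NumPy broadcasting: find the position of the first match in each row (`argmax` on a boolean array), then compare it against the column positions. This is a swapped-in technique, sketched under the assumption that cells should be compared by exact equality; `blank_between` is a hypothetical name:

```python
import numpy as np
import pandas as pd

def blank_between(frame: pd.DataFrame, after: str, stop_col: str, fill: str = '') -> pd.DataFrame:
    """Hypothetical helper: blank cells after the first `after` and before `stop_col`."""
    values = frame.to_numpy()
    hit = values == after                  # True where the search string occurs
    n_cols = frame.shape[1]
    # first match per row; rows without a match get n_cols, so nothing is blanked
    first = np.where(hit.any(axis=1), hit.argmax(axis=1), n_cols)
    cols = np.arange(n_cols)
    stop = frame.columns.get_loc(stop_col)
    # broadcast (rows, 1) against (n_cols,) to get a (rows, n_cols) mask
    mask = (cols > first[:, None]) & (cols < stop)
    return frame.mask(mask, fill)

data = {
    'bb': ['a', 'z', 'y'],
    'vv': ['b', 'a', 'z'],
    'ww': ['c', 'b', 'a'],
    'xx': ['d', 'c', 'b'],
    'yy': ['e', 'd', 'c'],
    'zz': ['f', 'e', 'd'],
    'aa': ['g', 'f', 'e'],
}
output = blank_between(pd.DataFrame(data), 'a', 'zz')
print(output)
```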

