How can I edit a dataframe column containing a sequence based on a condition?

Question

I have a dataframe with a column of sequences. Each element of a sequence is a tuple holding a coded value and the day on which it was recorded, e.g. (A,1). My goal is to check for the coded values X and Y: if both occur on the same day, remove the Y value from the sequence.

ID     Sequence
1      [(A,1), (B,1), (X,2), (Y,2), (Y,3)]
2      [(C,1), (X,2), (Y,2), (Z,2)]
3      [(C,1), (D,2), (X,3), (Y,3), (Z,3)]

The results I'm expecting are:

ID     Sequence
1      [(A,1), (B,1), (X,2), (Y,3)] 
2      [(C,1), (X,2), (Z,2)]
3      [(C,1), (D,2), (X,3), (Z,3)]

Is there any way I can write a function to get these results? Any help would be appreciated.

Answer 1

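You can use a set to track the days on which an X or Y has already been kept (set membership checks are fast for this kind of use case). For each tuple, check whether its first item is X or Y. If it is, keep the tuple only when its day (index 1, the second item) is not yet in the set; all other tuples are kept unchanged. Then use this function with df.apply. Note that this keeps whichever of X or Y comes first on a given day, so it relies on X preceding Y, as in the sample data.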

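def fun(l):
    seen_days = set()  # days on which an X or Y has already been kept
    lst = []
    for i in l:
        if i[0] in ('X', 'Y'):
            # keep only the first X/Y encountered on each day
            if i[1] not in seen_days:
                seen_days.add(i[1])
                lst.append(i)
        else:
            # every other coded value is kept as-is
            lst.append(i)
    return lst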

df['Sequence'].apply(fun)  # to assign back: df['Sequence'] = df['Sequence'].apply(fun)

0    [(A, 1), (B, 1), (X, 2), (Y, 3)]
1            [(C, 1), (X, 2), (Z, 2)]
2    [(C, 1), (D, 2), (X, 3), (Z, 3)]
Name: Sequence, dtype: object
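
The function above keeps whichever of X or Y appears first on a given day, so it depends on X preceding Y within the sequence. As a hedged alternative, the sketch below first collects the days on which X occurs and then drops only the Y entries recorded on those days. The name drop_y_on_x_days is hypothetical, and the sketch assumes every element is a (code, day) tuple with string codes, as in the sample data.

```python
def drop_y_on_x_days(seq):
    # Days on which the coded value X was recorded.
    x_days = {day for code, day in seq if code == 'X'}
    # Keep everything except Y entries that share a day with an X.
    return [(code, day) for code, day in seq
            if not (code == 'Y' and day in x_days)]

sample = [('A', 1), ('B', 1), ('X', 2), ('Y', 2), ('Y', 3)]
print(drop_y_on_x_days(sample))
# -> [('A', 1), ('B', 1), ('X', 2), ('Y', 3)]

# A Y listed before X on the same day is still removed.
print(drop_y_on_x_days([('Y', 2), ('X', 2)]))
# -> [('X', 2)]
```

It can be applied in the same way as fun, for example df['Sequence'].apply(drop_y_on_x_days).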

Answer 2

You can use itertools.groupby() to group consecutive entries recorded on the same day, then filter Y out of the matching groups.

Finally, use itertools.chain() to flatten the list of lists.

import itertools

def remove_y(lst):
    res = []

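    # groupby() only groups consecutive items, so same-day entries must be adjacent
    for key, values in itertools.groupby(lst, key=lambda x: x[1]):
        values = list(values)

        # drop Y only when X was recorded on the same day
        if any(value[0] == 'X' for value in values):
            res.append([value for value in values if value[0] != 'Y'])
        else:
            res.append(values)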

    return list(itertools.chain(*res))


# this answer's test frame stores the sequences in column 'B' (the question calls it 'Sequence')
df['B'] = df['B'].apply(remove_y)

# print(df)

   ID                                 B
0   1  [(A, 1), (B, 1), (X, 2), (Y, 3)]
1   2          [(C, 1), (X, 2), (Z, 2)]
2   3  [(C, 1), (D, 2), (X, 3), (Z, 3)]
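
itertools.groupby() only merges consecutive items with equal keys, so the approach above assumes that entries for the same day sit next to each other in the sequence. If that is not guaranteed, one option is to sort by day before grouping. The sketch below illustrates the difference; the helper name group_by_day is hypothetical. Sorting changes the order across days, although sorted() is stable, so the order within a day is preserved.

```python
import itertools

def group_by_day(seq):
    # Sort by day so groupby() sees each day's entries as one consecutive run.
    ordered = sorted(seq, key=lambda item: item[1])
    return {day: list(items)
            for day, items in itertools.groupby(ordered, key=lambda item: item[1])}

unsorted = [('X', 2), ('A', 1), ('Y', 2)]

# Without sorting, day 2 is split into two separate groups.
print([(day, list(items))
       for day, items in itertools.groupby(unsorted, key=lambda item: item[1])])
# -> [(2, [('X', 2)]), (1, [('A', 1)]), (2, [('Y', 2)])]

print(group_by_day(unsorted))
# -> {1: [('A', 1)], 2: [('X', 2), ('Y', 2)]}
```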

2 answers

solution1
1 ACCEPTED 2021-04-16 04:58:11

solution2
1 2021-04-16 05:08:45
