
Replace values with different condition in multiple columns in Pandas

I have a dataframe something like this but much larger:

source   next1   next2     next3
b1       {-}     b2        -,b2,b3
b2,b3    -       {b2,b3}   {b2,b3,b4}

Now I need to replace a lot of characters here. Every next column should contain the values of the previous one. If the value is - or {-}, that means the previous value; if it is anything else, the previous value should still be taken into account. Desired output:

source   next1   next2   next3
b1       b1      b2      b1,b2,b3
b2,b3    b2,b3   b2,b3   b2,b3,b4

I have tried something like this:

for val in df['source'].values:
    if val == 'b1':
        df['next1'].replace('{-},', 'b1,', regex=True, inplace=True)
        df['next1'].replace('-,', 'b1,', regex=True, inplace=True)

and so on. But I have so many rows and conditions that this takes too long and is not precise; there are errors. It puts one value (from the replacement) into all rows.

I don't think there is a fast solution to your question, as string operations will always be slow-ish. Still, there is a better/faster one.

A straightforward solution would be:

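for i in range(1, df.shape[1]):  # order matters: each column builds on the one before
    prev = df.iloc[:, i - 1]
    # normalise '{-}' to '-' so that both placeholders are handled the same way
    cur = df.iloc[:, i].str.replace('{-}', '-', regex=False)
    # Series.str.replace cannot take a per-row replacement, so substitute row by row
    df.iloc[:, i] = [c.replace('-', p) for c, p in zip(cur, prev)]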
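
As a runnable sketch of the column-by-column idea on the sample data from the question: the helper name `fill_dashes` and its `from_source` flag are made up for illustration. It assumes that every cell is a string and that real values never contain a `-`. Braces are stripped as well, since the desired output has none. With the default setting, `-` is read as "the column immediately to the left", which gives `b2,b2,b3` for `next3` in the first row. The desired output shows `b1,b2,b3` there, which is what you get if `-` is read as "the `source` column" (`from_source=True`).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'source': ['b1', 'b2,b3'],
    'next1': ['{-}', '-'],
    'next2': ['b2', '{b2,b3}'],
    'next3': ['-,b2,b3', '{b2,b3,b4}'],
})

def fill_dashes(frame, from_source=False):
    """Hypothetical helper: strip braces and replace '-' in every column after the first."""
    out = frame.copy()
    cols = list(out.columns)
    for i in range(1, len(cols)):  # left to right, so earlier columns are already resolved
        # Reference column: either the first column or the one just before the current one.
        ref = out[cols[0]] if from_source else out[cols[i - 1]]
        # Drop the curly braces, then substitute '-' row by row.
        cur = out[cols[i]].str.replace(r'[{}]', '', regex=True)
        out[cols[i]] = [c.replace('-', r) for c, r in zip(cur, ref)]
    return out

by_previous = fill_dashes(df)                 # '-' means the previous column
by_source = fill_dashes(df, from_source=True)  # '-' means the source column
```

The row-by-row substitution is still a plain Python loop, so this is a correctness sketch rather than a speed-up.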

That said, it is likely to be much faster to store all the columns as sets ({}) and operate on them as such.
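
One way the set idea might look, as a minimal sketch: each cell is parsed into a Python set, and a `-` token is treated as "everything from the previous column", which is an assumption about the intended rule. The names `to_tokens` and `resolve_as_sets` are hypothetical. No timing was done here, so whether this is actually faster on a large frame would need to be measured.

```python
import pandas as pd

def to_tokens(cell):
    """Turn a cell such as '{b2,b3}' or '-,b2' into a set of its comma-separated tokens."""
    return {t for t in cell.strip('{}').split(',') if t}

def resolve_as_sets(frame):
    """Hypothetical helper: return a frame of sets with '-' expanded from the previous column."""
    rows = []
    for row in frame.itertuples(index=False):
        prev = to_tokens(row[0])
        resolved = [prev]
        for cell in row[1:]:
            cur = to_tokens(cell)
            if '-' in cur:
                # Assumption: '-' stands for all values of the previous column.
                cur = (cur - {'-'}) | prev
            resolved.append(cur)
            prev = cur
        rows.append(resolved)
    return pd.DataFrame(rows, columns=frame.columns, index=frame.index)

df = pd.DataFrame({
    'source': ['b1', 'b2,b3'],
    'next1': ['{-}', '-'],
    'next2': ['b2', '{b2,b3}'],
    'next3': ['-,b2,b3', '{b2,b3,b4}'],
})
as_sets = resolve_as_sets(df)
```

Under this reading the first row's `next3` becomes `{'b2', 'b3'}`, because duplicates collapse in a set. If the comma-separated form is needed again, something like `','.join(sorted(s))` applied to each cell converts it back.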
