Find a value in a DataFrame and add the preceding column's value to a new column in pandas

I have the data frame below with 5 columns. I need to check for a specific string ("-") in all columns and, wherever "-" is found, put the value from the preceding column into a new column (F). For example, "-" appears in column B in rows 0 and 2, so 'a' and 'c' (the values from the preceding column) are added to column F in those rows, and so on.

Source Data Frame:

[image: source data frame]

Desired Data Frame would be:

[image: desired data frame]

I have written the code below, but I get a value length error when I try to create the new column (F). I would appreciate your help.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'},
 'B': {0: '-', 1: 'a', 2: '-', 3: 'b', 4: 'd'}})

df['C'] = np.where(df['B'].isin(df['A'].values), df['B'], np.nan)
df['C'] = df['C'].map(dict(zip(df.A.values, df.B.values)))
df['D'] = np.where(df['C'].isin(df['B'].values), df['C'], np.nan)
df['D'] = df['D'].map(dict(zip(df.B.values, df['C'].values)))
df['E'] = np.where(df['D'].isin(df['C'].values), df['D'], np.nan)
df['E'] = df['E'].map(dict(zip(df['C'].values, df['D'].values)))

a=np.array(df.iloc[:,:5])
g=[]
for index,x in np.ndenumerate(a):
    temp=[]
    if x=="-":
        temp.append(x-1)
    g.append(temp)
df['F']=g
print(df)

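Use DataFrame.where to keep only the values whose next column equals "-" (found by comparing against the frame shifted one column to the left with DataFrame.shift) and replace everything else with missing values. Then back-fill the missing values along each row and select the first column by position: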

df['F'] = df.where(df.shift(-1, axis=1).eq('-')).bfill(axis=1).iloc[:, 0]
print (df)
   A  B    F
0  a  -    a
1  b  a  NaN
2  c  -    c
3  d  b  NaN
4  e  d  NaN
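The one-liner above can be easier to follow when split into its intermediate steps. The sketch below is illustrative only: it assumes a five-column frame like the one the question's code appears to build (the C, D and E values are worked out by hand from that code, since the images are not available), and the names next_col, before_dash, kept and first_hit are made up for this example.

```python
import numpy as np
import pandas as pd

# Assumed five-column frame (A-E); dtype=object keeps strings and NaN side by side.
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e'],
    'B': ['-', 'a', '-', 'b', 'd'],
    'C': [np.nan, '-', np.nan, 'a', 'b'],
    'D': [np.nan, np.nan, np.nan, '-', 'a'],
    'E': [np.nan, np.nan, np.nan, np.nan, '-'],
}, dtype=object)

# Step 1: shift one column to the left, so each cell holds its right-hand neighbour.
next_col = df.shift(-1, axis=1)

# Step 2: boolean mask that is True where the right-hand neighbour is '-'.
before_dash = next_col.eq('-')

# Step 3: keep only the cells that sit directly before a '-'; the rest become NaN.
kept = df.where(before_dash)

# Step 4: back-fill along each row so the first kept value lands in column 0.
first_hit = kept.bfill(axis=1).iloc[:, 0]

df['F'] = first_hit
print(df)  # column F is ['a', 'a', 'c', 'a', 'a']
```

If a row contained more than one "-", this approach would pick the value before the first one, because the back-fill pulls the leftmost kept value into the first column.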

You can do:

df['F']=[i[0][-1] if len(i)>1 else np.nan for i in df.fillna('').sum(axis=1).str.split('-') ]

output:

df['F']
Out[41]: 
0    a
1    a
2    c
3    a
4    a
Name: F, dtype: object

List Comprehension Explanation:

  • fill the NaNs in df with '' and sum across each row, which concatenates the row's strings
  • split the concatenated string on -
  • if the split yields more than one piece, take the first piece (the text before the first -); otherwise - is not present, so use np.nan
  • take the last character of that piece with [-1], which is the value immediately before the -
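The steps in this explanation can be sketched one at a time. This is a minimal illustration, assuming the same five-column frame the question's code appears to build (C, D and E are worked out by hand here); the names joined and parts are hypothetical helpers, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed five-column frame (A-E); dtype=object keeps strings and NaN side by side.
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e'],
    'B': ['-', 'a', '-', 'b', 'd'],
    'C': [np.nan, '-', np.nan, 'a', 'b'],
    'D': [np.nan, np.nan, np.nan, '-', 'a'],
    'E': [np.nan, np.nan, np.nan, np.nan, '-'],
}, dtype=object)

# Step 1: replace NaN with '' and concatenate each row into one string.
joined = df.fillna('').sum(axis=1)   # 'a-', 'ba-', 'c-', 'dba-', 'edba-'

# Step 2: split each row string on '-'.
parts = joined.str.split('-')        # e.g. ['ba', ''] for row 1

# Steps 3 and 4: if a '-' was present, take the last character of the text
# before it; otherwise record NaN.
df['F'] = [p[0][-1] if len(p) > 1 else np.nan for p in parts]
print(df['F'].tolist())  # ['a', 'a', 'c', 'a', 'a']
```

Two caveats follow from how the strings are handled: [-1] picks the last character rather than the last cell, so this only works as intended when every cell holds a single character, and a "-" in the very first column would leave an empty first piece and raise an IndexError.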

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.