
Create new column that checks equality with another column in Python

df
   pycld3   spacy   seqtolang   langid  langdetect  text_language
0   lt     unknown      ro        en         pl         unknown
1   bg     unknown      fi        en         tl         unknown
3   no        id        in        de         no         no
4   en        en        zh        en         en         en
5   en        en        en        en         en         en

I'd like to create a new column that records whether a column's value matches the base column, text_language. If df['pycld3'] == df['text_language'], the new column df['pycld3_true'] should be 1; otherwise it should be 0. I want to do the same for the other columns.

Expected Output

df
   pycld3   spacy   seqtolang   langid  langdetect  text_language  pycld3_true    spacy_true  ....
0   lt     unknown      ro        en         pl         un             0               1
1   bg     unknown      fi        en         tl         un             0               1
3   no        id        in        de         no         no             1               0
4   en        en        zh        en         en         en             1               1
5   en        en        en        en         en         en             1               1

The code that I can think of right now is:

for row in df['pycld3']:
   if df['pycld3'][i] == df['text_language'][i]:
      df['pycld3_true'] == 1
   elif: 
      df['pycld3'][i] != df['text_language'][i]:
      df['pycld3_true'] == 0
   else:
      df['pycld3_true']== 'nan'

The code above is incorrect and inefficient.

df
Out[6]: 
  one  two three
0  10  1.2   4.2
1  15   70  0.03
2   8    5     0
df['new'] = df['one']==df['two']
df
Out[8]: 
  one  two three    new
0  10  1.2   4.2  False
1  15   70  0.03  False
2   8    5     0  False

df['new'] = df['new'].astype(int)
df
Out[10]: 
  one  two three  new
0  10  1.2   4.2    0
1  15   70  0.03    0
2   8    5     0    0
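
The question's loop also had a branch that wrote 'nan'. As a minimal sketch, assuming the intent was to leave the flag empty when either value is missing, the boolean comparison can be converted to a nullable integer column and masked. The sample rows and the helper name either_missing are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with missing values in both columns
df = pd.DataFrame({
    "pycld3": ["lt", None, "no", "en"],
    "text_language": ["unknown", "unknown", "no", None],
})

# Element-wise comparison gives a boolean Series
match = df["pycld3"].eq(df["text_language"])

# Rows where either side is missing should not count as a mismatch
either_missing = df["pycld3"].isna() | df["text_language"].isna()

# Nullable Int64 keeps 0/1 as integers and allows <NA> for the masked rows
df["pycld3_true"] = match.astype("Int64").mask(either_missing)
print(df)
```

If missing values cannot occur in your data, the plain .astype(int) conversion is enough.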

Try looping over the first five columns and comparing each one with text_language:

for col in df.columns[:5]:
    df[f'{col}_true'] = (df[col] == df['text_language']).astype(int)
print(df)

Output:

  pycld3    spacy seqtolang langid langdetect text_language  pycld3_true  spacy_true  seqtolang_true  langid_true  langdetect_true
0     lt  unknown        ro     en         pl       unknown            0           1               0            0                0
1     bg  unknown        fi     en         tl       unknown            0           1               0            0                0
3     no       id        in     de         no            no            1           0               0            0                1
4     en       en        zh     en         en            en            1           1               0            1                1
5     en       en        en     en         en            en            1           1               1            1                1
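
The same result can probably be had without an explicit loop. The sketch below rebuilds the question's frame (with text_language as 'unknown', as in the first table) and uses DataFrame.eq with axis=0 to compare every detector column against text_language in one call. The names detectors, flags and result are only illustrative.

```python
import pandas as pd

# Data copied from the question; index values 0, 1, 3, 4, 5 as shown there
df = pd.DataFrame(
    {
        "pycld3": ["lt", "bg", "no", "en", "en"],
        "spacy": ["unknown", "unknown", "id", "en", "en"],
        "seqtolang": ["ro", "fi", "in", "zh", "en"],
        "langid": ["en", "en", "de", "en", "en"],
        "langdetect": ["pl", "tl", "no", "en", "en"],
        "text_language": ["unknown", "unknown", "no", "en", "en"],
    },
    index=[0, 1, 3, 4, 5],
)

# Every column except the base column
detectors = df.columns.drop("text_language")

# axis=0 aligns the Series with the rows, so each column is compared row by row
flags = (
    df[detectors]
    .eq(df["text_language"], axis=0)
    .astype(int)
    .add_suffix("_true")
)

result = df.join(flags)
print(result)
```

Selecting the columns by name rather than by position (df.columns[:5]) keeps this working if the column order changes.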

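Use an f-string to name each new column and loop over every column except the last. Here the row-wise apply tests whether text_language is a substring of the column's value, so 'un' matches 'unknown':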

for c in df.iloc[:,:-1]:
     df[f'{c}_true'] = df.apply(lambda x: x.text_language in x[c], axis=1).astype(int)
print(df)


  pycld3    spacy seqtolang langid langdetect text_language  pycld3_true  \
0     lt  unknown        ro     en         pl            un            0   
1     bg  unknown        fi     en         tl            un            0   
3     no       id        in     de         no            no            1   
4     en       en        zh     en         en            en            1   
5     en       en        en     en         en            en            1   

   spacy_true  seqtolang_true  langid_true  langdetect_true  
0           1               0            0                0  
1           1               0            0                0  
3           0               0            0                1  
4           1               0            1                1  
5           1               1            1                1 
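
A substring test and an equality test can disagree, so it may help to see them side by side. The sketch below uses a cut-down copy of the data, with text_language abbreviated to 'un' as in the expected output, and a list comprehension over zip in place of apply. The _contains and _equal suffixes are made up for this comparison.

```python
import pandas as pd

# Cut-down sample: text_language holds 'un' where spacy holds 'unknown'
df = pd.DataFrame(
    {
        "pycld3": ["lt", "bg", "no"],
        "spacy": ["unknown", "unknown", "id"],
        "text_language": ["un", "un", "no"],
    },
    index=[0, 1, 3],
)

for c in df.columns.drop("text_language"):
    # Substring test: is text_language contained in the column's value?
    df[f"{c}_contains"] = [
        int(t in v) for v, t in zip(df[c], df["text_language"])
    ]
    # Strict equality, for comparison
    df[f"{c}_equal"] = (df[c] == df["text_language"]).astype(int)

print(df)
```

The substring test is what makes 'un' match 'unknown', but it can also produce false positives: "no" in "unknown" is True as well, so use it only if the abbreviated values are what you really have.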

Alternatively, if you only need the two columns shown in the sample, you can build them one at a time:

# Compare row by row; Series.isin would instead test membership in the whole text_language column
df['pycld3_true']=(df['pycld3']==df['text_language']).astype(int)
df['spacy_true']=df.apply(lambda x: x.text_language in x.spacy, axis=1).astype(int)
print(df)


  pycld3    spacy seqtolang langid langdetect text_language  pycld3_true  \
0     lt  unknown        ro     en         pl            un            0   
1     bg  unknown        fi     en         tl            un            0   
3     no       id        in     de         no            no            1   
4     en       en        zh     en         en            en            1   
5     en       en        en     en         en            en            1   

   spacy_true  
0           1  
1           1  
3           0  
4           1  
5           1  
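
One caveat worth checking: Series.isin asks whether a value appears anywhere in the other column, which is not the same as a row-by-row comparison. On the question's sample the two happen to agree, but the hypothetical two-row frame below shows where they differ. The column names isin_flag and rowwise_flag are illustrative.

```python
import pandas as pd

# Hypothetical data where the values are swapped between rows
df = pd.DataFrame({
    "pycld3": ["en", "no"],
    "text_language": ["no", "en"],
})

# Membership in the whole column: both values occur somewhere in text_language
df["isin_flag"] = df["pycld3"].isin(df["text_language"]).astype(int)

# Row-wise equality: neither row actually matches
df["rowwise_flag"] = (df["pycld3"] == df["text_language"]).astype(int)

print(df)
```

For the task in the question, the row-wise comparison is the one that matches the expected output in general.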
