df
pycld3 spacy seqtolang langid langdetect text_language
0 lt unknown ro en pl unknown
1 bg unknown fi en tl unknown
3 no id in de no no
4 en en zh en en en
5 en en en en en en
I'd like to create a new column that checks the value of a column compared to the base column: text_language. If the value in df['pycld3'] == df['text_language'], a new column df['pycld3_true'] = 1. If not, the value is 0. I want to do the same for other columns.
Expected Output
df
pycld3 spacy seqtolang langid langdetect text_language pycld3_true spacy_true ....
0 lt unknown ro en pl un 0 1
1 bg unknown fi en tl un 0 1
3 no id in de no no 1 0
4 en en zh en en en 1 1
5 en en en en en en 1 1
The code that I can think of right now is:
for row in df['pycld3']:
    if df['pycld3'][i] == df['text_language'][i]:
        df['pycld3_true'] == 1
    elif:
        df['pycld3'][i] != df['text_language'][i]:
        df['pycld3_true'] == 0
    else:
        df['pycld3_true']== 'nan'
The code above is incorrect and inefficient.
df
Out[6]:
one two three
0 10 1.2 4.2
1 15 70 0.03
2 8 5 0
df['new'] = df['one']==df['two']
df
Out[8]:
one two three new
0 10 1.2 4.2 False
1 15 70 0.03 False
2 8 5 0 False
df['new'] = df['new'].astype(int)
df
Out[10]:
one two three new
0 10 1.2 4.2 0
1 15 70 0.03 0
2 8 5 0 0
Try this
for col in df.columns[:5]:
    df[f'{col}_true'] = (df[col] == df['text_language']).astype(int)
print(df)
Output:
pycld3 spacy seqtolang langid langdetect text_language pycld3_true spacy_true seqtolang_true langid_true langdetect_true
0 lt unknown ro en pl unknown 0 1 0 0 0
1 bg unknown fi en tl unknown 0 1 0 0 0
3 no id in de no no 1 0 0 0 1
4 en en zh en en en 1 1 0 1 1
5 en en en en en en 1 1 1 1 1
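The same flags can also be computed without an explicit loop. The following is a minimal sketch, assuming the sample frame from the question (rebuilt by hand here so the snippet runs on its own); the names base, detectors, flags and result are illustrative and not from the original post. It compares every detector column against text_language row by row with DataFrame.eq and names the new columns with add_suffix.

```python
import pandas as pd

# Sample data copied from the question (index 2 is missing there as well).
df = pd.DataFrame(
    {
        "pycld3": ["lt", "bg", "no", "en", "en"],
        "spacy": ["unknown", "unknown", "id", "en", "en"],
        "seqtolang": ["ro", "fi", "in", "zh", "en"],
        "langid": ["en", "en", "de", "en", "en"],
        "langdetect": ["pl", "tl", "no", "en", "en"],
        "text_language": ["unknown", "unknown", "no", "en", "en"],
    },
    index=[0, 1, 3, 4, 5],
)

base = "text_language"                # reference column
detectors = df.columns.drop(base)     # every column except the reference

# axis=0 aligns the Series on the row index, so each row is compared
# with its own text_language value.
flags = df[detectors].eq(df[base], axis=0).astype(int).add_suffix("_true")

result = df.join(flags)
print(result)
```

Selecting the columns by name rather than by position (df.columns[:5]) keeps the snippet working if the column order changes.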
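Use an f-string to name each new column and apply to check every row. Note that `in` is a substring test here, which is why a text_language of un matches a spacy value of unknown.
for c in df.iloc[:,:-1]:
    df[f'{c}_true'] = df.apply(lambda x: x.text_language in x[c], axis=1).astype(int)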
print(df)
pycld3 spacy seqtolang langid langdetect text_language pycld3_true \
0 lt unknown ro en pl un 0
1 bg unknown fi en tl un 0
3 no id in de no no 1
4 en en zh en en en 1
5 en en en en en en 1
spacy_true seqtolang_true langid_true langdetect_true
0 1 0 0 0
1 1 0 0 0
3 0 0 0 1
4 1 0 1 1
5 1 1 1 1
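Because `in` on strings checks for a substring, it can also flag rows that should not match. The sketch below uses a small hypothetical frame (the name sample and its three rows are made up for illustration) to contrast the substring test with an exact comparison.

```python
import pandas as pd

# Hypothetical rows: "un" and "no" are both substrings of "unknown".
sample = pd.DataFrame(
    {
        "spacy": ["unknown", "unknown", "en"],
        "text_language": ["un", "no", "en"],
    }
)

# Substring test, as in the apply/lambda approach.
substring = sample.apply(
    lambda row: row["text_language"] in row["spacy"], axis=1
).astype(int)

# Exact, row-wise equality.
exact = sample["spacy"].eq(sample["text_language"]).astype(int)

print(substring.tolist())  # [1, 1, 1]
print(exact.tolist())      # [0, 0, 1]
```

If un is only a truncated form of unknown in your data, the substring test gives the intended match for that row, but the second row shows that it would also treat the language code no as a match for unknown.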
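Alternatively, you can go column by column if you only need the two columns shown in the sample. The first line uses a row-wise comparison instead of isin, because isin tests membership in the whole text_language column rather than in the same row.
df['pycld3_true']=(df['pycld3'] == df['text_language']).astype(int)
df['spacy_true']=df.apply(lambda x: x.text_language in x.spacy, axis=1).astype(int)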
print(df)
pycld3 spacy seqtolang langid langdetect text_language pycld3_true \
0 lt unknown ro en pl un 0
1 bg unknown fi en tl un 0
3 no id in de no no 1
4 en en zh en en en 1
5 en en en en en en 1
spacy_true
0 1
1 1
3 0
4 1
5 1