
Using the sklearn tutorial and unable to understand the output of vectorizer.get_feature_names_out()
Is the output part of my 20newsgroups_train data, or does it come from a default library? Words like "zz_g9q3" don't make sense to me. I'm currently using the 20newsgroups_train and 20newsgroups_test datasets.
Input:
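from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(newsgroups_train.data)  # the vocabulary is learned here, from the training posts
vectors_test = vectorizer.transform(newsgroups_test.data)  # reuses the training vocabulary
print(vectorizer.get_feature_names_out()[-50:])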
Output:
['zyra' 'zysec' 'zysgm3r' 'zysv' 'zyt' 'zyu' 'zyv' 'zyxel' 'zyxel1496b'
'zz' 'zz20d' 'zz93sigmc120' 'zz_g9q3' 'zzcrm' 'zzd' 'zzg6c' 'zzi776'
'zzneu' 'zznki' 'zznkj' 'zznkjz' 'zznkzz' 'zznp' 'zzo' 'zzr11' 'zzr1100'
'zzrk' 'zzt' 'zztop' 'zzy_3w' 'zzz' 'zzzoh' 'zzzz' 'zzzzzz' 'zzzzzzt'
'ªl' '³ation'
'º_________________________________________________º_____________________º'
'ºnd' 'çait' 'çon' 'ère' 'ée' 'égligent' 'élangea' 'érale' 'ête'
'íålittin' 'ñaustin' 'ýé']