How to get unique values in all columns in a pandas data frame
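I want to list all the unique values in every column of a pandas DataFrame and store them in another DataFrame. I have tried this, but it appends row-wise and I want it column-wise. How do I do that?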
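import pandas as pd

raw_data = {'student_name': ['Miller', 'Miller', 'Ali', 'Miller'],
            'test_score': [76, 75, 74, 76]}
df2 = pd.DataFrame(raw_data, columns=['student_name', 'test_score'])
newDF = pd.DataFrame()
for column in df2.columns:
    # unique values of one column, kept as a one-column frame
    dat = df2[column].drop_duplicates()
    df3 = pd.DataFrame(dat)
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the same way (row-wise)
    newDF = pd.concat([newDF, df3])
print(newDF)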
Expected Output:
student_name  test_score
Ali           74
Miller        75
              76
I think you can use drop_duplicates.
If you want to check only some columns and keep the first row of each set of duplicates:
newDF = df2.drop_duplicates('student_name')
print(newDF)
student_name test_score
0 Miller 76.0
1 Jacobson 88.0
2 Ali 84.0
3 Milner 67.0
4 Cooze 53.0
5 Jacon 96.0
6 Ryaner 64.0
7 Sone 91.0
8 Sloan 77.0
9 Piger 73.0
10 Riani 52.0
Thank you, @cᴏʟᴅsᴘᴇᴇᴅ, for another solution:
df2[~df2.student_name.duplicated()]
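As a minimal sketch, the two approaches above can be compared on the four-row sample from the question. The names `is_repeat`, `by_mask` and `by_method` are illustrative only and do not come from the original answer.

```python
import pandas as pd

df2 = pd.DataFrame({'student_name': ['Miller', 'Miller', 'Ali', 'Miller'],
                    'test_score': [76, 75, 74, 76]})

# duplicated() flags every repeat of a name after its first occurrence
is_repeat = df2['student_name'].duplicated()

# Inverting the flags keeps only the first row seen for each name
by_mask = df2[~is_repeat]

# drop_duplicates with a column subset should select the same rows
by_method = df2.drop_duplicates('student_name')

print(by_mask)
print(by_mask.equals(by_method))
```

On this sample both expressions keep the rows with index 0 and 2, so which one to use is mostly a matter of taste.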
But if you want to check all columns together for duplicates and keep the first row of each:
newDF = df2.drop_duplicates()
print(newDF)
student_name test_score
0 Miller 76.0
1 Jacobson 88.0
2 Ali 84.0
3 Milner 67.0
4 Cooze 53.0
5 Jacon 96.0
6 Ryaner 64.0
7 Sone 91.0
8 Sloan 77.0
9 Piger 73.0
10 Riani 52.0
11 Ali NaN
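"Keep the first row" is the default behaviour of drop_duplicates; its `keep` parameter also accepts `'last'` and `False`. A small sketch on the four-row sample from the question (the variable names are illustrative, not part of the original answer):

```python
import pandas as pd

df2 = pd.DataFrame({'student_name': ['Miller', 'Miller', 'Ali', 'Miller'],
                    'test_score': [76, 75, 74, 76]})

# Rows 0 and 3 are identical across all columns: ('Miller', 76)
first_kept = df2.drop_duplicates()             # default keep='first' drops row 3
last_kept = df2.drop_duplicates(keep='last')   # drops row 0 instead
none_kept = df2.drop_duplicates(keep=False)    # drops every row that has a twin

print(first_kept.index.tolist())
print(last_kept.index.tolist())
print(none_kept.index.tolist())
```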
EDIT for the new sample - remove duplicates and sort by both columns:
newDF = df2.drop_duplicates().sort_values(['student_name', 'test_score'])
print(newDF)
student_name test_score
2 Ali 74
1 Miller 75
0 Miller 76
EDIT1: If you want to replace duplicates in the first column with NaNs:
newDF = df2.drop_duplicates().sort_values(['student_name', 'test_score'])
newDF['student_name'] = newDF['student_name'].mask(newDF['student_name'].duplicated())
print(newDF)
student_name test_score
2 Ali 74
1 Miller 75
0 NaN 76
EDIT2: A more general solution:
# Parentheses let the method chain span several lines
newDF = (df2.sort_values(df2.columns.tolist())
            .reset_index(drop=True)
            .apply(lambda x: x.drop_duplicates()))
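For reference, here is a self-contained sketch of the general solution run against the question's sample data. It assumes that `apply` aligns the per-column results on their index, so a column with fewer unique values is padded with NaN at the bottom.

```python
import pandas as pd

df2 = pd.DataFrame({'student_name': ['Miller', 'Miller', 'Ali', 'Miller'],
                    'test_score': [76, 75, 74, 76]})

newDF = (df2.sort_values(df2.columns.tolist())   # sort by every column, left to right
            .reset_index(drop=True)              # renumber rows so unique values sit at the top
            .apply(lambda x: x.drop_duplicates()))  # deduplicate each column independently

print(newDF)
```

With this sample, student_name should hold Ali, Miller and one NaN, while test_score holds 74, 75 and 76, which is the column-wise layout the question asks for.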