How do I create a for loop that checks if a column contains duplicates in a Pandas DataFrame
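I am trying to create a for loop that first checks whether a column ('col1') contains duplicates and, if it does, appends the value from another column ('col2') to the value in 'col1'.
However, the statement below behaves as if every value in 'col1' were a duplicate. I am sure the column actually contains very few duplicates, yet the condition keeps returning True. I think the problem is in the second line of the loop, the one containing .duplicated():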
import pandas as pd
tuple = [['Jake','NY'],['Tom','Montana'],['Hannah','Cali'],['Jason','Boston'],['Tom','Washington'],['Hannah','Florida']]
df = pd.DataFrame(tuple, columns=('col1', 'col2'))
for i in df['col1']:
    if df['col1'].duplicated().any():
        df['col1'] = df['col1'] + ' - ' + df['col2']
You can find the duplicated values with duplicated, as you showed in your example, and use the result as a filter with loc:
df.loc[df['col1'].duplicated(keep=False), 'col1']
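To see why the loop in the question touches every row, it may help to compare what the different calls return. The sketch below reuses the question's data (the variable names `rows`, `first_kept`, `all_marked` and `any_dup` are only illustrative). `.duplicated()` gives one boolean per row, while `.any()` collapses that into a single boolean for the whole column, so it cannot pick out individual rows.

```python
import pandas as pd

rows = [['Jake', 'NY'], ['Tom', 'Montana'], ['Hannah', 'Cali'],
        ['Jason', 'Boston'], ['Tom', 'Washington'], ['Hannah', 'Florida']]
df = pd.DataFrame(rows, columns=['col1', 'col2'])

# Default keep='first': only the later occurrences are flagged.
first_kept = df['col1'].duplicated()

# keep=False: every member of a duplicated group is flagged.
all_marked = df['col1'].duplicated(keep=False)

# .any() reduces the per-row mask to one value for the entire column.
any_dup = df['col1'].duplicated().any()

print(first_kept.tolist())  # [False, False, False, False, True, True]
print(all_marked.tolist())  # [False, True, True, False, True, True]
print(any_dup)              # True
```

Because `any_dup` is True as soon as a single duplicate exists, an `if` on it followed by a whole-column assignment rewrites all rows, not just the duplicated ones.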
You can use that filter to replace those values with new ones:
temp = df['col1'] + ' - ' + df['col2']
df.loc[df['col1'].duplicated(keep=False), 'col1'] = temp
With the OP's example:
import pandas as pd
tuple = [['Jake','NY'],['Tom','Montana'],['Hannah','Cali'], ['Jason','Boston'],['Tom','Washington'],['Hannah','Florida']]
df = pd.DataFrame(tuple, columns=('col1', 'col2'))
temp = df['col1'] + ' - ' + df['col2']
df.loc[df['col1'].duplicated(keep=False), 'col1'] = temp
df
               col1        col2
0              Jake          NY
1     Tom - Montana     Montana
2     Hannah - Cali        Cali
3             Jason      Boston
4  Tom - Washington  Washington
5  Hannah - Florida     Florida
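If you would rather not modify the original frame in place, the same idea can be wrapped in a small helper. This is only a sketch: `label_duplicates` and its parameters are hypothetical names, not part of pandas, and it uses `Series.mask` in place of the `loc` assignment above.

```python
import pandas as pd

def label_duplicates(frame, key='col1', extra='col2', sep=' - '):
    """Return a copy in which duplicated `key` values get `extra` appended."""
    # Flag every row whose key value occurs more than once.
    dup = frame[key].duplicated(keep=False)
    out = frame.copy()
    # mask() replaces values where the condition is True and keeps the rest.
    out[key] = frame[key].mask(dup, frame[key] + sep + frame[extra])
    return out

rows = [['Jake', 'NY'], ['Tom', 'Montana'], ['Hannah', 'Cali'],
        ['Jason', 'Boston'], ['Tom', 'Washington'], ['Hannah', 'Florida']]
df = pd.DataFrame(rows, columns=['col1', 'col2'])

result = label_duplicates(df)
print(result)
```

The input `df` is left unchanged, which makes it easier to rerun the step without appending 'col2' a second time.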