Make new dataframe from existing dataframe with unique values from one column and corresponding values from other columns
I have a dataframe 'raw' that looks like this:
It has many rows with duplicate values in each column. I want to make a new dataframe 'new_df' that has the unique customer_code values and their corresponding market_code. The new_df should look like this:
It sounds like you simply want to create a DataFrame with unique customer_code values that also shows the corresponding market_code. Here's a way to do it:
df = df[['customer_code','market_code']].drop_duplicates('customer_code')
Output:
customer_code market_code
0 Cus001 Mark001
1 Cus003 Mark003
3 Cus004 Mark003
4 Cus005 Mark004
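To try the one-liner end to end, here is a minimal self-contained sketch. The question's actual data was not included, so the raw frame below is hypothetical (including the invented order_id column), built only so that the result matches the output shown above. It assigns to new_df instead of overwriting df, which leaves the original frame intact.

```python
import pandas as pd

# Hypothetical stand-in for the question's 'raw' DataFrame (the original
# data was not included); 'order_id' is an invented extra column.
raw = pd.DataFrame({
    "customer_code": ["Cus001", "Cus003", "Cus003", "Cus004", "Cus005", "Cus001"],
    "market_code":   ["Mark001", "Mark003", "Mark003", "Mark003", "Mark004", "Mark001"],
    "order_id":      [1, 2, 3, 4, 5, 6],
})

# Keep only the two columns of interest, then drop repeated customer codes,
# keeping the first row seen for each one (the default keep='first').
new_df = raw[["customer_code", "market_code"]].drop_duplicates("customer_code")
print(new_df)
```

The surviving rows keep their original index labels (0, 1, 3, 4), which is why the printed index has gaps.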
The part reading df[['customer_code','market_code']] gives us a DataFrame containing only the two columns of interest, and the drop_duplicates('customer_code') part eliminates all but the first occurrence of each duplicate value in the customer_code column (though you could instead keep the last occurrence of each duplicate by passing the keep='last' argument).
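A small sketch comparing the default keep='first' with keep='last', again on hypothetical data in which one customer (Cus002) appears under two different market codes. It also illustrates a caveat: if a customer_code maps to more than one market_code, drop_duplicates keeps only one of them. The reset_index(drop=True) call is an optional extra, not part of the answer above, for renumbering the rows.

```python
import pandas as pd

# Hypothetical data: Cus002 appears under two different market codes,
# so the choice of 'keep' changes which market_code survives.
raw = pd.DataFrame({
    "customer_code": ["Cus001", "Cus002", "Cus002", "Cus003"],
    "market_code":   ["Mark001", "Mark001", "Mark002", "Mark003"],
})

cols = ["customer_code", "market_code"]

# Default keep='first': the first row for each customer_code is kept.
first = raw[cols].drop_duplicates("customer_code")

# keep='last': the last row for each customer_code is kept instead.
last = raw[cols].drop_duplicates("customer_code", keep="last")

# Optional: renumber the rows 0..n-1 if the gaps in the index are unwanted.
last_clean = last.reset_index(drop=True)

print(first)
print(last_clean)
```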