
Pandas count the occurrences of each value in column

I have this dataframe:

[image: my dataframe]

I want to add a new column that counts only the first instance of each matchID in the MatchID column.

Specifically, it checks whether the matchID is unique. If it is unique, it puts a 1 in the new column for that row. If it is not unique but is the FIRST instance of the matchID, it also puts a 1 in the new column. If it is a duplicate and not the first instance, it puts a zero in the new column.

Any help would be amazing. Switching from Excel to pandas is much, much harder than expected :).

Thanks in advance.

How about:

df['Count'] = (~df['MatchID'].duplicated()).astype(int)
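
To see what this one-liner does, here is a minimal sketch on made-up data. The MatchID values below are hypothetical and not taken from the question's dataframe; only the column names come from the thread. `duplicated()` marks every repeat of a value after its first occurrence as True, so negating the mask flags unique values and first instances alike.

```python
import pandas as pd

# Hypothetical sample data; only the column name MatchID comes from the question
df = pd.DataFrame({"MatchID": [101, 102, 101, 103, 102, 101]})

# duplicated() defaults to keep="first": the first occurrence of a value is
# False, and every later repeat of that value is True
is_repeat = df["MatchID"].duplicated()

# Invert the mask and cast to int: 1 for unique or first instances,
# 0 for later duplicates
df["Count"] = (~is_repeat).astype(int)

print(df)
```

Summing the new column then gives the number of distinct MatchID values, which may be what the "count" is ultimately needed for.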

Here's an approach based on a sample DataFrame:

import pandas as pd

# Some dummy data. The "id" field is equivalent to MatchID
df = pd.DataFrame([("A", 12), ("B", 12), ("A", 123)], columns=["id", "val"])
# Create a temporary subset of the DF that matches the "first or unique" rule
first_or_unique = df.drop_duplicates(subset="id", keep="first")
# Populate the new match column with 0 for all rows to begin with
df["match"] = 0
# Finally, use `.loc` along with the temporary DF's index to set the relevant
# rows to 1
df.loc[first_or_unique.index, "match"] = 1
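
As a quick sanity check, the sketch below runs the same steps on the dummy data and compares the result with the `duplicated()` approach from the other answer. It assumes the DataFrame has a unique index (the default integer index here), because `.loc` with repeated index labels would set every row sharing a label. The variable name `one_liner` is only for illustration.

```python
import pandas as pd

# Same dummy data as above; "id" stands in for MatchID
df = pd.DataFrame([("A", 12), ("B", 12), ("A", 123)], columns=["id", "val"])

# drop_duplicates + .loc approach
first_or_unique = df.drop_duplicates(subset="id", keep="first")
df["match"] = 0
df.loc[first_or_unique.index, "match"] = 1

# duplicated() approach, kept in a separate Series for comparison
one_liner = (~df["id"].duplicated()).astype(int)

# Compare plain Python lists so integer width differences do not matter
same = df["match"].tolist() == one_liner.tolist()
print(df)
print("Both approaches agree:", same)
```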
