How to check if a value in one column equals a value in a column of another data frame
I have two separate data frames, df and xls. xls contains unique IDs, and I would like to count how many times each of them occurs in df (~650,000 rows), then create an occurrence column that tracks how many times each unique ID from xls appears in df.
xls = {'Unique ID': ['a', 'b', 'c', 'd', 'e']}
df = {'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']}
result_df = {'Contingency': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'd', 'b'],'Occurences': [4, 5, 0, 1, 0]}
Ultimately, I just want to keep track of which unique ID appears most often in df.
df.groupby('Contingency').count()
should produce the counts you are looking for, without needing the xls dataframe that contains the unique IDs.
Edit:
If your df dataframe only has the 'Contingency' column, you'll need a second column to apply count() to, like this:
import pandas as pd
df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})
# helper column so that count() has something to count in each group
df['Occurances'] = 1
result = df.groupby('Contingency').count()
Otherwise you can just do:
result = pd.DataFrame(df.Contingency.value_counts())
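for the same result.
Then you can sort the values: result.sort_values(by='Contingency', ascending=False)
Note that the name of the count column depends on how result was built and on the pandas version (for example 'Occurances' after the groupby above, or 'count' from value_counts() in pandas 2.x); pass that column name to by if you want to sort by the counts.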
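To tie the counts back to the xls frame from the question, one option is to look up each unique ID in the counts and fill in 0 for IDs that never occur in df. This is only a minimal sketch, not part of the original answer: the column name 'Occurrences' and the extra ID 'f' are assumptions added for illustration.

```python
import pandas as pd

# Sample data from the question; the ID 'f' is a hypothetical extra
# that never appears in df, added to show the zero-count case.
xls = pd.DataFrame({'Unique ID': ['a', 'b', 'c', 'd', 'e', 'f']})
df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})

# Count how often each value occurs in df.
counts = df['Contingency'].value_counts()

# Look up each unique ID in the counts; IDs absent from df give NaN,
# which is replaced by 0 before converting back to integers.
xls['Occurrences'] = xls['Unique ID'].map(counts).fillna(0).astype(int)

# Most frequent ID first.
xls_sorted = xls.sort_values(by='Occurrences', ascending=False)
print(xls_sorted)
```

Sorting by the explicitly named 'Occurrences' column avoids depending on whatever name pandas gives the count column by default.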
results_df = df['Contingency'].value_counts().sort_index()
results_df = df['Contingency'].value_counts()
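The two one-liners above differ only in ordering: value_counts() on its own returns the counts from most to least frequent, while adding sort_index() orders them by ID label instead. A small sketch using the sample data from the question; the names by_count, by_label and most_common_id are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})

# Counts ordered from most to least frequent (the default).
by_count = df['Contingency'].value_counts()

# Same counts, ordered alphabetically by ID label.
by_label = df['Contingency'].value_counts().sort_index()

# Label of the ID that occurs most often, and how often it occurs.
most_common_id = by_count.idxmax()
print(most_common_id, by_count.max())
```

If only the single most frequent ID is needed, idxmax() answers that directly without any sorting.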