
How to check if a value from one column is equal to a value in another data frame's column

I have two separate data frames, df and xls. xls contains unique IDs, and I would like to see how many times each of them occurs in my df data frame (~650,000 rows). I then want to create an occurrence column that keeps track of how many times each unique ID from xls appears in df.

xls = {'Unique ID': ['a', 'b', 'c', 'd', 'e']}
df = {'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']}
result_df = {'Contingency': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'd', 'b'], 'Occurrences': [4, 5, 0, 1, 0]}

Ultimately, I just want to keep track of which unique ID from xls appears most often in df.

df.groupby('Contingency').count() should produce the counts you are looking for, without needing the xls data frame that contains the unique IDs.

Edit:

If your df data frame only has the 'Contingency' column, you'll need a second column to apply count() to, like this:

import pandas as pd

df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})
df['Occurrences'] = 1  # helper column so count() has something to count
result = df.groupby('Contingency').count()

Otherwise you can just do:

result = pd.DataFrame(df.Contingency.value_counts())

This gives the same counts.
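As a rough check of the claim above, the sketch below computes the counts three ways on the sample data from the question: with the helper column and count(), with groupby().size() (a variant not mentioned in the answer, which needs no helper column), and with value_counts(). The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})

# Helper column of ones, then count the rows in each group.
by_count = df.assign(Occurrences=1).groupby('Contingency')['Occurrences'].count()

# size() counts rows per group directly, so no helper column is needed.
by_size = df.groupby('Contingency').size()

# value_counts() counts each distinct value, most frequent first.
by_value_counts = df['Contingency'].value_counts()

# Compare as dicts so that ordering and Series names do not matter.
print(by_count.to_dict() == by_size.to_dict() == by_value_counts.to_dict())
```

The three results differ only in ordering and labelling: the groupby variants are sorted by ID, while value_counts() is sorted by frequency.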

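Then you can sort the values: result.sort_values(by='Contingency', ascending=False). Note that in pandas 2.0 and later the column produced by value_counts() is named 'count' and 'Contingency' becomes the index name, so pass by='count' there to sort by frequency.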
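If the occurrence column should live on xls itself, including IDs that never appear in df, one possible approach is to map the counts back onto the ID column. This is a minimal sketch, not the answerer's method: the extra ID 'f' is hypothetical and added only to show a zero count, and the column name 'Occurrences' is an assumption.

```python
import pandas as pd

# 'f' is a made-up ID that does not appear in df, to illustrate a zero count.
xls = pd.DataFrame({'Unique ID': ['a', 'b', 'c', 'd', 'e', 'f']})
df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})

# Count every value in df once, then look each xls ID up in those counts.
counts = df['Contingency'].value_counts()
xls['Occurrences'] = xls['Unique ID'].map(counts).fillna(0).astype(int)

# Most frequent IDs first.
ranked = xls.sort_values('Occurrences', ascending=False)
print(ranked)
```

Because the counting is done once over df and the lookup is a single map(), this should stay cheap for a df of several hundred thousand rows. Values in df that are not listed in xls are simply ignored.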

If you want to sort by unique ID:

results_df = df['Contingency'].value_counts().sort_index()

If you want to sort by frequency of occurrence:

results_df = df['Contingency'].value_counts()
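To answer the original goal of finding the single most frequent ID, the counts can be queried directly. The following is a small sketch on the question's sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})

counts = df['Contingency'].value_counts()

# idxmax() returns the index label (the ID) with the largest count.
most_common_id = counts.idxmax()
most_common_count = counts.max()

# The same counts ordered by ID instead of by frequency.
by_id = counts.sort_index()

print(most_common_id, most_common_count)
```

If several IDs tie for the maximum, idxmax() returns only the first of them, so compare against counts.max() when ties matter.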

