
Pandas: How to get count of occurrence from another data frame?

I am using Python Pandas. I have two DataFrames, df1 and df2. df1 contains header-level data such as the card ID and issue date; df2 contains granular data, such as each transaction performed by a given card ID. The card_id column is common to both DataFrames.

df1:
 first_active_month          card_id  feature_1  feature_2  feature_3 
            2017-06  C_ID_92a2005557          5          2          1   
            2017-01  C_ID_3d0044924f          4          1          0   
            2016-08  C_ID_d639edf6cd          2          2          0   
            2017-09  C_ID_186d6a6901          4          3          0   
            2017-11  C_ID_cdbd2c0db2          1          3          0

df2:
   junk_id   authorized_flag          card_id  city_id Authorized 
    13292136               Y  C_ID_92a2005557      101          N   
    20069042               Y  C_ID_7a238b3713       69          N   
     5029656               Y  C_ID_92a2005557       17          N   
    16356907               N  C_ID_3d0044924f       -1          Y   
     8203441               Y  C_ID_fcf33361c2       17          N
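To experiment with the answers, the sample rows above can be rebuilt as DataFrames. This is a minimal sketch that assumes the columns shown are the complete schema and keeps first_active_month as plain strings. With only these five df2 rows, the counts it produces will differ from the illustrative frequency values in the expected output below.

```python
import pandas as pd

# Header-level data: one row per card (sample rows from the question)
df1 = pd.DataFrame({
    "first_active_month": ["2017-06", "2017-01", "2016-08", "2017-09", "2017-11"],
    "card_id": ["C_ID_92a2005557", "C_ID_3d0044924f", "C_ID_d639edf6cd",
                "C_ID_186d6a6901", "C_ID_cdbd2c0db2"],
    "feature_1": [5, 4, 2, 4, 1],
    "feature_2": [2, 1, 2, 3, 3],
    "feature_3": [1, 0, 0, 0, 0],
})

# Transaction-level data: one row per transaction, so card_id may repeat
df2 = pd.DataFrame({
    "junk_id": [13292136, 20069042, 5029656, 16356907, 8203441],
    "authorized_flag": ["Y", "Y", "Y", "N", "Y"],
    "card_id": ["C_ID_92a2005557", "C_ID_7a238b3713", "C_ID_92a2005557",
                "C_ID_3d0044924f", "C_ID_fcf33361c2"],
    "city_id": [101, 69, 17, -1, 17],
    "Authorized": ["N", "N", "N", "Y", "N"],
})
```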

I want to add a column "frequency" to df1 that shows how many times each card_id of df1 occurs in df2. So df1 should look like this:

df1 (after executing the command):
 first_active_month          card_id  feature_1  feature_2  feature_3    frequency
            2017-06  C_ID_92a2005557          5          2          1      2
            2017-01  C_ID_3d0044924f          4          1          0      5
            2016-08  C_ID_d639edf6cd          2          2          0      3
            2017-09  C_ID_186d6a6901          4          3          0      1
            2017-11  C_ID_cdbd2c0db2          1          3          0      7

Please note: I am new to Python/Pandas. I have already gone through multiple threads on this site, but all of them were about counting within the same DataFrame. I am looking for counting that uses join/merge functionality. Threads I have already browsed: this, this, this, this, this, this, this.

I think you need Series.map with Series.value_counts, plus Series.fillna to replace the missing values (card IDs that never appear in df2):

df1['frequency'] = df1['card_id'].map(df2['card_id'].value_counts()).fillna(0).astype(int)
print(df1)
  first_active_month          card_id  feature_1  feature_2  feature_3  \
0            2017-06  C_ID_92a2005557          5          2          1   
1            2017-01  C_ID_3d0044924f          4          1          0   
2            2016-08  C_ID_d639edf6cd          2          2          0   
3            2017-09  C_ID_186d6a6901          4          3          0   
4            2017-11  C_ID_cdbd2c0db2          1          3          0   

   frequency  
0          2  
1          1  
2          0  
3          0  
4          0  
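The one-liner chains three operations. Splitting them into named steps (the intermediate names counts and mapped are only for illustration) shows why fillna is needed: map returns NaN for any card_id that never appears in df2. This sketch uses trimmed copies of the sample frames that keep only the card_id column.

```python
import pandas as pd

# Trimmed sample frames: only the join key is needed for counting
df1 = pd.DataFrame({"card_id": ["C_ID_92a2005557", "C_ID_3d0044924f", "C_ID_d639edf6cd"]})
df2 = pd.DataFrame({"card_id": ["C_ID_92a2005557", "C_ID_7a238b3713", "C_ID_92a2005557",
                                "C_ID_3d0044924f", "C_ID_fcf33361c2"]})

# Step 1: a Series indexed by card_id, holding how often each id occurs in df2
counts = df2["card_id"].value_counts()

# Step 2: look up each df1 card_id in that index; ids absent from df2 become NaN
mapped = df1["card_id"].map(counts)

# Step 3: absent ids really occur 0 times; fillna turns NaN into 0,
# and astype(int) undoes the float dtype that the NaN forced on the column
df1["frequency"] = mapped.fillna(0).astype(int)
print(df1)
```

Because map returns a Series aligned on df1's own index, no merge is needed and df1 keeps its original row order.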

Actually, part of the answer is already in your question. First, count the frequency:

df3 = df2.groupby(["card_id"], as_index=False)[["junk_id"]].count().rename(columns={"junk_id":"frequency"})

The rename is needed because pandas keeps the original column name (junk_id) after the groupby operation. Next, you can merge the DataFrames:

df1 = df1.merge(df3, how='left', on='card_id')

And you can certainly do this in one line by substituting the df3 expression into the merge statement.
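Substituting df3 into the merge gives a single statement. A left merge leaves NaN in frequency for cards with no rows in df2 (which also turns the column into floats), so this sketch adds a fillna(0).astype(int) step borrowed from the first answer. The trimmed frames below keep only a few of the question's columns.

```python
import pandas as pd

# Trimmed sample frames with just enough columns for the merge
df1 = pd.DataFrame({"card_id": ["C_ID_92a2005557", "C_ID_3d0044924f", "C_ID_d639edf6cd"],
                    "feature_1": [5, 4, 2]})
df2 = pd.DataFrame({"junk_id": [13292136, 20069042, 5029656, 16356907, 8203441],
                    "card_id": ["C_ID_92a2005557", "C_ID_7a238b3713", "C_ID_92a2005557",
                                "C_ID_3d0044924f", "C_ID_fcf33361c2"]})

# groupby + count and the left merge in one statement (df3 inlined)
df1 = df1.merge(
    df2.groupby(["card_id"], as_index=False)[["junk_id"]].count()
       .rename(columns={"junk_id": "frequency"}),
    how="left",
    on="card_id",
)

# A left merge leaves NaN for cards with no transactions; treat those as 0
df1["frequency"] = df1["frequency"].fillna(0).astype(int)
print(df1)
```

Note that count() counts non-missing junk_id values per card; if junk_id could contain missing values, df2.groupby("card_id").size() counts rows instead.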


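Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.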

Related questions:
- Pandas dataframe how to get data from one time frame to another 1 min time frame in Time series data
- How to filter out values from a pandas data frame for which only one occurrence exists
- get description from another data frame with code column in pandas
- How to read Pandas data frame from one file to another file
- pandas: Add row from one data frame to another data frame?
- Pandas: for each row count occurrence in another df within specific dates
- How to subtract value in one row of pandas Data Frame from value in another row without for loop?
- pandas: error when copy a column from another data frame
- Pandas Copy columns from one data frame to another with different name
- Pandas_data frame/Python : How to sort a data frame column based on its highest repeated value count?
 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM