How many times is a cell's string value repeated in another column of a pandas data frame?

I am trying to use pandas to find out how many times each cell value of column A appears in the cells of another column, B. For example, for the value in cell A1, I need to look it up (as with VLOOKUP) across all cells of column B, find out in how many cells of column B it is repeated, and put that count next to it in column C. I tried every solution I could find, such as contains, extract, groupby, etc., but got no result. Also, the values in column B have no particular text pattern that could be described in the code.

This is what I have as a data frame:

      A                            B                                C
 ============  ===============================================  ========
   T4561                                      T4561 (KHO ZAD)
   E2962                     E2962 (Bat - Rouchan),T5362(asw)
  DT2172                 T2172 (Masd),T2117 (Masd),T4561(fsd)
   T6096              T6096 (Mara),H1005 (BAHH), H1049 (QIEH)
   T5362                 T5362 (SYMI (ABAI)),E0993,E7523(pwd)
   E0993                       E0993 (Tean),T4561,E0993(ssdc)
   E1834                        E1834 (Ahaz),T5362,E0993(sdw)
   T2844  T2844 (Varmn),T3798 (QASIN), T3596 (Vara),T4561(qw)
   E7523                              E7523 (Sabk),E0993(bbz)
   T9062                        T9062 (Shrz),T5362,E7523(fgf)
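
A minimal sketch that rebuilds the frame above so the answers can be tried directly. The values are transcribed by hand from the table, so treat them as an assumption; column C is omitted because it is the column to be computed.

```python
import pandas as pd

# Values transcribed from the question's table; C is the column to be computed
df = pd.DataFrame(
    {
        "A": ["T4561", "E2962", "DT2172", "T6096", "T5362",
              "E0993", "E1834", "T2844", "E7523", "T9062"],
        "B": [
            "T4561 (KHO ZAD)",
            "E2962 (Bat - Rouchan),T5362(asw)",
            "T2172 (Masd),T2117 (Masd),T4561(fsd)",
            "T6096 (Mara),H1005 (BAHH), H1049 (QIEH)",
            "T5362 (SYMI (ABAI)),E0993,E7523(pwd)",
            "E0993 (Tean),T4561,E0993(ssdc)",
            "E1834 (Ahaz),T5362,E0993(sdw)",
            "T2844 (Varmn),T3798 (QASIN), T3596 (Vara),T4561(qw)",
            "E7523 (Sabk),E0993(bbz)",
            "T9062 (Shrz),T5362,E7523(fgf)",
        ],
    }
)
```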

And this is what I need:

      A                            B                                C
 ============  ===============================================  ========
   T4561                                      T4561 (KHO ZAD)       4
   E2962                     E2962 (Bat - Rouchan),T5362(asw)       1
  DT2172                 T2172 (Masd),T2117 (Masd),T4561(fsd)       0
   T6096              T6096 (Mara),H1005 (BAHH), H1049 (QIEH)       1
   T5362                 T5362 (SYMI (ABAI)),E0993,E7523(pwd)       4
   E0993                       E0993 (Tean),T4561,E0993(ssdc)       5
   E1834                        E1834 (Ahaz),T5362,E0993(sdw)       1
   T2844  T2844 (Varmn),T3798 (QASIN), T3596 (Vara),T4561(qw)       1
   E7523                              E7523 (Sabk),E0993(bbz)       3
   T9062                        T9062 (Shrz),T5362,E7523(fgf)       1

Use Series.str.extractall with a regex pattern that is an alternation of all the values in column A, then Series.value_counts to compute how often each value was matched, and finally Series.map to map the values in column A to their corresponding frequencies:

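import re
# Alternation of all A values, escaped so any regex metacharacters are matched literally
m = df['B'].str.extractall(f"({'|'.join(map(re.escape, df['A']))})")[0].value_counts()
df['C'] = df['A'].map(m).fillna(0)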
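
The one-liner builds a plain alternation, so a value is matched anywhere inside column B, including inside a longer code: depending on the order of the alternatives, a short value such as T45 can be matched at the start of T4561 or inside an unrelated code like T4599. A hedged sketch of a stricter variant wraps the alternation in word boundaries (\b). It assumes every code starts and ends with a letter or digit, as in the sample; the helper name count_occurrences and its parameters are hypothetical.

```python
import re

import pandas as pd


def count_occurrences(df: pd.DataFrame, key_col: str = "A", text_col: str = "B") -> pd.Series:
    """Hypothetical helper: how often each key_col value occurs in text_col.

    Keys are regex-escaped and wrapped in word boundaries, assuming every key
    starts and ends with a letter or digit (true for the codes in the question).
    """
    alternation = "|".join(map(re.escape, df[key_col].dropna().unique()))
    pattern = rf"\b({alternation})\b"
    # One row per match; column 0 holds the matched key
    freq = df[text_col].str.extractall(pattern)[0].value_counts()
    # Keys that never matched become NaN after map, so fill with 0 and cast to int
    return df[key_col].map(freq).fillna(0).astype(int)


toy = pd.DataFrame({"A": ["T45", "T4561"], "B": ["T4561 (x)", "T45,T4561(y)"]})
print(count_occurrences(toy).tolist())  # [1, 2]
```

With the plain alternation from the answer, the same toy frame gives [3, 0], because T45 is listed first and is matched at the start of every T4561. On the question's own data both versions agree.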

Result:

        A                                                    B    C
0   T4561                                      T4561 (KHO ZAD)  4.0
1   E2962                     E2962 (Bat - Rouchan),T5362(asw)  1.0
2  DT2172                 T2172 (Masd),T2117 (Masd),T4561(fsd)  0.0
3   T6096              T6096 (Mara),H1005 (BAHH), H1049 (QIEH)  1.0
4   T5362                 T5362 (SYMI (ABAI)),E0993,E7523(pwd)  4.0
5   E0993                       E0993 (Tean),T4561,E0993(ssdc)  5.0
6   E1834                        E1834 (Ahaz),T5362,E0993(sdw)  1.0
7   T2844  T2844 (Varmn),T3798 (QASIN), T3596 (Vara),T4561(qw)  1.0
8   E7523                              E7523 (Sabk),E0993(bbz)  3.0
9   T9062                        T9062 (Shrz),T5362,E7523(fgf)  1.0
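
Two details of this output. C comes out as float (4.0, 1.0, …) because map leaves NaN for DT2172, which has no match, and fillna(0) keeps the float dtype. And the counts are total occurrences, not cells: E0993 is counted 5 times although only 4 cells of B contain it, because it appears twice in the row with index 5. If the number of cells is what is wanted, as the question's wording suggests, a sketch could deduplicate matches within each cell before counting. The helper name count_cells is hypothetical, and it assumes every code starts and ends with a letter or digit so that \b marks its edges.

```python
import re

import pandas as pd


def count_cells(df: pd.DataFrame, key_col: str = "A", text_col: str = "B") -> pd.Series:
    """Hypothetical helper: number of text_col cells containing each key_col value.

    Keys are regex-escaped and wrapped in word boundaries, assuming every key
    starts and ends with a letter or digit.
    """
    alternation = "|".join(map(re.escape, df[key_col].dropna().unique()))
    pattern = rf"\b(?:{alternation})\b"
    # List of matched keys per cell (missing cells stay NaN)
    found = df[text_col].str.findall(pattern)
    # Deduplicate inside each cell so a key repeated in one cell counts once
    per_cell = found.map(lambda hits: sorted(set(hits)), na_action="ignore")
    # One row per (cell, key) pair; empty lists become NaN and value_counts drops them
    freq = per_cell.explode().value_counts()
    return df[key_col].map(freq).fillna(0).astype(int)
```

On the question's frame, count_cells(df) returns 4 for E0993 instead of 5, the same values as column C everywhere else, and already as integers.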
