How many times is a cell's string value repeated in another column of a pandas DataFrame?

Question

I am trying to find out how many times each cell value of column A appears across all the cells of another column B, using pandas. For example, for the value in cell A1, we need to look it up (like a VLOOKUP) in every cell of column B, find out how many times it is repeated there, and put that count next to it in column C. I tried the approaches I could find, such as contains, extract, and groupby, but none of them gave the result I need. Also, the values in column B follow no particular text pattern that I could define in the code.

This is the data frame I have:

      A                            B                                C
 ============  ===============================================  ========
   T4561                                      T4561 (KHO ZAD)
   E2962                     E2962 (Bat - Rouchan),T5362(asw)
  DT2172                 T2172 (Masd),T2117 (Masd),T4561(fsd)
   T6096              T6096 (Mara),H1005 (BAHH), H1049 (QIEH)
   T5362                 T5362 (SYMI (ABAI)),E0993,E7523(pwd)
   E0993                       E0993 (Tean),T4561,E0993(ssdc)
   E1834                        E1834 (Ahaz),T5362,E0993(sdw)
   T2844  T2844 (Varmn),T3798 (QASIN), T3596 (Vara),T4561(qw)
   E7523                              E7523 (Sabk),E0993(bbz)
   T9062                        T9062 (Shrz),T5362,E7523(fgf)

And this is what I need:

      A                            B                                C
 ============  ===============================================  ========
       T4561                                      T4561 (KHO ZAD)  4
       E2962                     E2962 (Bat - Rouchan),T5362(asw)  1
      DT2172                 T2172 (Masd),T2117 (Masd),T4561(fsd)  0
       T6096              T6096 (Mara),H1005 (BAHH), H1049 (QIEH)  1
       T5362                 T5362 (SYMI (ABAI)),E0993,E7523(pwd)  4
       E0993                       E0993 (Tean),T4561,E0993(ssdc)  5
       E1834                        E1834 (Ahaz),T5362,E0993(sdw)  1
       T2844  T2844 (Varmn),T3798 (QASIN), T3596 (Vara),T4561(qw)  1
       E7523                              E7523 (Sabk),E0993(bbz)  3
       T9062                        T9062 (Shrz),T5362,E7523(fgf)  1

Answer 1

Use Series.str.extractall with a regex pattern built from the values in column A to pull every match out of column B, then use Series.value_counts to compute the frequency of each match, and finally use Series.map to map the values in column A to their corresponding frequencies:

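# Join the values of column A into one alternation pattern and count every match in column B
m = df['B'].str.extractall(f"({'|'.join(df['A'])})")[0].value_counts()
# Look up the count for each value in A; values that never match get 0
df['C'] = df['A'].map(m).fillna(0)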

Result:结果：

        A                                                    B    C
0   T4561                                      T4561 (KHO ZAD)  4.0
1   E2962                     E2962 (Bat - Rouchan),T5362(asw)  1.0
2  DT2172                 T2172 (Masd),T2117 (Masd),T4561(fsd)  0.0
3   T6096              T6096 (Mara),H1005 (BAHH), H1049 (QIEH)  1.0
4   T5362                 T5362 (SYMI (ABAI)),E0993,E7523(pwd)  4.0
5   E0993                       E0993 (Tean),T4561,E0993(ssdc)  5.0
6   E1834                        E1834 (Ahaz),T5362,E0993(sdw)  1.0
7   T2844  T2844 (Varmn),T3798 (QASIN), T3596 (Vara),T4561(qw)  1.0
8   E7523                              E7523 (Sabk),E0993(bbz)  3.0
9   T9062                        T9062 (Shrz),T5362,E7523(fgf)  1.0
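
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same extractall / value_counts / map approach. The sample frame is retyped from the question. Two things are my own additions rather than part of the answer above: re.escape, on the assumption that the values in column A might someday contain regex metacharacters, and the final astype(int), which only turns the float counts into integers.

```python
import re

import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "A": ["T4561", "E2962", "DT2172", "T6096", "T5362",
          "E0993", "E1834", "T2844", "E7523", "T9062"],
    "B": [
        "T4561 (KHO ZAD)",
        "E2962 (Bat - Rouchan),T5362(asw)",
        "T2172 (Masd),T2117 (Masd),T4561(fsd)",
        "T6096 (Mara),H1005 (BAHH), H1049 (QIEH)",
        "T5362 (SYMI (ABAI)),E0993,E7523(pwd)",
        "E0993 (Tean),T4561,E0993(ssdc)",
        "E1834 (Ahaz),T5362,E0993(sdw)",
        "T2844 (Varmn),T3798 (QASIN), T3596 (Vara),T4561(qw)",
        "E7523 (Sabk),E0993(bbz)",
        "T9062 (Shrz),T5362,E7523(fgf)",
    ],
})

# Assumption: escape each value so regex metacharacters in A are matched literally.
pattern = "(" + "|".join(re.escape(code) for code in df["A"]) + ")"

# extractall yields one row per match across all of column B;
# value_counts then tallies how often each code was matched.
counts = df["B"].str.extractall(pattern)[0].value_counts()

# Codes that never matched map to NaN; fill with 0 and cast to integers.
df["C"] = df["A"].map(counts).fillna(0).astype(int)

print(df[["A", "C"]])
```

Note that this counts occurrences, not cells, so a value that appears twice in one cell of B is counted twice (as E0993 is in row 5). Regex matches also do not overlap, so if one value in A were a substring of another, the order of the alternatives could affect the counts.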

Answer 1
3, accepted, 2020-08-18 16:39:52
