Count how many times a string occurs in each cell (every row and column) of a pandas DataFrame

Question

import pandas as pd
  
# list of paragraphs from judicial opinions
# rows are opinions
# columns are paragraphs from the opinion
opinion1 = ['sentenced to life','sentenced to death. The sentence ...','', 'sentencing Appellant for a term of life imprisonment']
opinion2 = ['Justice Smith','This concerns a sentencing hearing.', 'The third sentence read ...', 'Defendant rested.']
opinion3 = ['sentence sentencing sentenced','New matters ...', 'The clear weight of the evidence', 'A death sentence']
data = [opinion1, opinion2, opinion3]
df = pd.DataFrame(data, columns = ['p1','p2','p3','p4'])

# This works for one column. I have 300+ in the real data set.
df['p2'].str.contains('sentenc')

How can I determine whether "sentenc" appears in each cell of columns "p1" through "p4"?

The desired output would be something like:

True True False True
False True True False
True False False True

How can I retrieve the number of times "sentenc" occurs in each cell?

The desired output would be a count of the occurrences of "sentenc" for each cell:

1 2 0 1
0 1 1 0
3 0 0 1

Thanks!

Answer 1

Use pd.Series.str.count, applied to every column:

counts = df.apply(lambda col: col.str.count('sentenc'))

Output:

>>> counts
   p1  p2  p3  p4
0   1   2   0   1
1   0   1   1   0
2   3   0   0   1
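
One detail worth keeping in mind: str.count interprets its argument as a regular expression. For a plain word like 'sentenc' that makes no difference, but a search term containing characters such as '.' or '(' would need escaping. Below is a minimal self-contained sketch of that idea; the helper name count_literal is hypothetical and not part of pandas.

```python
import re

import pandas as pd

# Same sample data as in the question.
opinion1 = ['sentenced to life', 'sentenced to death. The sentence ...', '',
            'sentencing Appellant for a term of life imprisonment']
opinion2 = ['Justice Smith', 'This concerns a sentencing hearing.',
            'The third sentence read ...', 'Defendant rested.']
opinion3 = ['sentence sentencing sentenced', 'New matters ...',
            'The clear weight of the evidence', 'A death sentence']
df = pd.DataFrame([opinion1, opinion2, opinion3],
                  columns=['p1', 'p2', 'p3', 'p4'])


def count_literal(frame, needle):
    """Count literal occurrences of `needle` in every cell (hypothetical helper)."""
    # re.escape makes regex metacharacters in `needle` match themselves.
    pattern = re.escape(needle)
    # DataFrame.apply passes each column as a Series, so the .str accessor works.
    return frame.apply(lambda col: col.str.count(pattern))


counts = count_literal(df, 'sentenc')
print(counts)
```

Because apply works column by column, the same call should scale to the 300+ columns mentioned in the question without listing them by name.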

To get the result as booleans, use .str.contains, or call .astype(bool) on the result of the code above:

bools = df.apply(lambda col: col.str.contains('sentenc'))

Or:

bools = df.apply(lambda col: col.str.count('sentenc')).astype(bool)

Both give the same result for this data.
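
The two approaches can diverge once the real data contains missing cells, which seems plausible if opinions have different numbers of paragraphs. As far as I know, str.contains and str.count both propagate missing values, and calling astype(bool) on a float NaN gives True, which is probably not what you want. A hedged sketch of handling that explicitly follows; the frame df_missing is made-up illustration data, not from the question.

```python
import pandas as pd

# Made-up example with one missing paragraph (None).
df_missing = pd.DataFrame({
    'p1': ['sentenced to life', 'Justice Smith'],
    'p2': ['A death sentence', None],
})

# na=False treats missing cells as "no match";
# regex=False does a plain substring search instead of a regex search.
bools = df_missing.apply(
    lambda col: col.str.contains('sentenc', regex=False, na=False)
)

# With the count-based variant, fill missing counts with 0 before casting,
# so that a missing cell does not end up as True.
via_count = (
    df_missing.apply(lambda col: col.str.count('sentenc'))
    .fillna(0)
    .astype(bool)
)

print(bools)
print(via_count)
```

With the missing cell handled this way, both variants should mark it as False.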
