
Get a count of occurrences of a string in each row and column of a pandas dataframe

import pandas as pd
  
# list of paragraphs from judicial opinions
# rows are opinions
# columns are paragraphs from the opinion
opinion1 = ['sentenced to life','sentenced to death. The sentence ...','', 'sentencing Appellant for a term of life imprisonment']
opinion2 = ['Justice Smith','This concerns a sentencing hearing.', 'The third sentence read ...', 'Defendant rested.']
opinion3 = ['sentence sentencing sentenced','New matters ...', 'The clear weight of the evidence', 'A death sentence']
data = [opinion1, opinion2, opinion3]
df = pd.DataFrame(data, columns = ['p1','p2','p3','p4'])

# This works for one column. I have 300+ columns in the real data set.
df['p2'].str.contains('sentenc')

How can I determine whether "sentenc" appears in each of the columns "p1" through "p4"?

The desired output is something like:

True True False True
False True True False
True False False True

How can I get the number of times "sentenc" occurs in each cell?

The desired output is the number of occurrences of "sentenc" in each cell:

1 2 0 1
0 1 1 0
3 0 0 1
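As a sanity check on the target matrices, both can be reproduced with nothing but Python's standard re module: re.findall returns every non-overlapping match, so its length is the per-cell count, and a cell "contains" the substring exactly when that count is non-zero. This is a minimal sketch that retypes the three opinions from the question; it only verifies the expected output and is not meant as the pandas solution.

```python
import re

# The three opinions from the question, one list per row (p1..p4)
opinion1 = ['sentenced to life', 'sentenced to death. The sentence ...', '',
            'sentencing Appellant for a term of life imprisonment']
opinion2 = ['Justice Smith', 'This concerns a sentencing hearing.',
            'The third sentence read ...', 'Defendant rested.']
opinion3 = ['sentence sentencing sentenced', 'New matters ...',
            'The clear weight of the evidence', 'A death sentence']
rows = [opinion1, opinion2, opinion3]

# len(re.findall(...)) = number of non-overlapping matches in one cell
counts = [[len(re.findall('sentenc', cell)) for cell in row] for row in rows]

# A cell contains the substring exactly when its count is non-zero
bools = [[n > 0 for n in row] for row in counts]

for row in counts:
    print(*row)
```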

Thanks!

Use pd.Series.str.count on every column:

counts = df.apply(lambda col: col.str.count('sentenc'))

Output:

>>> counts
   p1  p2  p3  p4
0   1   2   0   1
1   0   1   1   0
2   3   0   0   1
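Since the title asks about each row and column, the per-cell counts can be summed along either axis: sum(axis=1) gives the total per opinion (row) and sum(axis=0) the total per paragraph position (column). This is a small sketch on the question's sample data; the names row_totals and col_totals are illustrative only.

```python
import pandas as pd

# Sample data from the question
opinion1 = ['sentenced to life', 'sentenced to death. The sentence ...', '',
            'sentencing Appellant for a term of life imprisonment']
opinion2 = ['Justice Smith', 'This concerns a sentencing hearing.',
            'The third sentence read ...', 'Defendant rested.']
opinion3 = ['sentence sentencing sentenced', 'New matters ...',
            'The clear weight of the evidence', 'A death sentence']
df = pd.DataFrame([opinion1, opinion2, opinion3], columns=['p1', 'p2', 'p3', 'p4'])

# Per-cell counts, exactly as in the answer
counts = df.apply(lambda col: col.str.count('sentenc'))

# Total occurrences per opinion (row) and per paragraph column
row_totals = counts.sum(axis=1)
col_totals = counts.sum(axis=0)

print(row_totals.tolist())  # [4, 2, 4]
print(col_totals.tolist())  # [4, 3, 1, 2]
```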

To get the result as booleans, use .str.contains instead, or call .astype(bool) on the counts from the code above:

bools = df.apply(lambda col: col.str.contains('sentenc'))

Or:

bools = df.apply(lambda col: col.str.count('sentenc')).astype(bool)

Both return the same boolean DataFrame for this data.
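With 300+ columns of real data, some cells may be missing (None or NaN), and the two boolean variants can then disagree, because a missing count cast with .astype(bool) becomes True. A defensive sketch, assuming missing paragraphs should simply count as "no match": fill missing cells with an empty string before counting, and pass na=False to .str.contains. The two-column frame below is a hypothetical example, not the question's data.

```python
import pandas as pd

# Hypothetical frame with one missing paragraph (None)
df = pd.DataFrame(
    [['sentenced to life', None], ['Justice Smith', 'A death sentence']],
    columns=['p1', 'p2'],
)

# Treat missing cells as empty text so every count is a plain integer
counts = df.fillna('').apply(lambda col: col.str.count('sentenc'))

# na=False reports missing cells as False instead of propagating NaN
bools = df.apply(lambda col: col.str.contains('sentenc', na=False))

print(counts)
print(bools)
```

Note that the .str accessor only works on string columns; if the real frame also holds numeric columns, they would need to be excluded (for example with df.select_dtypes) before applying either approach.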


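Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM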