Counting the words spoken by characters in a movie script

Question

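With some help I've managed to find the spoken lines. Now I'm looking to get the words spoken by a chosen character, so that I can enter MIA and get every word she says in the film, like this: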

name = input("Enter name:")
wordsspoken(script, name)
name1 = input("Enter another name:")
wordsspoken(script, name1)

That way I can count them later.

This is what the movie script looks like:

An awkward beat. They pass a wooden SALOON -- where a WESTERN
 is being shot. Extras in COWBOY costumes drink coffee on the
 steps.
                     Revision                        25.


                   MIA (CONT'D)
      I love this stuff. Makes coming to work
      easier.

                   SEBASTIAN
      I know what you mean. I get breakfast
      five miles out of the way just to sit
      outside a jazz club.

                   MIA
      Oh yeah?

                   SEBASTIAN
      It was called Van Beek. The swing bands
      played there. Count Basie. Chick Webb.
             (then,)
      It's a samba-tapas place now.

                   MIA
      A what?

                   SEBASTIAN
      Samba-tapas. It's... Exactly. The joke's on
      history.

Answer 1

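I would first ask the user for all the names in the script, then ask which name's words they want. I would search the text word by word until I find the requested name, then copy the words that follow into a variable until I reach a name that matches another character in the script. Characters may of course say another character's name in their dialogue, but if you assume the speaker's name is written in all capitals, or on a line of its own, the text should be easy to filter.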

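# Assumes `script` (the full text), `speaker` (e.g. "MIA") and
# `character_names` (every speaker name, in upper case) are already defined.
recording = False  # True while `speaker` is talking
spoken_text = ""   # everything `speaker` says

for word in script.split():  # iterate over words, not single characters
    if word == speaker and word.isupper(): # you may want to check that this is on its own line as well.
        recording = True
        continue  # don't record the speaker's own name
    elif word in character_names and word.isupper():  # you may want to check that this is on its own line as well.
        recording = False

    if recording:
        spoken_text += word + " "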
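Following the comment about checking that the name is on its own line, a line-based variant might look like the sketch below. Beyond the answer itself, it assumes that a blank line ends a speech (as in the excerpt) and that parenthetical directions such as (then,) should be skipped; `words_spoken_by` is a hypothetical name.

```python
def words_spoken_by(script, speaker, character_names):
    """Return the list of words `speaker` says in `script`.

    A line counts as a speaker cue only if the whole line, minus a trailing
    parenthetical such as "(CONT'D)", is an upper-case name in `character_names`.
    """
    recording = False  # True while `speaker` is talking
    spoken = []        # words collected for `speaker`
    for line in script.split('\n'):
        stripped = line.strip()
        if not stripped:
            recording = False  # assumption: a blank line ends the speech
            continue
        cue = stripped.split('(')[0].strip()
        if cue.isupper() and cue in character_names:
            recording = (cue == speaker)  # a cue on its own line
        elif recording and not (stripped.startswith('(') and stripped.endswith(')')):
            spoken.extend(stripped.split())  # skips "(then,)"-style directions
    return spoken
```

`len(words_spoken_by(script, "MIA", {"MIA", "SEBASTIAN"}))` would then give the count.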

Answer 2

I'll outline how to build a dict that gives you the number of words each speaker says, and also a version that approximates your existing implementation.

General use

If we define a word as any chunk of characters in a string split on ' ' (space)...

import re

speaker = ''     # current speaker
word_count = {}  # dict of speakers and the number of words they speak

# assumes `script` holds the whole screenplay as one string
for line in script.split('\n'):
    if re.match(r'^[ ]{19}[^ ]', line):    # name of speaker (19-space indent)
        speaker = line.split(' (')[0][19:]  # drop the indent and any " (CONT'D)"
    elif re.match(r'^[ ]{6}[^ ]', line):   # dialogue line (6-space indent)
        words = len(line.split())           # number of words on line
        word_count[speaker] = word_count.get(speaker, 0) + words

This produces a dict of the form {'JOHN DOE': 55} if John Doe speaks 55 words.

Example output:

>>> word_count['MIA']
13
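Splitting on spaces is fine for totals, but it keeps punctuation attached, so "mean." and "mean" would be separate entries if you later want to know which words a character uses most. A minimal sketch, assuming a word is a run that starts with a letter or digit and may contain apostrophes and hyphens (a choice made here, not something the script format dictates); `word_frequencies` is a hypothetical name:

```python
import re
from collections import Counter

def word_frequencies(lines):
    """Count each word across `lines`, ignoring case and surrounding punctuation."""
    counts = Counter()
    for line in lines:
        # "It's..." -> "it's", "mean." -> "mean", "samba-tapas" stays one word
        counts.update(w.lower() for w in re.findall(r"\w[\w'-]*", line))
    return counts
```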

Your implementation

Here is a version of the above process that approximates your implementation.

import re

def wordsspoken(script, name):
    """Return how many words `name` speaks in `script`."""
    word_count = 0
    speaker = None  # no speaker cue seen yet
    for line in script.split('\n'):
        if re.match(r'^[ ]{19}[^ ]', line):   # name of speaker
            speaker = line.split(' (')[0][19:]
        elif re.match(r'^[ ]{6}[^ ]', line):  # dialogue line
            if speaker == name:
                word_count += len(line.split())
    return word_count

def main():
    # assumes `script` holds the whole screenplay as one string
    name = input("Enter name:").strip().upper()
    print(wordsspoken(script, name))
    name1 = input("Enter another name:").strip().upper()
    print(wordsspoken(script, name1))
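If, as in the question, you want the words themselves and to count them later, the same loop can collect them instead. A sketch under the same fixed-indentation assumption (19 spaces before a name, 6 before dialogue); `words_of` is a hypothetical name:

```python
import re

def words_of(script, name):
    """Return the list of words `name` speaks, using the fixed indentation above."""
    words = []
    speaker = None  # no speaker cue seen yet
    for line in script.split('\n'):
        if re.match(r'^[ ]{19}[^ ]', line):  # speaker cue, e.g. "MIA (CONT'D)"
            speaker = line.split(' (')[0][19:]
        elif re.match(r'^[ ]{6}[^ ]', line) and speaker == name:  # their dialogue
            words.extend(line.split())
    return words
```

`len(words_of(script, 'MIA'))` then matches `wordsspoken(script, 'MIA')`, since both look at the same dialogue lines.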

Answer 3

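If you want to compute your counts in a single pass over the script (which I imagine could be quite long), just keep track of which character is speaking, set up like a little state machine: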

import re
from collections import Counter, defaultdict

words_spoken = defaultdict(Counter)  # speaker -> Counter of the words they say
currently_speaking = 'Narrator'      # text before the first cue counts as narration

for line in SCRIPT.split('\n'):
    name = line.replace("(CONT'D)", '').strip()
    if re.match(r'^[A-Z]+$', name):  # a single all-caps word on its own line is a cue
        currently_speaking = name
    else:
        words_spoken[currently_speaking].update(line.split())

You could use a more complex regular expression to detect when the speaker changes, but this should do the trick.
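For example, a hedged sketch of such a regex, assuming cue lines contain only upper-case letters, spaces, periods, apostrophes or hyphens, optionally followed by one parenthetical such as (CONT'D) or (V.O.). This lets multi-word names like JOHN DOE through, but a short all-caps dialogue line such as "OK." would still be taken for a cue. `CUE` and `count_by_speaker` are hypothetical names:

```python
import re
from collections import Counter, defaultdict

# Hypothetical cue pattern: an upper-case name (may contain spaces, periods,
# apostrophes, hyphens), optionally followed by one "(...)" extension.
CUE = re.compile(r"^([A-Z][A-Z .'-]*?)\s*(\([^)]*\))?$")

def count_by_speaker(script):
    words_spoken = defaultdict(Counter)  # speaker -> Counter of their words
    currently_speaking = 'Narrator'
    for line in script.split('\n'):
        match = CUE.match(line.strip())
        if match:
            currently_speaking = match.group(1)  # name without "(CONT'D)" etc.
        else:
            words_spoken[currently_speaking].update(line.split())
    return words_spoken
```

The total for one speaker is then `sum(count_by_speaker(script)['MIA'].values())`.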


Answer 4

There are some good ideas above. The following should work in both Python 2.x and 3.x:

import codecs
from collections import defaultdict

speaker_words = defaultdict(str)  # speaker -> everything they say, space-separated

with codecs.open('script.txt', 'r', 'utf8') as f:
    speaker = ''
    for line in f.read().split('\n'):
        # skip empty lines
        if not line.split():
            continue

        # speakers have their names in all uppercase
        first_word = line.split()[0]
        if len(first_word) > 1 and all(char.isupper() for char in first_word):
            # remove the (CONT'D) from a speaker string
            speaker = line.split('(')[0].strip()

        # check if this is a dialogue line (indented by exactly 6 spaces)
        elif len(line) - len(line.lstrip()) == 6:
            speaker_words[speaker] += line.strip() + ' '

# get a Python-version-agnostic input
try:
    prompt = raw_input  # Python 2
except NameError:
    prompt = input      # Python 3

speaker = prompt('Enter name: ').strip().upper()
print(speaker_words[speaker])

Example output:

Enter name: sebastian
I know what you mean. I get breakfast five miles out of the way just to sit outside a jazz club. It was called Van Beek. The swing bands played there. Count Basie. Chick Webb. It's a samba-tapas place now. Samba-tapas. It's... Exactly. The joke's on history.
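Since this collects each speaker's text rather than a number, getting the counts the question asks for is one more step. A small sketch (the name `word_counts` is made up here) that splits each speaker's text on whitespace, as the other answers do, and sorts speakers by how much they say:

```python
def word_counts(speaker_words):
    """Turn {speaker: concatenated text} into [(speaker, word count), ...]."""
    counts = {speaker: len(text.split()) for speaker, text in speaker_words.items()}
    # Most talkative speaker first.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```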

4 answers

Answer 1
2 2018-04-17 15:05:45

Answer 2
2 2018-04-17 17:39:34

Answer 3
1 accepted 2018-04-17 15:10:53

Answer 4
1 2018-04-18 13:33:34
