Number of words preceding a phrase in a string

Question

Assuming I have a list of phrases:

list = ['new york', 'school', 'new']

and a string

text = 'i am going to a school in new york and therefore i have to buy a new uniform to go to new york'

I would like to find the number of words preceding each phrase (first occurrence only), i.e. the output should be:

new york = 7
school = 5
new = 7

Any idea how I can achieve this efficiently?

Answer 1

Naive approach, without any performance or NLP considerations:

lst = ['new york', 'school', 'new']  # do not use 'list' as a name
text = 'i am going to a school in new york and therefore i have to buy a new uniform to go to new york'

{p: len(text[:text.find(p)].strip().split()) for p in lst}
# {'new york': 7, 'school': 5, 'new': 7}
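One caveat with the naive version: str.find matches substrings, so 'new' would also match inside a word such as 'renew', and a missing phrase returns -1, which silently slices off the last character instead of failing. A minimal sketch of a stricter variant follows, assuming each phrase starts and ends with a word character; the helper name words_before is hypothetical and not part of the original answer.

```python
import re

def words_before(text, phrase):
    """Count the words before the first whole-word match of phrase.

    Returns None when the phrase does not occur (hypothetical helper).
    """
    # \b anchors the match at word boundaries; re.escape protects any
    # regex metacharacters that might appear in the phrase.
    match = re.search(r'\b' + re.escape(phrase) + r'\b', text)
    if match is None:
        return None
    # split() with no argument ignores repeated and trailing whitespace.
    return len(text[:match.start()].split())

lst = ['new york', 'school', 'new']
text = 'i am going to a school in new york and therefore i have to buy a new uniform to go to new york'

result = {p: words_before(text, p) for p in lst}
print(result)
# {'new york': 7, 'school': 5, 'new': 7}

print(words_before('renew the new lease', 'new'))
# 2
```

On the sample text the result is the same as above; the difference only shows up when a phrase is embedded in a longer word or is absent.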

Answer 2

Using count and index:

lst = ['new york', 'school', 'new']
text = 'i am going to a school in new york and therefore i have to buy a new uniform to go to new york'

for x in lst:
    print(f"{x} = {text.count(' ', 0, text.index(x))}")

# new york = 7
# school = 5
# new = 7

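count counts the spaces in text from the start up to the first occurrence of the phrase, whose position is given by index. As long as the words are separated by single spaces and the text has no leading space, that number equals the number of words preceding the phrase. Note that index raises ValueError if the phrase does not occur in text.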
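Counting spaces relies on single-space separation. If the input may contain repeated spaces, one option is to compare word lists instead of characters. The sketch below is an illustration under that assumption, not the answerer's method; the helper name words_before_tokens and the string messy are made up for the example.

```python
def words_before_tokens(text, phrase):
    """Return the index of the first word at which phrase starts, or None.

    That index is also the number of words preceding the phrase
    (hypothetical helper, token-based comparison).
    """
    words = text.split()      # split() collapses runs of whitespace
    target = phrase.split()
    n = len(target)
    for i in range(len(words) - n + 1):
        # Compare a window of n words against the phrase's words.
        if words[i:i + n] == target:
            return i
    return None

messy = 'i  am going to  school'   # note the doubled spaces

print(words_before_tokens(messy, 'school'))
# 4

print(messy.count(' ', 0, messy.index('school')))
# 6
```

Because whole words are compared, this variant also avoids matching 'new' inside a longer word, at the cost of scanning the word list once per phrase.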

Answer 3

lst = ['new york', 'school', 'new']
text = 'i am going to a school in new york and therefore i have to buy a new uniform to go to new york'

This prints each phrase followed by the number of words that precede its first occurrence:

for x in lst:
    # The prefix ends with a space, so split(' ') yields one trailing empty string; subtract 1.
    print(x + ": " + str(len(text[0:text.index(x)].split(' ')) - 1))

3 answers

Answer 1
0 2018-09-05 09:19:30

Answer 2
0 2018-09-05 09:22:13

Answer 3
0 2018-09-05 09:50:00