Get the index of numbers in a string and extract the words before and after each number (different languages)

Question

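I tried using a regular expression to find the numbers, but instead of the index of each whole number I only get the index of the first character of the number.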

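import re

text = "४०० pounds of wheat at $ 3 per pound"
numero = re.finditer(r"(\d+)", text)  # iterator of match objects
op = re.findall(r"(\d+)", text)       # list of the matched strings

# m.start() is the character offset where each match begins
indices = [m.start() for m in numero]
print(indices)

OUTPUT

[0, 25]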

***Expected OUTPUT***
[0, 6]

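Once I have the exact indices stored in a list, extracting the words should be easier. That is what I believe; what do you think?

Also, I expect the words to appear at different positions, so the approach cannot be static (hard-coded positions).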
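
For context: `m.start()` returns a character offset into the string, which is why the second number is reported at 25 rather than 6. The expected output `[0, 6]` matches word (token) positions instead. A minimal sketch of the difference, assuming tokens are simply whitespace-separated (the names `char_offsets` and `token_indices` are illustrative and not from the original post):

```python
import re

text = "४०० pounds of wheat at $ 3 per pound"

# Character offsets: what m.start() reports for each run of digits.
char_offsets = [m.start() for m in re.finditer(r"\d+", text)]

# Token indices: enumerate whitespace-separated tokens and keep the ones
# made up entirely of digits. For str patterns, \d also matches Unicode
# decimal digits such as the Devanagari digits in "४००".
token_indices = [
    i
    for i, m in enumerate(re.finditer(r"\S+", text))
    if re.fullmatch(r"\d+", m.group())
]

print(char_offsets)   # [0, 25]
print(token_indices)  # [0, 6]
```

This only works when numbers are separated from other text by whitespace; a number glued to punctuation, such as "$3", would not be picked up as a token of its own.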

Answer 1

You tagged this question with nlp and it is a Python question, so why not use Spacy?

See this Python demo with Spacy 3.0.1:

import spacy

nlp = spacy.load("en_core_web_trf")
text = "४०० pounds of wheat at $ 3 per pound"
doc = nlp(text)
# Alphabetic tokens with their token index
print([(token.text, token.i) for token in doc if token.is_alpha])
## => [('pounds', 1), ('of', 2), ('wheat', 3), ('at', 4), ('per', 7), ('pound', 8)]
# Number-like tokens with their token index
print([(token.text, token.i) for token in doc if token.like_num])
## => [('४००', 0), ('3', 6)]

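Here,

nlp is the Spacy pipeline loaded from the English transformer-based model en_core_web_trf
doc is the Spacy document created from your text variable
[(token.text, token.i) for token in doc if token.is_alpha] gives you the list of alphabetic words, each with its value ( token.text ) and its position in the document ( token.i )
[(token.text, token.i) for token in doc if token.like_num] gets the list of numbers with their positions in the document.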

Answer 2

You can tokenize the text and build your logic on top of the tokens. Try this:


number_index = []
text = "४०० pounds of wheat at $ 3 per pound"
text_list = text.split(" ")

# Find which words are integers.
for index, word in enumerate(text_list):
    try:
        int(word)  # int() also accepts Unicode decimal digits such as "४००"
        number_index.append(index)
    except ValueError:
        # Not an integer (e.g. "pounds" or "$"), so skip it
        pass

# Now perform operations on those integers
for i in number_index:
    word = text_list[i]
    # do operations and put it back in the list

# Re-build string afterwards
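
To make the "words before and after" part concrete, here is a minimal sketch built on the same split-based idea. The function name `words_around_numbers` and the tuple layout are my own assumptions for illustration, not part of the answer above, and it assumes numbers are separated from other text by whitespace.

```python
def words_around_numbers(text):
    """Return (index, number, previous_token, next_token) for each numeric token."""
    tokens = text.split()
    results = []
    for i, tok in enumerate(tokens):
        # str.isdecimal() is True for Unicode decimal digits such as "४००".
        if not tok.isdecimal():
            continue
        # Use None when the number sits at the start or end of the text.
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        results.append((i, int(tok), prev_tok, next_tok))
    return results


text = "४०० pounds of wheat at $ 3 per pound"
for row in words_around_numbers(text):
    print(row)
# (0, 400, None, 'pounds')
# (6, 3, '$', 'per')
```

Note that the token before "3" is the symbol "$", not a word. If only alphabetic neighbours are wanted, one option is to keep stepping left or right until `str.isalpha()` is true for the neighbouring token.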

Answer 1: score 1, accepted, 2021-03-14 22:33:56

Answer 2: score 0, 2021-03-14 03:28:58