
How to get back the original position of a word in a preprocessed sentence using Python?

I take a sentence from the user and preprocess it on the backend with a regex to remove special characters. I then need to send back the position of a particular word so that it can be highlighted for the user. The problem is that the word's position in the original sentence differs from its position in the preprocessed sentence.

Is there a good way to solve this in Python?

For example:

import re

def text_preprocessing(input_text, string_to_find):
    print("Original text is:", input_text)
    # Replace every character outside the allowed set with a space.
    cleaned_text = re.sub('[^a-zA-Z0-9#.+]', " ", input_text)
    # Collapse runs of spaces into a single space.
    cleaned_text = re.sub(' +', " ", cleaned_text)
    print("preprocessed text is:", cleaned_text)
    # The position is relative to the cleaned text, not the original.
    position = cleaned_text.find(string_to_find)
    return [position, position + len(string_to_find)]

input_text = 'Hi! Hello'
string_to_find = 'Hello'
position = text_preprocessing(input_text, string_to_find)
print(position)

Actual Output

Original text is: Hi! Hello
preprocessed text is: Hi Hello
[3, 8]

Original sentence = 'Hi! Hello'

Preprocessed sentence = 'Hi Hello' (only the '!' symbol was removed)

To highlight the word "Hello", the backend returns the position (3, 8), but the actual position in the UI, which displays the original sentence, is (4, 9).
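The mismatch can be reproduced with a minimal check. The variable names below are only illustrative; the two strings are the ones from the example above.

```python
original = 'Hi! Hello'
cleaned = 'Hi Hello'   # the preprocessed form shown above
word = 'Hello'

# str.find returns the 0-based index of the first match.
original_start = original.find(word)
cleaned_start = cleaned.find(word)

# Each character dropped before the word shifts its index by one.
shift = original_start - cleaned_start
print(original_start, cleaned_start, shift)
```

The shift equals the number of characters that preprocessing dropped before the word, so it is not a constant that can simply be added back.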

Expected Output

Original text is: Hi! Hello
preprocessed text is: Hi Hello
[4, 9]

OS: Windows 10, Python 3.7, using regex for preprocessing.

The first character in a string is at position 0, so Hello is at position 3 in the string Hi Hello.

  • H is at 0
  • i is at 1
  • the space is at 2
  • H is at 3
  • e is at 4
  • and so on ...
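Since the index reported by find refers to the cleaned string, one possible way to recover the position in the original sentence is to record, for each character kept during cleaning, the index it came from. The following is only a sketch under that assumption: clean_with_offsets and find_original_span are hypothetical helper names, not part of the question's code, and the cleaning is done character by character to mirror the two re.sub calls from the question.

```python
import re

def clean_with_offsets(text):
    """Hypothetical helper: return (cleaned, offsets), where offsets[i]
    is the index in `text` of the character that produced cleaned[i]."""
    cleaned_chars = []
    offsets = []
    for index, char in enumerate(text):
        # Same allowed set as the question; anything else becomes a space.
        out = char if re.match(r'[a-zA-Z0-9#.+]', char) else ' '
        # Collapse runs of spaces, mirroring re.sub(' +', ' ', ...).
        if out == ' ' and cleaned_chars and cleaned_chars[-1] == ' ':
            continue
        cleaned_chars.append(out)
        offsets.append(index)
    return ''.join(cleaned_chars), offsets

def find_original_span(text, word):
    """Hypothetical helper: return [start, end] of `word` in the original
    text, or None when the word is not in the cleaned text."""
    if not word:
        return None
    cleaned, offsets = clean_with_offsets(text)
    start = cleaned.find(word)
    if start == -1:
        return None
    last = start + len(word) - 1
    # Map both ends back to the original; end is exclusive, like slicing.
    return [offsets[start], offsets[last] + 1]

print(find_original_span('Hi! Hello', 'Hello'))  # [4, 9]
```

For 'Hi! Hello' the offsets list is [0, 1, 2, 4, 5, 6, 7, 8], so the match at index 3 in the cleaned text maps back to index 4 in the original. If the word does not need to be matched against cleaned text at all, calling find on the original sentence directly would be simpler.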


