How to convert strings in Python to vectors of numbers so I can compare them to other vectors?
I am looking for a way to convert strings into vectors of numbers in Python. For example:
"Hi how are you?" -> "29 73 281 38"
"How are you doing" -> "73 281 28 54"
I want to compare sentences from user input to sentences from a database, which are stored as vectors.
I am assuming you are trying to create a dense vector representation for your input sentences.
See if the code below helps.
sentences = ["Hi how are you?", "How are you doing"]

# Step 1: Create vocabulary - a set of distinct tokens from your input sentences
vocab = set()
for sentence in sentences:
    tokens = sentence.split()
    for token in tokens:
        vocab.add(token)

# Step 2: Create a map (token: ID)
vocab_map = {}
for i, token in enumerate(sorted(vocab)):  # sorted lexicographically for reproducibility
    vocab_map[token] = i

# Step 3: Encode the sentences using the map you created in the previous step
for sentence in sentences:
    encoded_sentence = []
    tokens = sentence.split()
    for token in tokens:
        encoded_sentence.append(str(vocab_map[token]))
    print(' '.join(encoded_sentence))
Running the above code should produce the following output:
0 4 2 6
1 2 5 3
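Note that the encoding above treats "how" and "How" (and "you" and "you?") as different tokens, and the ID sequences depend on word order, so they are awkward to compare directly. If the goal is to compare user input against stored sentences, one common option is a bag-of-words count vector plus cosine similarity. The following is only a minimal sketch under my own assumptions: the helper names (`tokenize`, `build_vocab`, `to_count_vector`, `cosine_similarity`) are made up for illustration, tokens are lowercased with punctuation stripped, and words that are not in the vocabulary are simply ignored.

```python
import math
import re


def tokenize(sentence):
    # Assumption: lowercase and keep only runs of letters, digits, and
    # apostrophes, so "Hi"/"hi" and "you?"/"you" become the same token.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def build_vocab(sentences):
    # Map each distinct token to a fixed vector position (sorted for reproducibility).
    tokens = sorted({tok for s in sentences for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


def to_count_vector(sentence, vocab):
    # One slot per vocabulary token; tokens missing from the vocabulary are skipped.
    vector = [0] * len(vocab)
    for tok in tokenize(sentence):
        if tok in vocab:
            vector[vocab[tok]] += 1
    return vector


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


stored_sentences = ["Hi how are you?", "How are you doing"]
vocab = build_vocab(stored_sentences)
stored_vectors = [to_count_vector(s, vocab) for s in stored_sentences]

query_vector = to_count_vector("How are you doing?", vocab)
for sentence, vector in zip(stored_sentences, stored_vectors):
    print(sentence, "->", cosine_similarity(query_vector, vector))
```

Because every sentence is mapped onto the same fixed-length vector, any stored sentence can be scored against any user input; the stored sentence with the highest score is the closest match under this (word-order-insensitive) measure.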