Unique word dictionary: removing special characters and numbers

Question

I want to build a dictionary (a list of unique words) from a book, but unfortunately I have a problem.

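import re

# Read the book and collect the unique whitespace-separated tokens.
with open('vechny.txt', encoding='utf-8') as fname:
    text = fname.read()
    lst = list(set(text.split()))
    str1 = ' '.join(str(e) for e in lst)
    # Append the tokens to 1000.txt; the with block closes the file.
    with open("1000.txt", "a", encoding='utf-8') as out:
        print(str1, file=out)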



in_file = open("1000.txt", "r", encoding='utf-8')
lines = in_file.read().split(' ')
in_file.close()

out_file = open("file.txt", "w", encoding='utf-8')
out_file.write("\n".join(lines))
out_file.close()

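This script works well, but I need to remove special characters

such as , . - and so on from the plain text.

For example, if the text contains "Hay," then split treats it as a single word, so the comma is not removed.

How can I produce text like this?

Input:
Hay, hello,% lost. 15 čas řad
The output I am looking for is:
hay hello lost cas rad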
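
To make the problem concrete, here is a minimal sketch of what str.split() returns for the sample input. The variable names are only illustrative; the point is that split() separates on whitespace alone, so punctuation stays attached and the number survives.

```python
sample = "Hay, hello,% lost. 15 čas řad"

# split() with no arguments splits on runs of whitespace only.
tokens = sample.split()
print(tokens)
# ['Hay,', 'hello,%', 'lost.', '15', 'čas', 'řad']
```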

Answer 1

How about this?

import re
str1 = '#@-/abcüšščřžý'
r = re.findall(r'\b\d*[^\W\d_][^\W_]*\b', str1, re.UNICODE)
str2 = ''.join(r)
print(str2)
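
As a rough check of the pattern above against the sample input from the question, the sketch below joins the matches with spaces (rather than an empty string) and lowercases them. The names sample, words and result are assumptions for illustration. The pattern keeps runs of letters and digits that contain at least one letter, so the bare number 15 is skipped, but the accented letters are kept as they are.

```python
import re

sample = "Hay, hello,% lost. 15 čas řad"

# Same pattern as in the answer; str patterns are Unicode-aware by default in Python 3.
words = re.findall(r'\b\d*[^\W\d_][^\W_]*\b', sample)
result = ' '.join(words).lower()
print(result)
# hay hello lost čas řad
```

Removing the diacritics (čas to cas) is a separate step that this pattern does not do.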

Answer 2

Try this:

import re
re.sub('[^A-Za-z0-9]+', ' ', 'Hay, hello,% lost. 15')

Let me know if it works!
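
One caveat worth illustrating: the character class [^A-Za-z0-9] keeps digits and treats accented letters as separators, so on the full sample from the question it cuts Czech words apart. The sketch below shows that, followed by one possible workaround that strips diacritics with the standard unicodedata module first. The helper strip_accents is a hypothetical name, not part of the answer, and this approach only handles letters that decompose into a base letter plus combining marks.

```python
import re
import unicodedata

sample = "Hay, hello,% lost. 15 čas řad"

# The answer's pattern: accented letters are replaced, digits are kept.
ascii_only = re.sub('[^A-Za-z0-9]+', ' ', sample)
print(ascii_only)
# Hay hello lost 15 as ad

def strip_accents(s):
    """Hypothetical helper: decompose characters and drop combining marks."""
    decomposed = unicodedata.normalize('NFKD', s)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

# Strip accents first, then replace everything that is not an ASCII letter.
cleaned = re.sub('[^A-Za-z]+', ' ', strip_accents(sample)).strip().lower()
print(cleaned)
# hay hello lost cas rad
```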

Answer 3

from unidecode import unidecode
import random
import re

# Random number for the output file name (avoids shadowing the random module).
suffix = random.randint(1000, 2000)

n = input("jmenosouboru:")

with open(f"{n}.txt", encoding='utf-8') as fname:
    text = fname.read()

# Keep runs of letters and digits that contain at least one letter.
r = re.findall(r'\b\d*[^\W\d_][^\W_]*\b', text, re.UNICODE)
str2 = ' '.join(r)
# Transliterate to ASCII, for example "čas" becomes "cas".
uni = unidecode(str2)
# Deduplicate, then put one word per line.
lst = list(set(uni.split()))
text1 = "\n".join(lst)
# Drop any remaining digit characters.
text2 = ''.join(filter(lambda x: not x.isdigit(), text1))
with open(f"{suffix}-.txt", "a", encoding='utf-8') as out:
    print(text2, file=out)
print("done")
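
The same steps can also be wrapped in a small function, which makes them easier to test. This is only a sketch under a few assumptions: the name unique_words and its transliterate parameter are hypothetical, lowercasing is added to match the output requested in the question, and dict.fromkeys is used instead of set so the words keep their order of first appearance. Deduplication happens after cleaning, so words that only differ in case or digits collapse into one entry.

```python
import re

WORD_RE = re.compile(r'\b\d*[^\W\d_][^\W_]*\b')

def unique_words(text, transliterate=lambda s: s):
    """Return unique lowercase words in order of first appearance."""
    words = (transliterate(w).lower() for w in WORD_RE.findall(text))
    # Remove digit characters inside words, then skip anything left empty.
    cleaned = (''.join(ch for ch in w if not ch.isdigit()) for w in words)
    # dict.fromkeys deduplicates while preserving insertion order.
    return list(dict.fromkeys(w for w in cleaned if w))

print(unique_words("Hay, hello,% lost. 15 hay Hello"))
# ['hay', 'hello', 'lost']
```

Passing unidecode from the unidecode package as transliterate should give ASCII output, as in the script above; "\n".join(...) on the result then yields one word per line.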

Answer 1
1 (accepted) 2022-05-17 03:31:46

Answer 2
0 2022-05-17 02:39:24

Answer 3
0 2022-05-17 03:43:07