
I have 200 text files in Hindi. I want to remove whitespace and special characters, and find bigrams and trigrams in Python

import os

dir=os.getcwd()
print(dir)
dir1=os.path.join(dir,"test")
filename=os.listdir(dir1)
bad_chars = [';', ':', '!', "*","#","%"]
for i in filename:
    filepath=os.path.join(dir1,i)  #  the path
    file=open(filepath,"r",encoding="utf8") #open first text file
    read_=file.read()
    fields = read_.split(" ")
    print(fields)
    file1=open(filepath,"w",encoding="utf8")
    file2=open(filepath,"a",encoding="utf8")
    for j in range(len(fields)):        
        for p in bad_chars :
            fields[j].replace(i,' ')
            file2.write(fields[j])
            print ("Resultant list is : " , fields[j])
file.close()
file1.close()
file2.close()

I am trying to remove the special characters from all 200 text files.

This is the code for bigrams that I found.

Example: for the text "my name is eshan", the expected output is:

my, name occurs 1
name, is occurs 1
is, eshan occurs 1

A count can be more than 1, depending on the text.

Try this way:

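for file in filename:
    filepath = os.path.join(dir1, file)

    # read the whole file, then replace each unwanted character with a space
    with open(filepath, 'r', encoding='utf8') as f:
        texts = f.read()
    for c in bad_chars:
        texts = texts.replace(c, ' ')

    # write the cleaned text back to the same file
    with open(filepath, 'w', encoding='utf8') as f:
        f.write(texts)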
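
The question also asks for bigram and trigram counts, which the snippet above does not cover. Here is a minimal sketch using only the standard library. It assumes that words are separated by whitespace, which is a simplification for Hindi text. The helper name `ngram_counts` and the sample string are made up for illustration.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count n-grams over whitespace-separated tokens (hypothetical helper)."""
    # split() with no argument also collapses runs of spaces, tabs and newlines
    tokens = text.split()
    # zip the token list against shifted copies of itself to form n-tuples
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


sample = "my name is eshan my name"  # illustrative text, not from the files
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)

for gram, count in bigrams.items():
    print(", ".join(gram), "occurs", count)
```

To use it on the cleaned files, you could pass the `texts` string of each file to `ngram_counts`, and add the resulting counters together with `+` if you want totals across all 200 files. You may also want to add the danda (`।`), the Hindi full stop, to `bad_chars` so that it does not stay attached to the last word of a sentence.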

