Separating English and non-English text from a file

Question

I have a .csv file and I want to separate the non-English text and the English text into two different files. Below is the code I tried:

  import string
  def isEnglish(s):
      return s.translate(None, string.punctuation).isalnum()
  file=open('File1.csv','r',encoding='UTF-8')
  outfile1=open('Eng.csv','w', encoding='utf-8')
  outfile2=open('Noneng.csv','w', encoding='utf-8')
  for line in file.readlines():
      r = isEnglish(line)
      if r:
          outfile1.write(line+"\n")
      else:
          outfile2.write(line+"\n")

The code is not producing the desired result. There is repeated English text in both files. I have attached a snapshot of one output file.

Answer 1

You neglected to mention that the code produces this result:

TypeError: translate() takes exactly one argument (2 given)
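The error comes from calling `translate` with the Python 2 signature, `str.translate(table, deletechars)`. In Python 3, `str.translate` accepts a single mapping argument, so the extra argument is rejected. A minimal reproduction, assuming Python 3 (the sample string is made up for illustration):

```python
import string

line = "Hello, world!"

try:
    # Python 2 style call: a table plus a string of characters to delete.
    # Python 3's str.translate takes only one argument, so this raises TypeError.
    line.translate(None, string.punctuation)
    error = None
except TypeError as exc:
    error = exc
    print(f"{type(exc).__name__}: {exc}")
```

The exact wording of the message can vary slightly between Python 3 versions, but the exception type is the same.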

Would you please Read The Fine Manual: https://docs.python.org/3/library/stdtypes.html#str.translate

The documentation offers a pretty big hint that you should call str.maketrans(...) to create the desired translation table. This will help you identify input strings that are strictly alphanumeric.

  translation_table = str.maketrans('', '', string.punctuation)
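The table alone does not finish the job. Below is a minimal sketch of one way to use it, assuming that "English" means a line that is non-empty and made only of ASCII letters and digits once punctuation and whitespace are removed. Two details matter. First, each line read from the file still contains spaces and its trailing newline, so `isalnum()` on a punctuation-stripped line is False for almost every line. Second, `isalnum()` is Unicode-aware, so it also returns True for Chinese or accented letters. The sketch therefore adds `string.whitespace` to the deletion table and also checks `str.isascii()` (Python 3.7+). The names `deletion_table` and `split_file` are hypothetical; only the file names come from the question.

```python
import string

# Hypothetical deletion table: the third argument to str.maketrans lists
# characters that translate() should remove (here punctuation and whitespace).
deletion_table = str.maketrans('', '', string.punctuation + string.whitespace)


def isEnglish(s):
    """True if s, minus punctuation and whitespace, is non-empty ASCII letters/digits."""
    stripped = s.translate(deletion_table)
    # isalnum() alone accepts non-Latin letters (e.g. Chinese), so require ASCII too.
    # An empty result (blank or punctuation-only line) fails isalnum().
    return stripped.isascii() and stripped.isalnum()


def split_file(src, eng_path, noneng_path):
    """Hypothetical helper: copy each line of src into one of two output files."""
    with open(src, 'r', encoding='utf-8') as infile, \
         open(eng_path, 'w', encoding='utf-8') as eng, \
         open(noneng_path, 'w', encoding='utf-8') as noneng:
        for line in infile:
            # The line already ends with '\n', so it is written unchanged.
            (eng if isEnglish(line) else noneng).write(line)
```

With this sketch, `split_file('File1.csv', 'Eng.csv', 'Noneng.csv')` writes each line exactly once; appending another `"\n"`, as the question's code does, would add blank lines. Lines that mix English with even one non-ASCII character go to the non-English file under this rule; a ratio-based threshold would be a different design choice.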

Answer 1 (score 0), posted 2019-11-03 15:53:58

从文件中分离英文文本和非英文文本

问题描述

1 个解决方案

解决方案1 0 2019-11-03 15:53:58

解决方案1
0 2019-11-03 15:53:58