I have a CSV file that contains both English and Chinese text. How can I separate the lines and save the ones that contain Chinese as "Chinese" and the ones that don't as "English"? I found some code that tells them apart, but I don't know how to save the results.
def is_chinese(string):
    for ch in string:
        if u'\u4e00' <= ch <= u'\u9fff':
            return True
    return False

ret1 = is_chinese("a中国aaa")
print(ret1)
ret2 = is_chinese("123")
print(ret2)
This code finds the lines that contain Chinese characters and saves them to a file called "detected.txt":
import re

# Matches one or more characters in the CJK Unified Ideographs range
chinese = re.compile(r'[\u4e00-\u9fff]+')

with open('01.csv', 'r', encoding='utf-8') as file:  # Open the CSV file
    # 'w' creates detected.txt if it does not exist ('r+' would raise an error)
    with open('detected.txt', 'w', encoding='utf-8') as f:  # Open the file to write
        for line in file:  # Read each line of the CSV file
            if chinese.search(line):  # The line contains a Chinese character
                f.write(line)  # Write the line to the file
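The question also asks for the lines without Chinese to be saved separately. A minimal sketch of that, using the same character range as the code above: the function name `split_by_language` and the output file names are assumptions for illustration, not part of the original post. It treats every physical line as one record, so a quoted CSV field that spans several lines would be split across the two files.

```python
import re

# CJK Unified Ideographs range, the same one used in the code above
CHINESE_RE = re.compile(r'[\u4e00-\u9fff]')


def split_by_language(src_path, chinese_path, english_path):
    """Copy each line of src_path to chinese_path or english_path.

    Hypothetical helper: returns a (chinese_count, english_count) tuple.
    """
    chinese_count = english_count = 0
    with open(src_path, 'r', encoding='utf-8') as src, \
         open(chinese_path, 'w', encoding='utf-8') as zh, \
         open(english_path, 'w', encoding='utf-8') as en:
        for line in src:
            if CHINESE_RE.search(line):
                # At least one Chinese character: goes to the "Chinese" file
                zh.write(line)
                chinese_count += 1
            else:
                # No Chinese character found: goes to the "English" file
                en.write(line)
                english_count += 1
    return chinese_count, english_count
```

It could be called as `split_by_language('01.csv', 'Chinese.csv', 'English.csv')`. Both output files are opened in 'w' mode, so they are created if missing and overwritten if they already exist.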