Remove non-ASCII characters from CSV using pandas
I am querying a table in a SQL Server database and exporting it to CSV using pandas:
import pandas as pd
df = pd.read_sql_query(sql, conn)
df.to_csv(csvFile, index=False)
Is there a way to remove non-ASCII characters when exporting the CSV?
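Another option, not taken by either answer below, is to clean the DataFrame in memory before calling to_csv, so the file never needs a second pass. This is a minimal sketch under assumptions: the helper strip_non_ascii and the sample data are hypothetical stand-ins for the result of pd.read_sql_query(sql, conn), and only string cells and string column names are cleaned while other values pass through unchanged.

```python
import re

import pandas as pd

# One or more consecutive characters outside U+0000..U+007F, i.e. anything non-ASCII
NON_ASCII = re.compile(r'[^\x00-\x7F]+')


def strip_non_ascii(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with non-ASCII characters removed from string cells and headers.

    Non-string values (numbers, None/NaN, dates) are passed through unchanged.
    """
    cleaned = df.copy()
    for col in cleaned.columns:
        # Only str values are rewritten; everything else is returned as-is
        cleaned[col] = cleaned[col].map(
            lambda v: NON_ASCII.sub('', v) if isinstance(v, str) else v
        )
    # Clean the header row as well, since the file-based approaches strip it too
    cleaned.columns = [NON_ASCII.sub('', c) if isinstance(c, str) else c for c in cleaned.columns]
    return cleaned


# Hypothetical sample data standing in for pd.read_sql_query(sql, conn)
df = pd.DataFrame({'name': ['café', 'naïve', 'plain', None], 'qty': [1, 2, 3, 4]})

# With a real path this would be: strip_non_ascii(df).to_csv(csvFile, index=False)
csv_text = strip_non_ascii(df).to_csv(index=False)
print(csv_text)
```

On pandas 1.1 or later, to_csv also accepts an errors argument, so df.to_csv(csvFile, index=False, encoding='ascii', errors='ignore') should drop unencodable characters at write time without any regex.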
You can read the file back in, then use a regular expression to strip out the non-ASCII characters:
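import re

df.to_csv(csvFile, index=False)
# pandas writes UTF-8 by default, so read the file back as UTF-8
with open(csvFile, encoding='utf-8') as f:
    new_text = re.sub(r'[^\x00-\x7F]+', '', f.read())
# Overwrite the file; only ASCII characters are left at this point
with open(csvFile, 'w', encoding='ascii') as f:
    f.write(new_text)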
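The read-strip-overwrite approach above can be wrapped in a small helper and tried on a throwaway file. This is a sketch assuming a UTF-8 input file (the to_csv default); the name strip_non_ascii_file is hypothetical. The pattern [^\x00-\x7F]+ is a negated character class covering code points 0 to 127, so each run of anything outside that range is replaced by an empty string.

```python
import re
import tempfile
from pathlib import Path

# One or more consecutive code points outside 0x00-0x7F, i.e. anything that is not ASCII
NON_ASCII = re.compile(r'[^\x00-\x7F]+')


def strip_non_ascii_file(path: str) -> None:
    """Rewrite the file at `path` in place, keeping only its ASCII characters.

    Assumes the file is UTF-8 encoded; the whole file is read into memory at once.
    """
    with open(path, encoding='utf-8') as f:
        text = f.read()
    with open(path, 'w', encoding='ascii') as f:
        f.write(NON_ASCII.sub('', text))


# Demo on a temporary file instead of the real export
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'export.csv'
    demo.write_text('name,city\nZoë,Zürich\nBob,Paris\n', encoding='utf-8')
    strip_non_ascii_file(str(demo))
    result = demo.read_text(encoding='ascii')

print(result)
```

Because commas, quotes and line breaks are all ASCII, the substitution cannot break the CSV structure; it only shortens the affected fields (Zürich becomes Zrich).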
I ran into this same situation. Here is what worked for me:
import re

regex = re.compile(r'[^\x00-\x7F]+')  # matches runs of non-ASCII characters
with open(csvFile, 'r', encoding='utf-8') as infile, open('myfile.csv', 'w', encoding='ascii') as outfile:
    for line in infile:  # keep looping until EOF (no more lines to read)
        # Write the line to the output file, replacing any non-ASCII runs with nothing
        outfile.write(regex.sub('', line))
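The line-by-line version above can likewise be packaged as a function and checked against a temporary file. This sketch assumes UTF-8 input; the name copy_ascii_only and its line-count return value are hypothetical additions for illustration. Only one line is held in memory at a time, and the source file is left untouched.

```python
import re
import tempfile
from pathlib import Path

NON_ASCII = re.compile(r'[^\x00-\x7F]+')  # runs of non-ASCII characters


def copy_ascii_only(src: str, dst: str) -> int:
    """Copy src to dst line by line, dropping non-ASCII characters.

    Returns the number of lines written. src is never modified.
    """
    count = 0
    with open(src, encoding='utf-8') as infile, open(dst, 'w', encoding='ascii') as outfile:
        for line in infile:  # iterating a file object yields lines until EOF
            outfile.write(NON_ASCII.sub('', line))
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'export.csv'
    dst = Path(tmp) / 'myfile.csv'
    src.write_text('id,label\n1,Ελλάδα\n2,OK\n', encoding='utf-8')
    n = copy_ascii_only(str(src), str(dst))
    original = src.read_text(encoding='utf-8')
    cleaned = dst.read_text(encoding='ascii')

print(n, repr(cleaned))
```

Compared with reading the whole file at once, this keeps memory use flat for large exports. Note that a field made entirely of non-ASCII text ends up empty, as the 1, row in the demo shows.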
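Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.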