Removing non-ASCII characters from a CSV using pandas

Question

I am querying a table in a SQL Server database and exporting the result to CSV with pandas:

import pandas as pd

df = pd.read_sql_query(sql, conn)
df.to_csv(csvFile, index=False)

Is there a way to remove non-ASCII characters when exporting the CSV?

Answer 1

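You can read the exported file back in and then strip the non-ASCII characters with a regular expression:

import re

df.to_csv(csvFile, index=False)

# Read the exported file back (pandas writes UTF-8 by default)
with open(csvFile, encoding='utf-8') as f:
    # Drop every run of characters outside the ASCII range 0x00-0x7F
    new_text = re.sub(r'[^\x00-\x7F]+', '', f.read())

# Overwrite the file with the ASCII-only text
with open(csvFile, 'w', encoding='utf-8') as f:
    f.write(new_text)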
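
To see what that pattern actually removes, here is a minimal sketch that applies it to an in-memory string instead of a file. The helper name `strip_non_ascii` and the sample text are made up for illustration; only the regular expression comes from the answer.

```python
import re

# Same pattern as in the answer: one or more characters outside 0x00-0x7F
NON_ASCII = re.compile(r'[^\x00-\x7F]+')


def strip_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character removed (hypothetical helper)."""
    return NON_ASCII.sub('', text)


# Illustrative CSV content; commas and newlines are ASCII, so they are kept
sample = "id,name\n1,café\n2,Zoë\n"
cleaned = strip_non_ascii(sample)
print(cleaned)
```

Note that the characters are deleted, not transliterated: "café" becomes "caf", not "cafe". If you need the closest ASCII letter instead, this approach alone will not give you that.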

Answer 2

I ran into the same situation. This worked for me:

import re

regex = re.compile(r'[^\x00-\x7F]+')  # matches runs of non-ASCII characters

# csvFile is the exported CSV; the cleaned copy is written to 'myfile.csv'
with open(csvFile, 'r', encoding='utf-8') as infile, open('myfile.csv', 'w', encoding='utf-8') as outfile:
    for line in infile:  # iterate line by line until EOF
        # Replace every non-ASCII match with nothing, then write the line out
        outfile.write(regex.sub('', line))
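
As a possible alternative to post-processing the file (this is not taken from either answer), the strings could be cleaned in the DataFrame before `to_csv` is called, so the CSV is only written once. The sketch below assumes the text columns hold ordinary Python strings; the helper name `strip_non_ascii_columns` and the sample data are hypothetical stand-ins for the result of `pd.read_sql_query`.

```python
import io

import pandas as pd


def strip_non_ascii_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with non-ASCII characters removed from string values."""
    cleaned = df.copy()
    for col in cleaned.columns:
        # Numeric columns cannot contain text, so leave them untouched
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            continue
        # Encode to ASCII while ignoring unencodable characters, then decode back;
        # non-string values such as None/NaN pass through unchanged
        cleaned[col] = cleaned[col].map(
            lambda v: v.encode("ascii", "ignore").decode("ascii")
            if isinstance(v, str)
            else v
        )
    return cleaned


# Hypothetical sample data standing in for the query result
df = pd.DataFrame({"name": ["café", "naïve", None], "qty": [1, 2, 3]})

# Write to an in-memory buffer here; a file path works the same way
buffer = io.StringIO()
strip_non_ascii_columns(df).to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

This only touches cell values, so non-ASCII column names would still need separate handling. Recent pandas versions also document an `errors` parameter on `to_csv`; if your version supports it, passing `encoding='ascii', errors='ignore'` when writing to a path may be a shorter route, but check the documentation for your installed version first.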

2 solutions

Solution 1
1 Accepted 2022-01-27 23:21:40

Solution 2
0 2022-01-27 23:28:31
