
Remove non-ascii characters from CSV using pandas

I am querying a table in a SQL Server database and exporting it to CSV using pandas:

import pandas as pd

df = pd.read_sql_query(sql, conn)
df.to_csv(csvFile, index=False)

Is there a way to remove non-ASCII characters when exporting the CSV?

You can read the file back in and then strip the non-ASCII characters with a regular expression:

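import re

df.to_csv(csvFile, index=False)

# Read the exported file back; pandas writes UTF-8 by default
with open(csvFile, encoding='utf-8') as f:
    new_text = re.sub(r'[^\x00-\x7F]+', '', f.read())

# Overwrite the file with the ASCII-only text
with open(csvFile, 'w', encoding='utf-8') as f:
    f.write(new_text)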
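
If you would rather not write the file, read it back, and write it again, one option is to do the cleanup before anything touches the disk. When `DataFrame.to_csv` is called without a path it returns the CSV text as a string, so the same regex can be applied in memory. This is only a sketch: the helper names `strip_non_ascii` and `export_ascii_csv` are made up for illustration, and it assumes the whole CSV fits comfortably in memory.

```python
import re

# Matches runs of characters outside the 7-bit ASCII range
NON_ASCII = re.compile(r'[^\x00-\x7F]+')


def strip_non_ascii(text):
    """Return `text` with every non-ASCII character removed."""
    return NON_ASCII.sub('', text)


def export_ascii_csv(df, path):
    """Write `df` to `path` as CSV containing only ASCII characters.

    `df` is expected to be a pandas DataFrame (hypothetical helper,
    not part of pandas itself).
    """
    # With no path argument, to_csv() returns the CSV as a string
    csv_text = df.to_csv(index=False)
    # newline='' keeps the line terminators exactly as pandas produced them
    with open(path, 'w', encoding='ascii', newline='') as f:
        f.write(strip_non_ascii(csv_text))
```

With the question's variables this would be called as `export_ascii_csv(df, csvFile)`. Note that the header row is cleaned too, since the regex runs over the entire CSV text.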

This is the situation I ran into as well. This worked for me:

import re

regex = re.compile(r'[^\x00-\x7F]+')  # matches runs of non-ASCII characters
with open(csvFile, 'r', encoding='utf-8') as infile, open('myfile.csv', 'w', encoding='utf-8') as outfile:
    for line in infile:  # iterate line by line until EOF
        # write each line to the output file with any non-ASCII characters removed
        outfile.write(regex.sub('', line))
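
The loop above writes the cleaned data to a second file. If you want the original file cleaned in place while still reading it line by line, something along these lines might work. It is a sketch under a few assumptions: the CSV was written as UTF-8, the function name `strip_non_ascii_file` is hypothetical, and it swaps the regex for `str.encode('ascii', 'ignore')`, which drops the same characters.

```python
import os
import tempfile


def strip_non_ascii_file(path, encoding='utf-8'):
    """Rewrite the file at `path` in place, dropping non-ASCII characters."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that the final
    # os.replace() is a rename within the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='ascii', newline='') as outfile, \
                open(path, 'r', encoding=encoding, newline='') as infile:
            for line in infile:
                # errors='ignore' drops characters that have no ASCII encoding
                outfile.write(line.encode('ascii', 'ignore').decode('ascii'))
        # Swap the cleaned file into place only after it is fully written
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Only one line is held in memory at a time, so this should also cope with large exports. Writing to a temporary file first means a failure partway through leaves the original CSV untouched.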


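Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.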

 