
Remove non-ASCII characters from CSV using pandas

I'm querying a table in a SQL Server database and exporting the result to a CSV using pandas:

import pandas as pd

df = pd.read_sql_query(sql, conn)
df.to_csv(csvFile, index=False)

Is there a way to remove non-ASCII characters when exporting the CSV?

You can export the file, read it back in, and then use a regular expression to strip out non-ASCII characters:

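import re

df.to_csv(csvFile, index=False)

# Read the exported file back and drop every run of non-ASCII characters.
# to_csv writes UTF-8 by default, so read and write with that encoding.
with open(csvFile, encoding='utf-8') as f:
    new_text = re.sub(r'[^\x00-\x7F]+', '', f.read())

# Overwrite the file with the cleaned text.
with open(csvFile, 'w', encoding='utf-8') as f:
    f.write(new_text)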
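
To see what that pattern actually removes, here is a small sketch on a made-up sample string (the sample text and the name NON_ASCII are only for illustration). The character class [^\x00-\x7F] matches anything outside the 7-bit ASCII range, so each match is deleted and nothing is substituted in its place. Encoding to ASCII with errors="ignore" should give the same result:

```python
import re

# Matches one or more consecutive characters outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r'[^\x00-\x7F]+')

# Made-up sample line, not taken from the original data.
sample = "café,naïve,日本語\n"

# Non-ASCII characters are deleted outright, not transliterated.
cleaned = NON_ASCII.sub('', sample)
print(repr(cleaned))

# Equivalent approach without a regex: drop anything ASCII cannot encode.
also_cleaned = sample.encode("ascii", errors="ignore").decode("ascii")
print(cleaned == also_cleaned)
```

Note that accented letters are dropped rather than converted, so "café" becomes "caf", and a field made up entirely of non-ASCII text ends up empty.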

I ran into the same problem. Here's what worked for me:

import re

regex = re.compile(r'[^\x00-\x7F]+')  # matches runs of non-ASCII characters

# Stream line by line so the whole file never has to be held in memory.
with open(csvFile, 'r', encoding='utf-8') as infile, open('myfile.csv', 'w', encoding='utf-8') as outfile:
    for line in infile:
        # Delete any non-ASCII characters in the line, then write it out.
        outfile.write(regex.sub('', line))
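
Another option, if your pandas is version 1.1 or newer, may be to skip the second pass entirely: to_csv accepts an errors argument alongside encoding, so writing with encoding="ascii" and errors="ignore" should drop anything that can't be encoded. This is only a sketch; the sample DataFrame and the temporary path below are stand-ins for the SQL result and csvFile from the question:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the SQL query result.
df = pd.DataFrame({"name": ["café", "naïve"], "qty": [1, 2]})

# Temporary output path, used here in place of csvFile.
csv_path = os.path.join(tempfile.mkdtemp(), "out.csv")

# encoding="ascii" with errors="ignore" silently drops characters
# that cannot be represented in ASCII while the file is written.
df.to_csv(csv_path, index=False, encoding="ascii", errors="ignore")

with open(csv_path, encoding="ascii") as f:
    csv_text = f.read()
print(csv_text)
```

As with the regex, the characters are removed, not transliterated. Using errors="replace" would write a "?" in place of each one if you'd rather see where data was lost.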
