Strip white spaces from CSV file
I need to strip the white space from a CSV file that I read.
import csv

aList = []
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    # I need to strip the extra white space from each string in the row
    return(aList)
There is also a built-in format parameter for this, skipinitialspace (it defaults to False): http://docs.python.org/2/library/csv.html#csv-fmt-params
aList = []
with open(self.filename, 'r') as f:
    # skipinitialspace must be True for the reader to drop spaces after each delimiter
    reader = csv.reader(f, skipinitialspace=True, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    return(aList)
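As a quick check of what skipinitialspace actually does, here is a minimal sketch that parses an in-memory sample (the sample text and the io.StringIO stand-in for a real file are assumptions for illustration). It only skips spaces that follow a delimiter, so trailing whitespace survives and would still need strip().

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
sample = io.StringIO("a, b ,c\n1,  2,3 \n")

reader = csv.reader(sample, skipinitialspace=True)
rows = list(reader)

# Spaces right after each comma are skipped; spaces before a comma
# or at the end of the line are kept.
print(rows)
```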
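In my case, I only cared about stripping the whitespace from the field names (aka column headers, aka dictionary keys) when using csv.DictReader.
Create a class based on csv.DictReader and override the fieldnames property so that it strips the whitespace from each field name.
To do this, get the regular list of field names, iterate over it while building a new list with the whitespace stripped from each name, and set the underlying _fieldnames attribute to this new list.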
import csv

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Initialize self._fieldnames
            # Note: in Python 2 DictReader is an old-style class, so can't use super()
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames
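A short usage sketch for this approach, run against an in-memory sample rather than a real file (the sample data is an assumption for illustration). It shows that only the header names are stripped, while the cell values are left untouched.

```python
import csv
import io

class DictReaderStrip(csv.DictReader):
    """DictReader variant that strips whitespace from the header names."""

    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Let the base class read the header row first.
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(" name , age \nAda, 36\n")
rows = list(DictReaderStrip(sample))

# Keys come out as 'name' and 'age'; the value ' 36' keeps its leading space.
print(rows)
```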
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
You can do:
aList.append([element.strip() for element in row])
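Put together as something runnable, the per-cell strip might look like the sketch below. The helper name read_stripped and the in-memory sample are assumptions for illustration; in the question the rows come from self.filename.

```python
import csv
import io

def read_stripped(f):
    """Return all rows from an open CSV file with every cell stripped."""
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[cell.strip() for cell in row] for row in reader]

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(" a , b,c \n1 ,2 , 3\n")
rows = read_stripped(sample)
print(rows)
```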
You can create a wrapper object around the file that strips the whitespace before the CSV reader sees it. This way you can even use the file with csv.DictReader.
import re

class CSVSpaceStripper:
    def __init__(self, filename):
        self.fh = open(filename, "r")
        # Raw strings keep the regex backslashes from being read as string escapes
        self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
        self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")

    def close(self):
        self.fh.close()
        self.fh = None

    def __iter__(self):
        return self

    def next(self):
        # Python 2 iterator protocol; raises StopIteration at end of file
        line = self.fh.next()
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line
Then use it like this:
o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")
I hard-coded ";" as the delimiter. Generalizing the code to any delimiter is left as an exercise for the reader.
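One way that generalization could look in Python 3 is sketched below. This is not the answer author's code: the generator name strip_around_delimiter is hypothetical, and it assumes a non-whitespace delimiter and no quoted fields that contain the delimiter (the regex would rewrite those too).

```python
import csv
import io
import re

def strip_around_delimiter(lines, delimiter=","):
    """Yield each line with whitespace removed around the delimiter and at both ends."""
    # re.escape makes delimiters such as "|" or "." safe inside the pattern.
    pattern = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
    for line in lines:
        yield pattern.sub(delimiter, line.strip())

# Hypothetical sample data standing in for the real file.
sample = io.StringIO("a ; b ;c \n 1;2 ; 3\n")
rows = list(csv.reader(strip_around_delimiter(sample, ";"), delimiter=";"))
print(rows)
```

Because it wraps any iterable of lines rather than opening the file itself, the caller stays in charge of closing the file, for example with a with block.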
The most memory-efficient way to format the cells after parsing is with generators. Something like:
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield (cell.strip() for cell in row)
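But it may be worth moving this into a function that you can keep extending with further clean-up steps, so that you avoid extra passes over the data. For instance: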
nulls = {'NULL', 'null', 'None', ''}

def clean(reader):
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield clean(row)
Or it can be used as a factory that turns each row into a record:
def factory(reader):
    fields = next(reader)
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield dict(zip(fields, clean(row)))
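To see what the factory produces, here is a small self-contained run against an in-memory sample (the sample data is an assumption for illustration). Note that the header row is taken as-is, so only the data cells are stripped and null-mapped.

```python
import csv
import io

nulls = {'NULL', 'null', 'None', ''}

def factory(reader):
    fields = next(reader)  # the first row is taken as the header
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield dict(zip(fields, clean(row)))

# Hypothetical sample data standing in for the real file.
sample = io.StringIO("name,city\n Ada , NULL \nBob,  Paris\n")
records = list(factory(csv.reader(sample)))
print(records)
```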
Read the CSV (or Excel file) with Pandas and trim it with this custom function.
# Definition for stripping whitespace
def trim(dataset):
    # Only strings are stripped; numbers, NaN, etc. pass through unchanged
    trim = lambda x: x.strip() if isinstance(x, str) else x
    return dataset.applymap(trim)
You can now apply trim() to a CSV or Excel file in your code like this (as part of a loop, etc.):
dataset = trim(pd.read_csv(dataset))
dataset = trim(pd.read_excel(dataset))
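DataFrame.applymap has been deprecated since pandas 2.1 in favour of DataFrame.map, so the function above may emit a warning or fail on recent versions. The sketch below is one possible version-neutral variant, not the answer author's code: it applies Series.map column by column, and the sample DataFrame is an assumption for illustration.

```python
import pandas as pd

def trim(dataset):
    """Strip whitespace from every string cell, leaving other values alone."""
    strip_if_str = lambda x: x.strip() if isinstance(x, str) else x
    # DataFrame.apply hands over one column (a Series) at a time;
    # Series.map then applies the function to each element.
    return dataset.apply(lambda column: column.map(strip_if_str))

# Hypothetical sample data standing in for a file read with pd.read_csv.
df = pd.DataFrame({"name": [" Ada ", "Bob "], "age": [36, 41]})
trimmed = trim(df)
print(trimmed)
```

Like the original, this leaves the column headers untouched; those would need a separate step such as stripping the names in dataset.columns.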
Here is Daniel Kullmann's excellent solution adapted for Python 3:
import re

class CSVSpaceStripper:
    """strip whitespaces around delimiters in the file
    NB has hardcoded delimiter ";"
    """
    def __init__(self, filename):
        self.fh = open(filename, "r")
        self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
        self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")

    def close(self):
        self.fh.close()
        self.fh = None

    def __iter__(self):
        return self

    def __next__(self):
        line = self.fh.readline()
        if not line:
            # readline() returns "" at end of file; without this check the loop never ends
            raise StopIteration
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line
I came up with a very simple solution:
import csv

with open('filename.csv') as f:
    reader = csv.DictReader(f)
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
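One caveat with this one-liner: csv.DictReader fills short rows with None and collects surplus cells in a list under the key None, and calling strip() on either raises an error. A more defensive sketch follows, assuming ragged rows are possible; the helper name strip_row and the sample data are hypothetical.

```python
import csv
import io

def strip_row(row):
    """Strip keys and values of one DictReader row, tolerating ragged rows."""
    cleaned = {}
    for key, value in row.items():
        if isinstance(key, str):
            key = key.strip()
        if isinstance(value, str):
            value = value.strip()
        elif isinstance(value, list):
            # Surplus cells are collected by DictReader in a list under key None.
            value = [cell.strip() for cell in value]
        cleaned[key] = value
    return cleaned

# Hypothetical sample: one short row and one row with an extra cell.
sample = io.StringIO(" name , age \n Ada \n Bob , 41 , extra \n")
rows = [strip_row(row) for row in csv.DictReader(sample)]
print(rows)
```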
The following code may help you:
import pandas as pd

# A regex separator needs the python engine; the raw string keeps \s intact
aList = pd.read_csv(r'filename.csv', sep=r'\s*,\s*', engine='python')