How to convert strings to numbers, when taking data from .csv to .xlsx, using openpyxl
Python Openpyxl convert CSV to XLSX & removing “$ ,” from cells containing numbers
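I have to read a csv file generated by a third party into an XLSX file. The file contains a mix of strings, integers, and prices (sometimes with a $ sign). Here is the sample data I was given, stored in the csv file a_test_f.csv: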
ColA,ColB
1,$11.00
2,22
3,"$1,000.56"
4,44
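As a quick check of how this sample parses, here is a minimal sketch that inlines the data with io.StringIO instead of reading a_test_f.csv (the `sample` and `rows` names are only for illustration). csv.reader keeps the quoted "$1,000.56" as a single field, so the comma inside it is not treated as a delimiter:

```python
import csv
import io

# Sample data inlined for illustration; the real input is a_test_f.csv.
sample = 'ColA,ColB\n1,$11.00\n2,22\n3,"$1,000.56"\n4,44\n'

# Every field comes back as a string, including the numeric-looking ones.
rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
```

Because every field arrives as a string, any numeric conversion has to be done explicitly before the value is written to the worksheet.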
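Here is the code I wrote. My question: is this the most efficient way to perform this conversion? Is there an alternative that uses less processing power or memory? This matters because the real csv files will contain thousands of records and hundreds of columns, and tens of thousands of conversions must run every day.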
import csv
import openpyxl
#
# Convert the data in csv file format that contains a mix of
# strings, integers and dollar amounts into xlsx file format
#
csvfile = 'a_test_f.csv'
xlsxfile = 'new_xlsx_f.xlsx'
wb = openpyxl.Workbook()
ws = wb.active

# remove $ and , from numbers
class Clean:
    def __init__(self, data=''):
        self.__obj = data

    def __repr__(self):
        return f"{self.__obj}"

    def getData(self):
        return self.__obj

    def dollar(self):
        try:
            return Clean(data=self.__obj.replace('$', ''))
        except TypeError as err:
            print(err)

    def comma(self):
        try:
            return Clean(data=self.__obj.replace(',', ''))
        except TypeError as err:
            print(err)

    def digit(self):
        try:
            float(self.__obj)
            return True
        except ValueError:
            return False

with open(csvfile) as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    row_count = 1
    for row in reader:
        for i in range(len(row)):
            if Clean(data=row[i]).dollar().comma().digit():
                content = float(repr(Clean(data=row[i]).dollar().comma()))
            else:
                content = row[i]
            ws.cell(row=row_count, column=i+1).value = content
        row_count += 1
wb.save(xlsxfile)
print('Finished!')
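Following Charlie's suggestion, I rewrote the conversion using functions instead of a class, then ran both the class and the function approach on a csv file containing one million items. Result:
Functions win. Thanks, Charlie!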
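A comparison like this can be reproduced with the standard timeit module. The sketch below is only an illustration: `clean_with_class`, `clean_with_function`, and the sample `values` are hypothetical stand-ins for the two approaches, not the exact code that was timed, and the timings vary by machine, so no figures are shown.

```python
import timeit


class Clean:
    """Stand-in for the chained class approach (hypothetical, simplified)."""

    def __init__(self, data=''):
        self.data = data

    def dollar(self):
        return Clean(self.data.replace('$', ''))

    def comma(self):
        return Clean(self.data.replace(',', ''))


def clean_with_class(text):
    # Creates a new Clean object at every step of the chain.
    cleaned = Clean(text).dollar().comma().data
    try:
        return float(cleaned)
    except ValueError:
        return text


def clean_with_function(text):
    # Same cleaning logic using plain string methods.
    try:
        return float(text.replace('$', '').replace(',', ''))
    except ValueError:
        return text


# Small synthetic workload; a real test would read the actual csv file.
values = ['1', '$11.00', '22', '$1,000.56', 'ColA'] * 1000

class_time = timeit.timeit(lambda: [clean_with_class(v) for v in values], number=20)
func_time = timeit.timeit(lambda: [clean_with_function(v) for v in values], number=20)
print(f"class: {class_time:.3f}s  function: {func_time:.3f}s")
```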
The function approach is as follows:
import csv
import openpyxl
#
# Convert the data in csv file format that contains a mix of
# strings, integers and dollar amounts into xlsx file format
#
csvfile = 'large_test_export.csv'
xlsxfile = 'new_xlsx_f.xlsx'
wb = openpyxl.Workbook()
ws = wb.active

# remove $ and , from numbers
def strip_stuff(a_string):
    try:
        temp = a_string.replace(',', '')
    except TypeError as err:
        print(err)
    try:
        temp2 = temp.replace('$', '')
    except TypeError as err:
        print(err)
    try:
        temp3 = float(temp2)
        return temp3
    except ValueError as err:
        return temp2

def is_number(b_string):
    temp = strip_stuff(b_string)
    try:
        float(temp)
        return True
    except ValueError:
        return False

with open(csvfile) as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    row_count = 1
    for row in reader:
        for i in range(len(row)):
            if is_number(row[i]):
                content = strip_stuff(row[i])
            else:
                content = row[i]
            ws.cell(row=row_count, column=i+1).value = content
        row_count += 1
wb.save(xlsxfile)
print('Finished!')
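On the memory question, one option worth testing is openpyxl's write-only mode, which streams rows to the file instead of keeping every cell in memory. The following is a minimal sketch, not a benchmarked solution. The helper names `to_number` and `csv_to_xlsx` are hypothetical, and the sketch assumes that any cell that parses as a float after removing `$` and `,` should be stored as a number.

```python
import csv


def to_number(text):
    """Return text as a float if it parses after removing '$' and ','.

    Otherwise return the original string unchanged.
    """
    try:
        return float(text.replace('$', '').replace(',', ''))
    except ValueError:
        return text


def csv_to_xlsx(csv_path, xlsx_path):
    # Imported here so to_number can be used without openpyxl installed.
    import openpyxl

    # write_only=True creates a streaming workbook: it has no default
    # sheet, and rows can only be appended, not addressed by cell.
    wb = openpyxl.Workbook(write_only=True)
    ws = wb.create_sheet()
    with open(csv_path, newline='') as f:
        for row in csv.reader(f):
            # Each row is cleaned once and appended in a single call.
            ws.append([to_number(cell) for cell in row])
    wb.save(xlsx_path)
```

Calling `csv_to_xlsx('large_test_export.csv', 'new_xlsx_f.xlsx')` would perform the same conversion as the loop above. Each value is also cleaned only once, rather than once in `is_number` and again in `strip_stuff`. Whether this is actually faster or lighter on real files would need to be measured.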