
After converting a CSV file to Excel, integers are stored as strings - how to convert them back?

In this project I've converted a csv file to an xls file and a txt file to an xls file. The objective is then to compare the two xls files and print any differences to a third Excel file.

However, the printed differences include every entry with an integer above 999, because any integer from my converted csv file is treated as a string instead of an integer. As a result, a value such as 1,200 (in the Excel file converted from csv) is treated as different from 1200 (in the Excel file converted from txt) because of the comma.

My question is: is there a way to convert the integers that are interpreted as strings back into integers? Otherwise, is there a way to delete all commas from my xls files? I have tried the usual dataframe.replace approach, but it has no effect.

Below is my code:

#import required libraries
import datetime
import xlrd
import pandas as pd

#define the time_handle timestamp string used to name the outputted excel files
time_handle = datetime.datetime.now().strftime("%Y%m%d_%H%M")

#identify XM1 file paths (for both csv origin and excel destination)
XM1_csv = r"filepath"
XM1_excel = r"filepath" + time_handle + ".xlsx"

#identify XM2 file paths (for both txt origin and excel destination)
XM2_txt = r"filepath"
XM2_excel = r"filepath" + time_handle + ".xlsx"

#remove commas from XM1 excel - failed attempts
#XM1_excel = [col.replace(',', '') for col in XM1_excel]
#XM1_excel = XM1_excel.replace(",", "")
#for line in XM1_excel:
        #XM1_excel.write(line.replace(",", ""))

#remove commas from XM1 CSV - failed attempts
#XM1_csv = [col.replace(',', '') for col in XM1_csv]
#XM1_csv = XM1_csv.replace(",", "")
#for line in XM1_csv:
        #XM1_excel.write(line.replace(",", ""))

#convert the csv XM1 file to an excel file, in the same folder
pd.read_csv(XM1_csv).to_excel(XM1_excel)

#convert the txt XM2 file to an excel file in the same folder
pd.read_csv(XM2_txt, sep="|").to_excel(XM2_excel)



#confirm XM1 filepath
filepath_XM1 = XM1_excel

#confirm XM2 filepath
filepath_XM2 = XM2_excel
#read relevant columns from the excel files
#note: newer pandas versions name these arguments sheet_name and usecols
df1 = pd.read_excel(filepath_XM2, sheet_name="Sheet1", usecols="H, J, M, U")
df2 = pd.read_excel(filepath_XM1, sheet_name="Sheet1", usecols="C, E, G, K")

#remove all commas from XM1 - failed attempts
#df2 = [col.replace(',', '') for col in df2]
#df2 = df2.replace(",", "")
#for line in df2:
        #df2.write(line.replace(",", ""))

#merge the columns from both excel files into one column each respectively
df4 = df1["Exchange Code"] + df1["Product Type"] + df1["Product Description"] + df1["Quantity"].apply(str)
df5 = df2["Exchange"] + df2["Product Type"] + df2["Product Description"] + df2["Quantity"].apply(str)

#concatenate both columns from each excel file, to make one big column containing all the data
df = pd.concat([df4, df5])

#remove all whitespace from each row of the column of data
df=df.str.strip()
df=["".join(x.split()) for x in df]

#convert the data to a dataframe from a series
df = pd.DataFrame({'Value': df})

#remove any duplicates
df.drop_duplicates(subset=None, keep=False, inplace=True)

#print to the console just as a visual aid
print(df)
#output_path = r"filepath"
#print the erroneous entries to an excel file
df.to_excel("XM1_XM2Comparison" + time_handle + ".xls")

Also, I realize that the XM1 and XM2 file names are a bit confusing relative to df1 and df2, but I simply renamed my files. It makes sense in terms of the files and where they belong in the code!

Thank you.

You can try the converters argument on the read side of the DataFrame, where you can specify the data type. Example:

df = pd.read_excel(file, sheet_name=YOUR_SHEET_HERE, converters={'FIELD_NAME': str})

The converters argument is available in both read_csv and read_excel.
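As a rough sketch of how converters could be applied to the comma problem in the question: instead of forcing the column to str, the converter can strip the thousands separator and return an int. The sample data, the column names, and the to_int helper are made up for illustration and are not from the original post.

```python
import io

import pandas as pd

# Hypothetical sample data; the quoted field carries a thousands separator.
csv_text = 'Product,Quantity\nA,"1,200"\nB,950\n'


def to_int(value):
    """Strip thousands separators from the raw cell text and convert to int."""
    return int(value.replace(",", ""))


# read_csv passes each raw cell of the named column to the converter as a string.
df = pd.read_csv(io.StringIO(csv_text), converters={"Quantity": to_int})
print(df["Quantity"].tolist())
```

With read_excel the converter receives whatever the cell already contains, so a helper like this would need to handle values that are not strings.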

I actually solved this issue with a simple fix, posted here for future reference. When reading the csv with pd.read_csv, I added the thousands argument, so it looks like this:

pd.read_csv(XM1, thousands = ",").to_excel(XM1_excel)
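To make the effect of thousands visible, here is a small self-contained sketch that reads the same in-memory CSV with and without the argument. The sample data and column names are invented for illustration. The last step shows one possible way to clean a column that has already been loaded as text.

```python
import io

import pandas as pd

# Hypothetical sample data; the quoted field carries a thousands separator.
csv_text = 'Product,Quantity\nA,"1,200"\nB,950\n'

# Without thousands, the comma keeps the whole column as text.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["Quantity"].tolist())     # ['1,200', '950']

# With thousands=",", pandas drops the separator and parses integers.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(parsed["Quantity"].tolist())  # [1200, 950]

# For an already loaded text column, remove the commas per cell and cast.
cleaned = raw["Quantity"].str.replace(",", "", regex=False).astype(int)
print(cleaned.tolist())             # [1200, 950]
```

A likely reason the dataframe.replace attempt in the question had no effect is that DataFrame.replace(",", "") only replaces cells whose entire value is ",", unless regex=True is passed. The .str.replace accessor works on substrings within each cell.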
