[英]Python error that is workstation specific when running script
I am getting an error on one workstation when running a Python script.运行 Python 脚本时,我在一个工作站上遇到错误。 The script runs fine on VMs and my workstation.该脚本在虚拟机和我的工作站上运行良好。
pip list
Shows packages are the same pip list
显示包是一样的It might be a memory issue, but the workstation has 2x4Gb RAM.这可能是内存问题,但工作站有 2x4Gb RAM。 I tried to chunk it out, but that did not work either.我试图把它分块,但这也不起作用。 The file is barely 1Mb.该文件只有 1Mb。
As troubleshooting, I cut the file to just 500 rows, and it ran fine.作为故障排除,我将文件剪切为 500 行,并且运行良好。 When I tried 1000 rows out of the 2500 rows in the file, it gave the same error.当我尝试文件中 2500 行中的 1000 行时,它给出了相同的错误。 Interestingly the workstation cannot run the script with even just one row now.有趣的是,工作站现在连一行都无法运行脚本。
Including error_bad_lines=False
, iterator=True
, chunksize=
, low_memory=False
have all not worked.包括error_bad_lines=False
、 iterator=True
、 chunksize=
、 low_memory=False
都不起作用。
What is causing this error?是什么导致了这个错误? Why did it run just fine using a few rows, but now not even with one row?为什么它使用几行运行得很好,但现在甚至没有一行?
Here is the Traceback:这是回溯:
Traceback (most recent call last):
File "c:\Users\script.py", line 5, in <module>
data = pd.read_csv("C:/Path/file.csv", encoding='latin-1' )
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
return parser.read(nrows)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1250, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 4
Here is the script:这是脚本:
# Import raw data
data = pd.read_csv("C:/Users/Script.csv", encoding='latin-1' )
# Create array to track failed cases.
data['Test Case Failed']= ''
data = data.replace(np.nan,'')
data.insert(0, 'ID', range(0, len(data)))
# Testcase 1
data_1 = data[(data['FirstName'] == data['SRFirstName'])]
ids = data_1.index.tolist()
for i in ids:
data.at[i,'Test Case Failed']+=', 1'
# There are 15 more test cases that preform similar tasks
# Total cases
failed = data[(data['Test Case Failed'] != '')]
passed = data[(data['Test Case Failed'] == '')]
failed['Test Case Failed'] =failed['Test Case Failed'].str[1:]
failed = failed[(failed['Test Case Failed'] != '')]
# Clean up
del failed["ID"]
del passed["ID"]
# Print results
failed['Test Case Failed'].value_counts()
print("There was a total of",data.shape[0], "rows.", "There was" ,data.shape[0] - failed.shape[0], "rows passed and" ,failed.shape[0], "rows failed at least one test case")
# Drop unwanted columns
redata = passed.drop(columns=['ConsCodeImpID', 'ImportID', 'Suff1', 'SRSuff2', 'Inactive',
'AddrRegion','AddrImpID', 'AddrImpID', 'AddrImpID.2', 'AddrImpID.1', 'PhoneAddrImpID',
'PhoneAddrImpID.1', 'PhoneImpID', 'PhoneAddrImpID', 'PhoneImpID', 'PhoneType.1', 'DateTo',
'SecondID', 'Test Case Failed', 'PhoneImpID.1'])
# Clean address
redata['AddrLines'] = redata['AddrLines'].str.replace('Apartment ','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('Apt\\.','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('APT','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('nApt','Apt ',regex=True)
#There's about 100 more rows of address clean up
# Output edited dropped columns
redata.to_csv("C:/Users/cleandata.csv", index = False)
# Output failed rows
failed.to_csv("C:/Users/Failed.csv", index = False)
# Output passed rows
passed.to_csv("C:/Users/Passed.csv", index = False)
The workstation was corrupting the file, despite never opening it before running the script.工作站损坏了文件,尽管在运行脚本之前从未打开它。 I repaired the file and it worked.我修复了文件,它工作。 After reinstalling Excel, I no longer had to repair the file and could run the script as normal.重新安装 Excel 后,我不再需要修复文件,并且可以正常运行脚本。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.