Workstation-specific Python error when running a script

Question

I am getting an error on one workstation when running a Python script. The script runs fine on VMs and on my own workstation.

pip list shows that the packages are the same.
All workstations use Python 3.10.4 (64-bit).
This is the only workstation throwing this error.

It might be a memory issue, but the workstation has 2x4 GB of RAM. I tried to process the file in chunks, but that did not work either. The file is barely 1 MB.

As troubleshooting, I cut the file down to just 500 rows, and it ran fine. When I tried 1000 of the 2500 rows in the file, it gave the same error. Interestingly, the workstation now cannot run the script even with just one row.

Passing error_bad_lines=False, iterator=True, chunksize=, or low_memory=False has not helped.
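
Because the error reports only one field on the header line, it may be worth checking which delimiter the file really uses before tuning read_csv options. (In pandas 1.3 and later, error_bad_lines is deprecated in favor of on_bad_lines.) A minimal sketch using the standard library's csv.Sniffer; sniff_sample is a hypothetical helper and the set of candidate delimiters is an assumption:

```python
import csv
from typing import Tuple


def sniff_sample(sample: str, delimiters: str = ",;\t|") -> Tuple[str, bool]:
    """Guess the delimiter of a CSV text sample and whether its first row is a header.

    csv.Sniffer raises csv.Error when no consistent delimiter is found,
    which by itself suggests the text is not well-formed delimited data.
    """
    sniffer = csv.Sniffer()
    # Restrict the guess to a few common delimiters (an assumption).
    dialect = sniffer.sniff(sample, delimiters=delimiters)
    return dialect.delimiter, sniffer.has_header(sample)
```

To use it on the real file, read a few kilobytes first, e.g. open the path with newline="" and encoding="latin-1" and pass f.read(4096) to sniff_sample. Running it on both a working and the failing workstation makes it easy to compare what each one sees.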

What is causing this error? Why did it run fine with a few rows, but now fails even with one row?

Here is the traceback:

Traceback (most recent call last):
  File "c:\Users\script.py", line 5, in <module>
    data = pd.read_csv("C:/Path/file.csv", encoding='latin-1' )
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
    return parser.read(nrows)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1250, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 4
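
The message means the C parser took the expected width of the table from the first line (one field, so most likely a line with no commas) and then found four comma-separated fields on line 5. A minimal sketch that performs roughly the same check with the standard csv module, so the offending lines can be inspected directly; first_field_mismatch is a hypothetical name and a comma delimiter is assumed:

```python
import csv
from typing import Iterable, Optional, Tuple


def first_field_mismatch(
    lines: Iterable[str], delimiter: str = ","
) -> Optional[Tuple[int, int, int]]:
    """Return (line_number, expected_fields, seen_fields) for the first row whose
    field count differs from the header row, or None if every row matches.

    Line numbers are 1-based physical lines, with the header as line 1.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    expected = None
    for row in reader:
        if expected is None:
            expected = len(row)  # the header row defines the expected width
            continue
        # Skip blank lines, similar to pandas' default skip_blank_lines=True.
        if row and len(row) != expected:
            return reader.line_num, expected, len(row)
    return None
```

An open file object works as the lines argument, e.g. first_field_mismatch(open(path, newline="", encoding="latin-1")). If the first line turns out to be something other than the expected header, that points at the file's contents rather than at pandas or memory.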

Here is the script:

# Import libraries
import numpy as np
import pandas as pd

# Import raw data
data = pd.read_csv("C:/Users/Script.csv", encoding='latin-1')

# Create array to track failed cases.
data['Test Case Failed']= ''
data = data.replace(np.nan,'')
data.insert(0, 'ID', range(0, len(data)))

# Testcase 1
data_1 = data[(data['FirstName'] == data['SRFirstName'])]
ids = data_1.index.tolist()
for i in ids:
  data.at[i,'Test Case Failed']+=', 1'

# There are 15 more test cases that perform similar tasks

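# Total cases
# .copy() makes independent frames, so the edits and del statements below
# do not trigger SettingWithCopyWarning on slices of data
failed = data[(data['Test Case Failed'] != '')].copy()
passed = data[(data['Test Case Failed'] == '')].copy()
# Strip the leading ', ' separator added by the test cases
failed['Test Case Failed'] = failed['Test Case Failed'].str[2:]
failed = failed[(failed['Test Case Failed'] != '')]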

# Clean up
del failed["ID"]
del passed["ID"]

# Print results 
print(failed['Test Case Failed'].value_counts())
print("There was a total of",data.shape[0], "rows.", "There was" ,data.shape[0] - failed.shape[0], "rows passed and" ,failed.shape[0], "rows failed at least one test case")

# Drop unwanted columns 
redata = passed.drop(columns=['ConsCodeImpID', 'ImportID', 'Suff1', 'SRSuff2', 'Inactive',
'AddrRegion', 'AddrImpID', 'AddrImpID.2', 'AddrImpID.1', 'PhoneAddrImpID',
'PhoneAddrImpID.1', 'PhoneImpID', 'PhoneType.1', 'DateTo',
'SecondID', 'Test Case Failed', 'PhoneImpID.1'])

# Clean address  
redata['AddrLines'] = redata['AddrLines'].str.replace('Apartment ','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('Apt\\.','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('APT','Apt ',regex=True)
redata['AddrLines'] = redata['AddrLines'].str.replace('nApt','Apt ',regex=True)
# There are about 100 more lines of address cleanup

# Output edited dropped columns  
redata.to_csv("C:/Users/cleandata.csv", index = False)
# Output failed rows
failed.to_csv("C:/Users/Failed.csv", index = False)
# Output passed rows 
passed.to_csv("C:/Users/Passed.csv", index = False)
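
The address clean-up section of the script repeats one str.replace call per rule (about 100 in total). A minimal sketch that keeps the rules in one table and applies them in order with re.sub; only the four rules shown in the script are included, and ADDRESS_RULES and clean_address are hypothetical names:

```python
import re
from typing import List, Tuple

# Hypothetical rule table: (regex pattern, replacement), applied in list order.
# These are the four rules shown in the script; the rest would be added here.
ADDRESS_RULES: List[Tuple[str, str]] = [
    (r"Apartment ", "Apt "),
    (r"Apt\.", "Apt "),
    (r"APT", "Apt "),
    (r"nApt", "Apt "),
]


def clean_address(text: str, rules: List[Tuple[str, str]] = ADDRESS_RULES) -> str:
    """Apply each regex rule to text in order and return the cleaned string."""
    for pattern, repl in rules:
        text = re.sub(pattern, repl, text)
    return text
```

Applied to the DataFrame this would be redata['AddrLines'] = redata['AddrLines'].map(clean_address), which relies on the earlier replace(np.nan, '') so that every value is a string. As in the original, the rules run in sequence, so their order still matters.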

Answer 1

The workstation was corrupting the file, even though the file had never been opened on it before running the script. I repaired the file using the steps below, and the script worked. After reinstalling Excel, I no longer had to repair the file and could run the script as normal.

1. Click File > Open.
2. Click the location and folder that contains the corrupted workbook.
3. In the Open dialog box, select the corrupted workbook.
4. Click the arrow next to the Open button, and then click Open and Repair.
5. To recover as much of the workbook data as possible, pick Repair. If Repair isn't able to recover your data, pick Extract Data to extract values and formulas from the workbook.
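
Before repairing anything, one quick way to check whether the file on the failing workstation is still plain text is to look at its first bytes. A minimal sketch, assuming the standard file signatures below (ZIP for .xlsx, OLE2 for legacy .xls, and the UTF-8/UTF-16 byte-order marks); the function names and category labels are hypothetical:

```python
def describe_file_start(head: bytes) -> str:
    """Classify the first bytes of a file that is supposed to be a CSV."""
    if head.startswith(b"PK\x03\x04"):
        return "zip"         # ZIP container, e.g. an .xlsx workbook with a .csv name
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"        # legacy binary Office container, e.g. .xls
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8-bom"   # UTF-8 text with a byte-order mark
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16-bom"  # UTF-16 text; decoding as latin-1 puts a NUL after each ASCII char
    if b"\x00" in head:
        return "binary"      # NUL bytes almost never occur in plain-text CSV
    return "text"


def describe_file(path: str, size: int = 512) -> str:
    """Read the first `size` bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return describe_file_start(f.read(size))
```

Comparing describe_file(path) on a working machine and on the failing one would show whether the bytes on disk differ, which is consistent with the corruption described in the answer.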

Answer 1: score 0, accepted, 2022-06-10 14:31:02
