Say I have a function that processes multiple data frames read from a list of files, like this:
import os
import pandas as pd

# path is the directory containing the .csv files
listdF = [os.path.join(os.sep, path, x) for x in os.listdir(path) if x.endswith('.csv')]

def corre_arrys(listdF):
    data = []
    for files in listdF:
        df = pd.read_csv(files, sep='\t', header=0, engine='python')
        # do something
    return df
When I run the above function as it is, there is no error and it prints the output I need. However, when I try to run it using multiprocessing as follows,
from multiprocessing import Pool
NUM_PROCS = 8
pool = Pool(processes=NUM_PROCS)
allDfs = pool.map(corre_arrys,listdF)
it throws the following error:
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/alva/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/alva/anaconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "<ipython-input-42-e4b97b52ffff>", line 4, in corre_arrys
df = pd.read_csv(files,sep='\t',header=0,engine='python')
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1126, in _make_engine
self._engine = klass(self.f, **self.options)
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2269, in __init__
memory_map=self.memory_map,
File "/home/alva/anaconda3/lib/python3.7/site-packages/pandas/io/common.py", line 431, in get_handle
f = open(path_or_buf, mode, errors="replace", newline="")
IsADirectoryError: [Errno 21] Is a directory: '/'
"""
The above exception was the direct cause of the following exception:
IsADirectoryError Traceback (most recent call last)
<ipython-input-46-4971753cdf30> in <module>
4 NUM_PROCS = 8
5 pool = Pool(processes=NUM_PROCS)
----> 6 allDfs = pool.map(corre_arrys,listdF)
~/anaconda3/lib/python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
266 in a list that is returned.
267 '''
--> 268 return self._map_async(func, iterable, mapstar, chunksize).get()
269
270 def starmap(self, func, iterable, chunksize=None):
~/anaconda3/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):
IsADirectoryError: [Errno 21] Is a directory: '/'
listdF looks like the following; each entry is a full path to a file.
['/path/scripts/pc_2_lc_1_T.csv',
'/path/scripts/pc_2_lc_2_T.csv',
'/path/scripts/pc_1_lc_1_T.csv',
'/path/scripts/pc_1_lc_2_T.csv']
I can't figure out where exactly the problem is.
Any help is greatly appreciated. Thanks!!
From your stack trace, it looks like a directory is creeping into your listdF, and pandas.read_csv() fails when it tries to load it. Try explicitly filtering out directories, while still keeping the full path for each file:
listdF = [os.path.join(path, x) for x in os.listdir(path) if x.endswith('.csv') and os.path.isfile(os.path.join(path, x))]
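Another reading of the traceback, offered as a likely explanation rather than a confirmed diagnosis: pool.map calls the function once per element of listdF, so inside corre_arrys the parameter is a single path string, and the for loop walks over its characters. The first character of '/path/scripts/...' is '/', which would match the "Is a directory: '/'" error exactly. The sketch below uses the built-in map as a stand-in for pool.map (the traceback shows mapstar doing list(map(*args))). The names loop_targets and process_one are hypothetical, and the pandas call is left as a comment so the example runs without any files.

```python
import os

# Example paths taken from the listdF output shown in the question.
paths = ['/path/scripts/pc_2_lc_1_T.csv', '/path/scripts/pc_2_lc_2_T.csv']


def loop_targets(list_or_path):
    """Mimic `for files in listdF:` and collect what read_csv would be given."""
    return [files for files in list_or_path]


# Direct call with the whole list: the loop yields file paths.
direct = loop_targets(paths)

# Per-element call, as pool.map does: each call gets ONE path string,
# so the loop yields its characters, starting with '/'.
per_element = list(map(loop_targets, paths))
first_target = per_element[0][0]


def process_one(file_path):
    """Hypothetical per-file worker: no inner loop, one path per call."""
    # The real body would be something like:
    #   return pd.read_csv(file_path, sep='\t', header=0, engine='python')
    return os.path.basename(file_path)  # stand-in so the sketch runs without pandas


results = list(map(process_one, paths))
print(direct[0])
print(repr(first_target))
print(results)
```

If this is indeed the cause, a worker shaped like process_one could be passed to the pool as pool.map(process_one, listdF), giving one result per file. The original corre_arrys would still work unchanged when called directly with the whole list.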