Python script to concatenate all the files in a directory into one file
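I'm following an online tutorial from 2 years ago. I mention this because it mattered the last time I asked a question: apparently the syntax has changed in the latest version of Python. Anyway, here is the code I'm using: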
files = [file for file in os.listdir ('./Sales_Data')]
all_months_data=pd.DataFrame()
for file in files:
    df= pd.read_csv("./Sales_Data"+file)
    all_months_data= pd.concat ([all_months_data, df])
all_months_data.to_csv("all_data.csv",index= False)
This is the error I get:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13984/1832777700.py in <module>
2 all_months_data=pd.DataFrame()
3 for file in files:
----> 4 df= pd.read_csv("./Sales_Data"+file)
5 all_months_data= pd.concat ([all_months_data, df])
6 all_months_data.to_csv("all_data.csv",index= False)
~\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
584 kwds.update(kwds_defaults)
585
--> 586 return _read(filepath_or_buffer, kwds)
587
588
~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)
480
481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
483
484 if chunksize or iterator:
~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py in __init__(self, f, engine, **kwds)
809 self.options["has_index_names"] = kwds["has_index_names"]
810
--> 811 self._engine = self._make_engine(self.engine)
812
813 def close(self):
~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _make_engine(self, engine)
1038 )
1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
1041
1042 def _failover_to_python(self):
~\anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in __init__(self, src, **kwds)
49
50 # open handles
---> 51 self._open_handles(src, kwds)
52 assert self.handles is not None
53
~\anaconda3\lib\site-packages\pandas\io\parsers\base_parser.py in _open_handles(self, src, kwds)
220 Let the readers open IOHandles after they are done with their potential raises.
221 """
--> 222 self.handles = get_handle(
223 src,
224 "r",
~\anaconda3\lib\site-packages\pandas\io\common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
700 if ioargs.encoding and "b" not in ioargs.mode:
701 # Encoding
--> 702 handle = open(
703 handle,
704 ioargs.mode,
FileNotFoundError: [Errno 2] No such file or directory: './Sales_DataSales_April_2019.csv'
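I tried checking for and fixing spelling/syntax errors, which resolved one of the errors.
I also tried adding the last line of code, but I think it only added more errors.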
If you read the stack trace all the way to the end, the answer is obvious:
FileNotFoundError: [Errno 2] No such file or directory: './Sales_DataSales_April_2019.csv'
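You are missing a forward slash (/) when building the file path in the for loop, so the folder name and the file name run together.
So, to fix it: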
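The effect of the missing separator is easy to reproduce with plain strings. A minimal sketch, using the file name from the error message:

```python
# Folder from the question, file name from the FileNotFoundError message
folder = "./Sales_Data"
file = "Sales_April_2019.csv"

broken = folder + file        # no separator: folder and file name run together
fixed = folder + "/" + file   # explicit "/" between folder and file name

print(broken)  # ./Sales_DataSales_April_2019.csv
print(fixed)   # ./Sales_Data/Sales_April_2019.csv
```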
import os
import pandas as pd

files = [file for file in os.listdir('./Sales_Data')]
all_months_data = pd.DataFrame()
for file in files:
    df = pd.read_csv("./Sales_Data/" + file)  # This line!
    all_months_data = pd.concat([all_months_data, df])
all_months_data.to_csv("all_data.csv", index=False)
A cleaner solution is to use os.path.join, which inserts the separator for you; see this other StackOverflow answer: Create file path from variables
With os.path.join:
import os
import pandas as pd

files = [file for file in os.listdir('./Sales_Data')]
all_months_data = pd.DataFrame()
for file in files:
    df = pd.read_csv(os.path.join("./Sales_Data", file))  # This line!
    all_months_data = pd.concat([all_months_data, df])
all_months_data.to_csv("all_data.csv", index=False)
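os.path.join uses the separator of the current platform and only adds one when the first part does not already end with a separator. A small sketch calling the standard-library modules posixpath and ntpath directly (they are what os.path refers to on Linux/macOS and on Windows, respectively), so the output is the same on any machine:

```python
import ntpath      # os.path on Windows
import posixpath   # os.path on Linux/macOS

folder = "./Sales_Data"
file = "Sales_April_2019.csv"

# Separator added because "./Sales_Data" does not end with one
posix_path = posixpath.join(folder, file)
# No doubled "//" when the folder already ends with "/"
posix_trailing = posixpath.join(folder + "/", file)
# Windows flavour inserts a backslash instead
windows_path = ntpath.join(folder, file)

print(posix_path)      # ./Sales_Data/Sales_April_2019.csv
print(posix_trailing)  # ./Sales_Data/Sales_April_2019.csv
print(windows_path)    # ./Sales_Data\Sales_April_2019.csv
```

Windows accepts both "/" and "\" as separators, so the mixed result on the last line still opens correctly.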
My last tip: always read and understand the error message before asking a question!
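Hi, hope you're doing well!
You can use pathlib instead of os. It is also part of the standard library, so there is nothing to install. In that case, your code would look like this: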
import pathlib
import pandas as pd
dirpath = pathlib.Path("sales_data")
# sales_data
# ├── sales_2019_01.csv
# └── sales_2019_02.csv
files = list(dirpath.glob("**/*.csv"))
# [PosixPath('sales_data/sales_2019_01.csv'), PosixPath('sales_data/sales_2019_02.csv')]
all_months_data = []
for file in files:
    df = pd.read_csv(file)
    all_months_data.append(df)
all_months_data_df = pd.concat(all_months_data).reset_index(drop=True)
all_months_data_df.to_csv("all_data.csv", index=False)
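Collecting the DataFrames in a list and calling pd.concat once, as above, avoids copying the accumulated data again on every iteration, which the question's loop does. Below is a minimal sketch that wraps the same idea in a reusable function, assuming every CSV has the same columns. The function name concat_csv_dir, the non-recursive "*.csv" glob (the answer uses the recursive "**/*.csv"), the sorted file order, and the sample files are illustrative choices, not part of the original answer:

```python
import pathlib
import tempfile

import pandas as pd


def concat_csv_dir(dirpath, out_path):
    """Concatenate every *.csv directly inside dirpath into out_path.

    Hypothetical helper: assumes all CSVs share the same columns. Files are
    read in sorted name order so the output row order is reproducible.
    """
    dirpath = pathlib.Path(dirpath)
    out_path = pathlib.Path(out_path)
    # Non-recursive glob; skip out_path in case it is written into the same folder
    files = sorted(
        p for p in dirpath.glob("*.csv") if p.resolve() != out_path.resolve()
    )
    if not files:
        raise FileNotFoundError(f"no CSV files found in {dirpath}")
    # Read all frames first, then concatenate once
    frames = [pd.read_csv(p) for p in files]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_path, index=False)
    return combined


# Demo on a throwaway folder with two small, made-up monthly files
with tempfile.TemporaryDirectory() as tmp:
    sales_dir = pathlib.Path(tmp) / "Sales_Data"
    sales_dir.mkdir()
    (sales_dir / "Sales_April_2019.csv").write_text("product,qty\nA,1\nB,2\n")
    (sales_dir / "Sales_May_2019.csv").write_text("product,qty\nC,3\n")
    out_file = sales_dir / "all_data.csv"
    result = concat_csv_dir(sales_dir, out_file)
    # Running again must not pick up all_data.csv as an input
    rerun = concat_csv_dir(sales_dir, out_file)
    written = pd.read_csv(out_file)

print(result)
```

ignore_index=True gives the same fresh 0..n-1 index as the answer's reset_index(drop=True).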
But if you're still curious about what was wrong with your code: you were missing the / when reading the CSV:
df = pd.read_csv(f"./Sales_Data/{file}")
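For comparison, the f-string fix and pathlib's / operator point to the same file. A small sketch, using PurePosixPath (an assumption here, so the printed output is identical on every OS; in real code pathlib.Path picks the platform's flavour). Note that pathlib drops the redundant leading "./":

```python
from pathlib import PurePosixPath

folder = "./Sales_Data"
file = "Sales_April_2019.csv"

# f-string with an explicit "/" separator
with_fstring = f"{folder}/{file}"

# pathlib's "/" operator inserts the separator itself and
# normalizes away the "." component
with_pathlib = PurePosixPath(folder) / file

print(with_fstring)  # ./Sales_Data/Sales_April_2019.csv
print(with_pathlib)  # Sales_Data/Sales_April_2019.csv
```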