How to concatenate all files in a directory into a single file

Question

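I am following an online tutorial (from 2 years ago; for some reason that mattered the last time I asked a question, because apparently the syntax has changed in the newest versions of Python). Anyway, this is the code I am using: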

files = [file for file in os.listdir ('./Sales_Data')]
all_months_data=pd.DataFrame()
for file in files:
    df= pd.read_csv("./Sales_Data"+file)
    all_months_data= pd.concat ([all_months_data, df])
all_months_data.to_csv("all_data.csv",index= False)

This is the error I get:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13984/1832777700.py in <module>
      2 all_months_data=pd.DataFrame()
      3 for file in files:
----> 4     df= pd.read_csv("./Sales_Data"+file)
      5     all_months_data= pd.concat ([all_months_data, df])
      6 all_months_data.to_csv("all_data.csv",index= False)

~\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    584     kwds.update(kwds_defaults)
    585 
--> 586     return _read(filepath_or_buffer, kwds)
    587 
    588 

~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)
    480 
    481     # Create the parser.
--> 482     parser = TextFileReader(filepath_or_buffer, **kwds)
    483 
    484     if chunksize or iterator:

~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py in __init__(self, f, engine, **kwds)
    809             self.options["has_index_names"] = kwds["has_index_names"]
    810 
--> 811         self._engine = self._make_engine(self.engine)
    812 
    813     def close(self):

~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _make_engine(self, engine)
   1038             )
   1039         # error: Too many arguments for "ParserBase"
-> 1040         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1041 
   1042     def _failover_to_python(self):

~\anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in __init__(self, src, **kwds)
     49 
     50         # open handles
---> 51         self._open_handles(src, kwds)
     52         assert self.handles is not None
     53 

~\anaconda3\lib\site-packages\pandas\io\parsers\base_parser.py in _open_handles(self, src, kwds)
    220         Let the readers open IOHandles after they are done with their potential raises.
    221         """
--> 222         self.handles = get_handle(
    223             src,
    224             "r",

~\anaconda3\lib\site-packages\pandas\io\common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    700         if ioargs.encoding and "b" not in ioargs.mode:
    701             # Encoding
--> 702             handle = open(
    703                 handle,
    704                 ioargs.mode,

FileNotFoundError: [Errno 2] No such file or directory: './Sales_DataSales_April_2019.csv'


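I tried checking for and correcting spelling/syntax errors, which helped resolve one of the errors.

I also tried adding the last line of code, but I think it only added more errors.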

Answer 1

The answer is apparent if you read the stack trace through to its last line:

FileNotFoundError: [Errno 2] No such file or directory: './Sales_DataSales_April_2019.csv'

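A forward slash (/) is missing where the file path is built inside the for loop: "./Sales_Data" + file glues the folder name and the file name together with no separator.

So, to fix it: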

    import os
    import pandas as pd

    files = [file for file in os.listdir('./Sales_Data')]
    all_months_data = pd.DataFrame()
    for file in files:
        df = pd.read_csv("./Sales_Data/" + file)  # This line! Note the trailing slash.
        all_months_data = pd.concat([all_months_data, df])
    all_months_data.to_csv("all_data.csv", index=False)

A cleaner solution is to use os.path.join; see this other Stack Overflow answer: Create file path from variables

If we use os.path.join:

    import os
    import pandas as pd

    files = [file for file in os.listdir('./Sales_Data')]
    all_months_data = pd.DataFrame()
    for file in files:
        df = pd.read_csv(os.path.join("./Sales_Data", file))  # This line!
        all_months_data = pd.concat([all_months_data, df])
    all_months_data.to_csv("all_data.csv", index=False)
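To see the difference without touching any files, here is a minimal sketch that compares plain string concatenation with os.path.join. The folder and file names are the ones from the traceback; the variable names `broken` and `joined` are only illustrative.

```python
import os

folder = "./Sales_Data"
filename = "Sales_April_2019.csv"

# Plain concatenation glues the two strings together with no separator,
# which reproduces the path shown in the FileNotFoundError.
broken = folder + filename

# os.path.join inserts the platform's path separator between the parts.
joined = os.path.join(folder, filename)

print(broken)  # ./Sales_DataSales_April_2019.csv
print(joined)
```

The separator in `joined` depends on the operating system, which is one reason to prefer os.path.join over typing the slash by hand.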

My last tip: always read and understand the error message before asking a question!

Answer 2

Hi, hope you're doing well!

You can use pathlib instead of os. It is also part of the stdlib, so there is nothing to install. In that case, your code would look like this:

import pathlib

import pandas as pd

dirpath = pathlib.Path("sales_data")
# sales_data
# ├── sales_2019_01.csv
# └── sales_2019_02.csv

files = list(dirpath.glob("**/*.csv"))
# [PosixPath('sales_data/sales_2019_01.csv'), PosixPath('sales_data/sales_2019_02.csv')]

all_months_data = []

for file in files:
    df = pd.read_csv(file)
    all_months_data.append(df)

all_months_data_df = pd.concat(all_months_data).reset_index(drop=True)
all_months_data_df.to_csv("all_data.csv", index=False)
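One detail about the glob pattern used above may be worth knowing: `**/*.csv` also matches CSV files in subdirectories, while `*.csv` matches only the top level, and glob does not promise any particular order. The sketch below illustrates this in a temporary directory; the file names and the `archive` subfolder are made up for the example.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    dirpath = pathlib.Path(tmp)

    # Hypothetical layout: two CSVs at the top level, one in a subfolder.
    (dirpath / "archive").mkdir()
    for name in ("sales_2019_02.csv", "sales_2019_01.csv", "archive/sales_2018_12.csv"):
        (dirpath / name).write_text("Product,Quantity\n")

    # "*.csv" only looks at the directory itself.
    top_level = sorted(p.name for p in dirpath.glob("*.csv"))

    # "**/*.csv" looks at the directory and every subdirectory below it.
    recursive = sorted(
        p.relative_to(dirpath).as_posix() for p in dirpath.glob("**/*.csv")
    )

print(top_level)
print(recursive)
```

Wrapping the result in sorted() keeps the concatenation order reproducible between runs, which can matter if the row order of the combined file is important.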

But in case you are still interested in what was wrong with your code: you were missing a / when reading the csv: df = pd.read_csv(f"./Sales_Data/{file}")
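As a rough end-to-end sketch that combines the ideas from both answers (join paths properly, read only CSV files, concatenate once at the end), the following creates made-up sample files in a temporary directory so it can run anywhere. The column names and values are assumptions for illustration, not the asker's real data.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Made-up sample data standing in for the monthly sales files.
    pd.DataFrame({"Product": ["A"], "Quantity": [1]}).to_csv(
        os.path.join(folder, "Sales_April_2019.csv"), index=False
    )
    pd.DataFrame({"Product": ["B"], "Quantity": [2]}).to_csv(
        os.path.join(folder, "Sales_May_2019.csv"), index=False
    )
    # A stray non-CSV file that should be skipped.
    with open(os.path.join(folder, "notes.txt"), "w") as fh:
        fh.write("not a csv")

    # Keep only CSV files and sort them for a reproducible order.
    csv_names = sorted(f for f in os.listdir(folder) if f.endswith(".csv"))

    # Read every file, then concatenate once instead of inside the loop.
    frames = [pd.read_csv(os.path.join(folder, name)) for name in csv_names]
    all_months_data = pd.concat(frames, ignore_index=True)

print(all_months_data)
```

Filtering on the `.csv` extension guards against other files in the folder (notebook checkpoints, notes, and so on) being passed to pd.read_csv.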
