Pandas - Trying to store multiple .txt files in a .csv

Question

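I have a folder containing about 500 .txt files. I want to store their contents in a CSV file with 2 columns: column 1 is the file name and column 2 is the file's contents as a string. So I would end up with a CSV file of 501 rows.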

I have looked around SO for similar questions and came up with the following code:

import pandas as pd
from pandas.io.common import EmptyDataError
import os


def Aggregate_txt_csv(path):
    for files in os.listdir(path):
            with open(files, 'r') as file:
                try: 
                    df = pd.read_csv(file, header=None, delim_whitespace=True)
                except EmptyDataError:
                    df = pd.DataFrame()
                
            return df.to_csv('file.csv', index=False)

However, it returns an empty .csv file. Am I doing something wrong?

Answer 1

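Your code has several problems. One of them is that pd.read_csv is not opening the file, because you are not passing the full path to the given file. Another is that the return statement sits inside the for loop, so the function exits after the first file. I think you should try playing around with this code: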

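import os
import pandas as pd
from pandas.errors import EmptyDataError  # public location of this exception

def Aggregate_txt_csv(path):
    files = os.listdir(path)
    frames = []  # one DataFrame per file
    for file in files:
        try:
            # Join the folder and the file name so read_csv gets a full path.
            d = pd.read_csv(os.path.join(path, file), header=None, sep=r"\s+")
            d["file"] = file
        except EmptyDataError:
            # Empty files still get a row that records their name.
            d = pd.DataFrame({"file": [file]})
        frames.append(d)
    # Concatenate once, after the loop, and write a single CSV.
    df = pd.concat(frames, ignore_index=True)
    df.to_csv('file.csv', index=False)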
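
To illustrate the path issue this answer describes, here is a small sketch that uses a throwaway temporary directory (the file name `note_for_listdir_demo.txt` and its contents are made up for the demonstration). It suggests that `os.listdir` returns bare file names, which become reliably openable only after being joined with the folder path.

```python
import os
import tempfile

# Create a throwaway folder holding one hypothetical text file.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "note_for_listdir_demo.txt"), "w") as fh:
        fh.write("hello")

    names = os.listdir(folder)  # bare names, no directory part
    full_paths = [os.path.join(folder, name) for name in names]

    # A bare name is resolved against the current working directory,
    # so it is generally not found unless that directory is `folder`.
    bare_exists = os.path.exists(names[0])
    joined_exists = os.path.exists(full_paths[0])

print(names, bare_exists, joined_exists)
```

Running the function from inside the target folder would hide the problem, which may be why it is easy to miss.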

Answer 2

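Use pathlib
- Path.glob() to find all the files
- When working with Path objects, file.stem returns the file name (without its extension) from the path.
Use pandas.concat to combine the dataframes in df_list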

from pathlib import Path
import pandas as pd

p = Path('e:/PythonProjects/stack_overflow')  # path to files
files = p.glob('*.txt')  # get all txt files

df_list = list()  # create an empty list for the dataframes
for file in files:  # iterate through each file
    with file.open('r') as f:
        text = '\n'.join([line.strip() for line in f.readlines()])  # join all rows in list as a single string separated with \n
        
    df_list.append(pd.DataFrame({'filename': [file.stem], 'contents': [text]}))  # create and append a dataframe


df_all = pd.concat(df_list)  # concat all the dataframes

df_all.to_csv('files.csv', index=False)  # save to csv
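
As a quick check of the pathlib pieces used in this answer, the sketch below builds a temporary folder with two made-up files (`report.txt` and `ignored.log` are hypothetical names) and shows what `glob('*.txt')`, `.stem`, and `.name` return. It also uses `Path.read_text()`, which is one possible alternative to opening the file and joining its lines by hand.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as folder:
    root = Path(folder)
    # Two hypothetical files; only one matches the *.txt pattern.
    (root / "report.txt").write_text("line one\nline two\n")
    (root / "ignored.log").write_text("not a txt file")

    txt_files = sorted(root.glob("*.txt"))  # glob yields Path objects
    first = txt_files[0]
    stem = first.stem   # file name without the extension
    name = first.name   # file name with the extension
    text = first.read_text()  # whole file as one string

print(stem, name, repr(text))
```

If the CSV should list `report.txt` rather than `report`, `.name` is the attribute to use instead of `.stem`.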

Answer 3

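I noticed there is already an answer, but I got it working with a relatively simple piece of code. I only slightly edited how the files are read in, and the dataframe is output successfully.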

import pandas as pd
import os


def Aggregate_txt_csv(path):
    result = []  # one dict per file; turned into a DataFrame once at the end
    print(os.listdir(path))
    for files in os.listdir(path):
        fullpath = os.path.join(path, files)
        if not os.path.isfile(fullpath):
            continue  # skip sub-directories

        with open(fullpath, 'r', errors='replace') as file:
            # readlines() keeps each trailing newline, so strip it before joining.
            content = '\n'.join(line.rstrip('\n') for line in file.readlines())
        # Plain file reads never raise EmptyDataError; record empty files as None.
        result.append({'title': files, 'body': content if content else None})

    df = pd.DataFrame(result)
    return df

df = Aggregate_txt_csv('files')
print(df)
df.to_csv('result.csv', index=False)  # index=False keeps just the two columns

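Most importantly, I append to a list so that I don't call pandas' concatenation function too many times, since that is very bad for performance. Also, read_csv is not needed to read the files, because they have no fixed format. So joining the file's lines with '\n'.join(...) lets you read the file cleanly and pull all the lines into a single string.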
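
One detail worth checking when joining lines this way: `readlines()` keeps each line's trailing newline, so joining with `'\n'` doubles the line breaks unless the newline is stripped first. A minimal sketch, using `io.StringIO` as a stand-in for an opened text file (the two sample lines are made up):

```python
import io

# io.StringIO stands in for an opened text file with two lines.
fake_file = io.StringIO("first\nsecond\n")
lines = fake_file.readlines()  # ['first\n', 'second\n']

# Joining as-is keeps the original newline and adds another one.
doubled = "\n".join(lines)  # 'first\n\nsecond\n'

# Stripping the trailing newline first gives one break per line.
clean = "\n".join(line.rstrip("\n") for line in lines)  # 'first\nsecond'

print(repr(doubled))
print(repr(clean))
```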

Finally, I convert the list of dictionaries into the final dataframe and return the result.
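
The list-of-dictionaries step can be sketched on its own as follows. The two rows are hypothetical stand-ins for (file name, file contents) pairs, and the round trip through a temporary CSV is only there to suggest that multi-line and comma-containing text survives, since pandas quotes such fields when writing.

```python
import os
import tempfile
import pandas as pd

# Hypothetical rows standing in for (file name, file contents) pairs.
rows = [
    {"title": "a.txt", "body": "first line\nsecond line"},
    {"title": "b.txt", "body": "only, one line"},
]
df = pd.DataFrame(rows)  # build the frame once from the whole list

with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, "result.csv")
    df.to_csv(out_path, index=False)   # two columns, no index column
    restored = pd.read_csv(out_path)   # read it back to compare

print(restored)
```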

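Edit: for paths that are not the current directory, I updated the code to join the path so that it can find the necessary files. Apologies for the confusion.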

3 answers

Answer 1
1 2020-06-25 20:13:08

Answer 2
1 2020-06-25 20:18:03

Answer 3
0 Accepted 2020-06-25 20:18:44
