Python - Recursively looping through all text files

Question

I am building a text parser with Python 3.6. My files are laid out as follows:

(The actual file structure I will be working with is much more extensive than this.)

-Directory(main folder)
    -amerigroup.txt
    -bcbs.txt
    childfolder
         -medicare.txt

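I need to extract text into 2 different lists (looping through the files and appending to the growing lists). Whenever I run my current code, I can't get my program to open the medicare.txt file to read and extract the information. I get an error stating: No such file or directory: 'medicare.txt'.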

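My goal is to pull the data from all 3 files in a single pass. How can I get the amerigroup and bcbs data, then go into childfolder and get medicare.txt, and then repeat this for every branch of the file path?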

In this snippet I am only trying to open and close my text files. This is what I have so far:

import re
import os
import pandas as pd

#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
#rootdir = r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest'

#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')

claimids = []
dxinfo = []

for dirpath, dirnames, files in os.walk(topdir):
    for name in files:
        cid = []
        dx = []
        if name.lower().endswith(exten):
            data = open(name, 'r')
            data.close()

Thank you very much for taking the time to help me!

Edit: So far I have tried using os.walk, to no avail. My most recent attempt (I also tried opening txtfile_full_path, which did not work either):

for dirpath, dirnames, filename in os.walk(base_dir):
    for filename in filename:
        #defining file type
        txtfile=open(filename,"r")
        txtfile_full_path = os.path.join(dirpath, filename)
        print(filename)

For anyone interested, here is my final solution to the problem:

import re
import os
import pandas as pd


#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
base_dir = (r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')

#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')

claimids = []
dxinfo = []

for dirpath, dirnames, filenames in os.walk(base_dir):
    for filename in filenames:
        #os.walk yields bare file names, so build the full path first
        txtfile_full_path = os.path.join(dirpath, filename)
        with open(txtfile_full_path, 'r') as x12:
            for line in x12:
                for word in claimidRegex.findall(line):
                    claimids.append(word[1])
            #rewind so the same file can be scanned again for diagnosis codes
            x12.seek(0)
            for line in x12:
                for word in dxRegex.findall(line):
                    dxinfo.append(word)

datadic = dict(zip(claimids, dxinfo))

Answer 1

You need to pass the full path to open. Just creating a string variable somewhere doesn't help you! So the following should avoid your error:

txt_list = []
for dirpath, dirnames, filenames in os.walk(base_dir):
    for filename in filenames:
        # create full path
        txtfile_full_path = os.path.join(dirpath, filename)
        with open(txtfile_full_path) as f:
            txt_list.append(f.read())
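
To see why the bare name fails, it can help to look at what os.walk actually yields. The sketch below is only an illustration; the helper name list_walk_entries is hypothetical and not part of the question or the answer. It pairs each bare file name with the full path built from dirpath. Only the full path points at the file regardless of the current working directory.

```python
import os


def list_walk_entries(base_dir):
    """Return (bare_name, full_path) pairs for every file under base_dir."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(base_dir):
        # Sorting in place makes the traversal order deterministic.
        dirnames.sort()
        for filename in sorted(filenames):
            # filename is just 'medicare.txt'; dirpath says where it lives.
            entries.append((filename, os.path.join(dirpath, filename)))
    return entries
```

Calling open on the first element of a pair only works when the file happens to sit in the current working directory, which is why medicare.txt inside childfolder could not be found.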

Now, integrating your regex-based extraction should be easy enough...
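
As a rough sketch of that integration, assuming only the claim-ID pattern from the question and a hypothetical helper named collect_claim_ids (the extension parameter is also an assumption, mirroring the endswith check in the question's first attempt):

```python
import os
import re

# Claim-ID pattern copied from the question; group 2 holds the digits.
claimid_regex = re.compile(r'(CLM\*)(\d+)')


def collect_claim_ids(base_dir, extension='.txt'):
    """Walk base_dir recursively and return claim IDs from matching files."""
    claim_ids = []
    for dirpath, dirnames, filenames in os.walk(base_dir):
        # Sorting in place makes the traversal order deterministic.
        dirnames.sort()
        for filename in sorted(filenames):
            if not filename.lower().endswith(extension):
                continue  # skip anything that is not a text file
            full_path = os.path.join(dirpath, filename)
            with open(full_path, 'r') as f:
                for line in f:
                    for match in claimid_regex.findall(line):
                        # findall returns one tuple per match: (prefix, digits)
                        claim_ids.append(match[1])
    return claim_ids
```

The diagnosis-code regex from the question could be applied inside the same inner loop, which would avoid rewinding the file with seek(0).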

Solution 1
0 Accepted 2017-09-12 19:17:20
