
Python - Iterating over all text files recursively

I am building a text parser in Python 3.6. My files are laid out as follows:

(The actual file structure I will be working with is much more extensive than this.)

-Directory(main folder)
    -amerigroup.txt
    -bcbs.txt
    childfolder
         -medicare.txt

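I need to extract text into 2 different lists (iterating over the files and appending to lists that keep growing). Whenever I run my current code, I can't get my program to open the medicare.txt file to read and extract its information. I get an error stating that there is no such file or directory: 'medicare.txt'.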

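My goal is to pull the data from all 3 files in a single pass. How can I get the amerigroup and bcbs data, then go into childfolder and get medicare.txt, and then repeat this for every branch of the file path?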

In this snippet I am only trying to open and close my text files. Here is what I have so far:

import re
import os
import pandas as pd

#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
#rootdir = r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest'

#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')

claimids = []
dxinfo = []

for dirpath, dirnames, files in os.walk(topdir):
    for name in files:
        cid = []
        dx = []
        if name.lower().endswith(exten):
            data = open(name, 'r')
            data.close()

Thank you very much for taking the time to help me!

Edit: So far I have tried using os.walk, to no avail. My most recent attempt (I also tried using txtfile_full_path, which did not work):

for dirpath, dirnames, filename in os.walk(base_dir):
    for filename in filename:
        #defining file type
        txtfile=open(filename,"r")
        txtfile_full_path = os.path.join(dirpath, filename)
        print(filename)

For anyone interested, here is my final solution to the problem:

import re
import os
import pandas as pd


#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
base_dir = (r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')

#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')

claimids = []
dxinfo = []

for dirpath, dirnames, filenames in os.walk(base_dir):
    for filename in filenames:
        txtfile_full_path = os.path.join(dirpath, filename)
        # the with block closes the file even if parsing raises
        with open(txtfile_full_path, 'r') as x12:
            for line in x12:
                for word in claimidRegex.findall(line):
                    claimids.append(word[1])
            # rewind so the second pass re-reads the file from the start
            x12.seek(0)
            for line in x12:
                for word in dxRegex.findall(line):
                    dxinfo.append(word)

datadic = dict(zip(claimids, dxinfo))

You need to pass the full path to open. Merely creating a string variable somewhere does not help you! So the following should avoid your error:

txt_list = []
for dirpath, dirnames, filenames in os.walk(base_dir):
    for filename in filenames:
        # create full path
        txtfile_full_path = os.path.join(dirpath, filename)
        with open(txtfile_full_path) as f:
            txt_list.append(f.read())
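
To see why the bare name fails: os.walk yields file names relative to the directory it is currently visiting (dirpath), not relative to the working directory, so 'medicare.txt' on its own only resolves if you happen to be inside childfolder. A minimal sketch of the join step in isolation, where list_files_with_full_paths is a hypothetical helper name and not something from the question:

```python
import os


def list_files_with_full_paths(base_dir):
    """Return an openable path for every file under base_dir.

    Hypothetical helper for illustration; the name is not from the question.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for name in filenames:
            # name is only the last path component (e.g. "medicare.txt");
            # dirpath is the folder os.walk is currently visiting.
            paths.append(os.path.join(dirpath, name))
    return paths
```

Every path returned this way can be handed to open directly, no matter how deep in the tree the file sits, so os.chdir is not needed for the walk itself.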

Now, integrating your regex-based extraction should be easy enough...
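
As a rough sketch of that integration, assuming only the claim ID pattern from the question is needed: the function name collect_claim_ids, the extension parameter, and the simplified single-group regex are assumptions made for illustration, not part of the original code.

```python
import os
import re

# Simplified form of the question's claim ID pattern: one capture group,
# so findall returns the digits directly instead of (prefix, digits) tuples.
CLAIM_ID_RE = re.compile(r'CLM\*(\d+)')


def collect_claim_ids(base_dir, extension=".txt"):
    """Walk base_dir recursively and gather claim IDs from matching files.

    Hypothetical helper; the name and parameters are illustrative.
    """
    claim_ids = []
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for name in filenames:
            # skip anything that is not a text file
            if not name.lower().endswith(extension):
                continue
            full_path = os.path.join(dirpath, name)
            with open(full_path, "r") as f:
                claim_ids.extend(CLAIM_ID_RE.findall(f.read()))
    return claim_ids
```

Reading each file once with f.read() avoids the seek(0) rewind; the diagnosis pattern could be applied to the same string in the same pass. The order of the results follows the order in which os.walk reports files, which is not guaranteed to be alphabetical.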
