Python - Iterating over all text files recursively
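I am building a text parser in Python 3.6. My files are laid out as follows:
(The actual file structure I will be working with is much more extensive than this.)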
-Directory (main folder)
    -amerigroup.txt
    -bcbs.txt
    -childfolder
        -medicare.txt
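I need to extract text into 2 different lists (iterating over the files and appending to the growing lists). Whenever I run my current code, I can't get the program to open medicare.txt to read and extract its contents. I get an error stating that there is no such file or directory: 'medicare.txt'.
My goal is to pull the data from all 3 files in a single pass. How can I get the amerigroup and bcbs data, then go into childfolder and get medicare.txt, and then repeat this for every branch of the file path?
In this snippet I am only trying to open and close my text files. Here is what I have so far: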
import re
import os
import pandas as pd

#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
#rootdir = r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest'

#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')

claimids = []
dxinfo = []

for dirpath, dirnames, files in os.walk(topdir):
    for name in files:
        cid = []
        dx = []
        if name.lower().endswith(exten):
            data = open(name, 'r')
            data.close()
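Thank you very much for taking the time to help me!
Edit: So far I have tried using os.walk to no avail. My most recent attempt (I also tried opening txtfile_full_path, which did not work either):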
for dirpath, dirnames, filename in os.walk(base_dir):
    for filename in filename:
        #defining file type
        txtfile=open(filename,"r")
        txtfile_full_path = os.path.join(dirpath, filename)
        print(filename)
For anyone interested, here is my final solution to the problem:
import re
import os
import pandas as pd

#change active directory
os.chdir(r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')
base_dir = (r'\\Centerstone.lan\Files\HomeDrive\angus.gray\My Documents\claimstest')

#set up Regular Expression objects to parse X12
claimidRegex = re.compile(r'(CLM\*)(\d+)')
dxRegex = re.compile(r'(ABK:)(\w\d+)(\*|~)(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?(ABF:)?(\w\d+)?(\*|~)?')

claimids = []
dxinfo = []

for dirpath, dirnames, filename in os.walk(base_dir):
    for filename in filename:
        txtfile_full_path = os.path.join(dirpath, filename)
        x12 = open(txtfile_full_path, 'r')
        for i in x12:
            match = claimidRegex.findall(i)
            for word in match:
                claimids.append(word[1])
        x12.seek(0)
        for i in x12:
            match = dxRegex.findall(i)
            for word in match:
                dxinfo.append(word)
        x12.close()

datadic = dict(zip(claimids, dxinfo))
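You need to pass the full path to open. Merely creating a string variable somewhere does not help you: os.walk yields bare file names, so open(filename) looks in the current working directory and fails for anything inside a subfolder. The following should avoid your error: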
txt_list = []
for dirpath, dirnames, filename in os.walk(base_dir):
    for filename in filename:
        # create full path
        txtfile_full_path = os.path.join(dirpath, filename)
        with open(txtfile_full_path) as f:
            txt_list.append(f.read())
From here, integrating your regex-based extraction should be easy enough...
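One way that integration might look is sketched below. It is only an illustration, not the asker's actual code: the function name collect_claims and the extension parameter are hypothetical, and the diagnosis pattern is shortened to the leading ABK code to keep the example small. It assumes each file is small enough to read into memory at once.

```python
import os
import re

# Patterns adapted from the question. The diagnosis pattern is shortened to the
# leading ABK code only (an assumption made to keep the sketch brief).
CLAIM_ID_RE = re.compile(r'CLM\*(\d+)')
PRIMARY_DX_RE = re.compile(r'ABK:(\w\d+)')


def collect_claims(base_dir, extension='.txt'):
    """Walk base_dir recursively and collect claim IDs and ABK codes from text files."""
    claim_ids = []
    dx_codes = []
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for filename in filenames:
            # Skip anything that is not a text file.
            if not filename.lower().endswith(extension):
                continue
            # os.walk yields bare names, so join with dirpath before opening.
            full_path = os.path.join(dirpath, filename)
            with open(full_path, 'r') as handle:
                text = handle.read()
            # With a single capture group, findall returns the captured strings.
            claim_ids.extend(CLAIM_ID_RE.findall(text))
            dx_codes.extend(PRIMARY_DX_RE.findall(text))
    return claim_ids, dx_codes
```

Calling claim_ids, dx_codes = collect_claims(base_dir) would then fill both lists in one pass. Reading each file once with read() avoids the seek(0) rewind, and the with block closes the file even if a read fails.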