
Match file names in a directory to Pandas series, delete non matching files

I am using Python 2.7.

I have a bunch of files in a directory (mostly Outlook emails). Example file names:

RE: We have Apple.msg
RE: Orange are in stock.msg
RE: Pick up some cabbage please.msg

I have a pandas Series:

Granny Smith Apple
High Quality Orange
Delicious soup

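How do I iterate over the directory, find the file names that contain words from the pandas Series, and delete the files for which no match is found? In the example above, Apple and Orange are found in the pandas Series, so only RE: Pick up some cabbage please.msg would be deleted.

Edit: I want to actually delete the files in the directory for which no match is found.

We can use str.contains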

s1[pd.Series(l).str.contains('|'.join(s.str.split().sum()))]
Out[560]: 
0          RE: We have Apple.msg
1    RE: Orange are in stock.msg
dtype: object

Data input


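import pandas as pd

# file names to test
l=['RE: We have Apple.msg',
'RE: Orange are in stock.msg',
'RE: Pick up some cabbage please.msg']
s1=pd.Series(l)
# phrases whose individual words are searched for in the file names
s=pd.Series(['Granny Smith Apple','High Quality Orange','Delicious soup'])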
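To make the one-liner above easier to follow, here is a minimal sketch of the same idea broken into steps. The names `files`, `words`, `pattern`, `matched`, `keep` and `unmatched` are illustrative only. Escaping the words with `re.escape` and passing `case=False` are additions that the original answer does not include; they assume you want literal, case-insensitive matching.

```python
import re
import pandas as pd

files = pd.Series(['RE: We have Apple.msg',
                   'RE: Orange are in stock.msg',
                   'RE: Pick up some cabbage please.msg'])
s = pd.Series(['Granny Smith Apple', 'High Quality Orange', 'Delicious soup'])

# Flatten the phrases into single words and escape regex metacharacters,
# because str.contains treats the pattern as a regular expression by default.
words = [re.escape(w) for phrase in s for w in phrase.split()]
pattern = '|'.join(words)

# True where the file name contains at least one of the words.
matched = files.str.contains(pattern, case=False)

keep = files[matched].tolist()        # file names with a match
unmatched = files[~matched].tolist()  # candidates for deletion
```

Inverting the mask with `~` gives the files that would be deleted, which is the part the question asks about.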

You can use os.listdir, then str.contains

import pandas as pd
from os import listdir
from os.path import isfile, join
m = '/' # your path
files_in_directory = [f for f in listdir(m) if isfile(join(m, f))]
files = pd.Series(files_in_directory)

s = pd.Series(["Granny Smith Apple",
"High Quality Orange",
"Delicious soup"])

z = pd.Series(s.str.split().sum())
files.str.contains('|'.join(z))
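The snippet above only produces a boolean mask; it does not remove anything. A hedged sketch of the remaining step, using `os.remove` on the names where the mask is False, could look like this. The function name `delete_unmatched` and its `dry_run` flag are hypothetical, and the demo runs in a temporary directory with made-up file names (without the colon, which is not allowed in Windows file names).

```python
import os
import re
import tempfile
import pandas as pd

def delete_unmatched(directory, phrases, dry_run=True):
    """Delete files in `directory` whose names contain no word from `phrases`.

    Returns the names that were deleted (or would be, when dry_run is True).
    """
    names = [f for f in os.listdir(directory)
             if os.path.isfile(os.path.join(directory, f))]
    words = [re.escape(w) for phrase in phrases for w in phrase.split()]
    if not names or not words:
        return []  # nothing to check against, so delete nothing
    files = pd.Series(names)
    matched = files.str.contains('|'.join(words), case=False)
    doomed = files[~matched].tolist()
    if not dry_run:
        for name in doomed:
            os.remove(os.path.join(directory, name))
    return doomed

# Demo in a throwaway directory so no real files are touched.
demo_dir = tempfile.mkdtemp()
for name in ['RE We have Apple.msg',
             'RE Orange are in stock.msg',
             'RE Pick up some cabbage please.msg']:
    open(os.path.join(demo_dir, name), 'w').close()

s = pd.Series(['Granny Smith Apple', 'High Quality Orange', 'Delicious soup'])
deleted = delete_unmatched(demo_dir, s, dry_run=False)
remaining = sorted(os.listdir(demo_dir))
```

Running it once with `dry_run=True` and inspecting the returned list before deleting for real is probably the safer workflow, since `os.remove` cannot be undone.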

Here is the solution I found that works for me

import os

# checklist contains the strings we want to filter on; it must be defined before this point
checklist = [x.lower() for x in checklist]

m = r''  # path where our files are contained
new_directory = r''  # path where we will move the matched files to


for each_checklist in checklist:
    # print(...) with a single argument works in both Python 2.7 and Python 3
    print('now checking for keyword ' + str(each_checklist))
    for root, dirs, files in os.walk(m):
        for i in files:
            if each_checklist in i.lower():
                # this moves the file from root to the target directory
                os.rename(os.path.join(root, i), os.path.join(new_directory, i))
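The loop above walks the whole tree once per keyword. As a possible variation, not the author's code, the same move can be done in a single walk by testing all keywords with `any()`. The sketch below also swaps `os.rename` for `shutil.move`, which can move files across filesystems. The function name `move_matching` is hypothetical, and the sketch assumes the target directory is not inside the source directory and that file names are unique across subdirectories.

```python
import os
import shutil
import tempfile

def move_matching(source_dir, target_dir, keywords):
    """Move every file under source_dir whose name contains any keyword."""
    keywords = [k.lower() for k in keywords]
    moved = []
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            # case-insensitive substring test against all keywords at once
            if any(k in name.lower() for k in keywords):
                shutil.move(os.path.join(root, name),
                            os.path.join(target_dir, name))
                moved.append(name)
    return sorted(moved)

# Demo with temporary directories and made-up file names.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
for name in ['RE We have Apple.msg',
             'RE Orange are in stock.msg',
             'RE Pick up some cabbage please.msg']:
    open(os.path.join(src, name), 'w').close()

moved = move_matching(src, dst, ['Apple', 'Orange'])
left_behind = os.listdir(src)
```

After the move, whatever is left in the source directory is the set of non-matching files, which can then be reviewed and deleted.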

