
Issue sorting lots of files in Python

My directory contains more than 10,000 files, all with the same extension and all named in the same form, for example:

20150921(1)_0001.sgy
20150921(1)_0002.sgy
20150921(1)_0003.sgy
20150921(1)_0004.sgy
...
20150921(1)_13290.sgy

The code I am currently using is:

from os import listdir

files = listdir('full data')
files.sort()

However, this returns the following list:

20150921(1)_0001.sgy
...
20150921(1)_0998.sgy
20150921(1)_0999.sgy
20150921(1)_1000.sgy
20150921(1)_10000.sgy
20150921(1)_10001.sgy
20150921(1)_10002.sgy
20150921(1)_10003.sgy
20150921(1)_10004.sgy
20150921(1)_10005.sgy
20150921(1)_10006.sgy
20150921(1)_10007.sgy
20150921(1)_10008.sgy
20150921(1)_10009.sgy
20150921(1)_1001.sgy
20150921(1)_10010.sgy

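The problem only shows up once the file count passes 1000: with more than 10,000 files, the files no longer seem to sort correctly. Can anyone help me fix this?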
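A minimal reproduction, using three names taken from the listing above, suggests what is going on: list.sort() and sorted() compare the names as plain strings, one character at a time by code point, rather than as numbers.

```python
names = ['20150921(1)_1001.sgy', '20150921(1)_10000.sgy', '20150921(1)_1000.sgy']

# Default string ordering: characters are compared one at a time by code point.
print(sorted(names))
# ['20150921(1)_1000.sgy', '20150921(1)_10000.sgy', '20150921(1)_1001.sgy']

# After the shared prefix '20150921(1)_100' the next characters are
# '0' (1000.sgy), '0' (10000.sgy) and '1' (1001.sgy), so 1001.sgy goes last.
# Between 1000.sgy and 10000.sgy, '.' (U+002E) sorts before '0' (U+0030).
print(ord('.') < ord('0') < ord('1'))  # True
```

This is why the four-digit zero padding stops helping once the counter reaches five digits.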

This is called natural sorting. You can use the natsort package to do it:

from natsort import natsorted
import pprint

files = ['20150921(1)_0001.sgy',
'20150921(1)_0102.sgy',
'20150921(1)_0011.sgy',
'20150921(1)_0003.sgy',
'20150921(1)_0004.sgy',
'20150921(1)_0010.sgy',
'20150921(1)_1001.sgy',
'20150921(1)_0012.sgy',
'20150921(1)_0101.sgy',
'20150921(1)_1003.sgy',
'20150921(1)_0103.sgy',
'20150921(1)_10002.sgy',
'20150921(1)_1002.sgy',
'20150921(1)_10001.sgy',
'20150921(1)_0002.sgy',
]

pprint.pprint(natsorted(files))

Output:

['20150921(1)_0001.sgy',
 '20150921(1)_0002.sgy',
 '20150921(1)_0003.sgy',
 '20150921(1)_0004.sgy',
 '20150921(1)_0010.sgy',
 '20150921(1)_0011.sgy',
 '20150921(1)_0012.sgy',
 '20150921(1)_0101.sgy',
 '20150921(1)_0102.sgy',
 '20150921(1)_0103.sgy',
 '20150921(1)_1001.sgy',
 '20150921(1)_1002.sgy',
 '20150921(1)_1003.sgy',
 '20150921(1)_10001.sgy',
 '20150921(1)_10002.sgy']
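If installing natsort is not an option, a rough stand-in for natural sorting can be built from the standard library: split each name into runs of digits and non-digits, and compare the digit runs as integers. This is a minimal sketch under that assumption (the name natural_key is hypothetical), not a full replacement for natsort and its extra options such as float or locale-aware sorting.

```python
import re

def natural_key(name):
    # re.split with a capturing group keeps the digit runs, e.g.
    # '20150921(1)_0012.sgy' -> ['', '20150921', '(', '1', ')_', '0012', '.sgy']
    parts = re.split(r'(\d+)', name)
    # Captured digit runs always sit at odd indices; convert them to int so
    # they compare numerically, and leave the text runs as strings.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

files = ['20150921(1)_1001.sgy', '20150921(1)_10000.sgy',
         '20150921(1)_0999.sgy', '20150921(1)_1000.sgy']
print(sorted(files, key=natural_key))
# ['20150921(1)_0999.sgy', '20150921(1)_1000.sgy', '20150921(1)_1001.sgy', '20150921(1)_10000.sgy']
```

Because strings and integers always alternate in the same positions, two keys never end up comparing an int against a str.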
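Alternatively, you can sort on the integer between the underscore and the .sgy extension:

import os
sorted_filenames = sorted(os.listdir('full data'), key=lambda s: int(s.rsplit('.', 1)[0].split('_', 1)[1]))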
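The one-liner assumes every entry in the directory has the ..._NNNN.sgy form. A stray file such as notes.txt has no underscore, so the [1] index raises IndexError. A minimal sketch that filters to .sgy names before applying the same key (the helper name sort_sgy_names is hypothetical):

```python
def sort_sgy_names(names):
    # Keep only .sgy files; a name like 'notes.txt' has no '_' and would
    # make the indexing in the key below raise IndexError.
    sgy = [n for n in names if n.endswith('.sgy')]
    # Same key as the one-liner: the integer after the first '_' and
    # before the last '.'.
    return sorted(sgy, key=lambda s: int(s.rsplit('.', 1)[0].split('_', 1)[1]))

print(sort_sgy_names(['20150921(1)_1001.sgy', 'notes.txt',
                      '20150921(1)_10000.sgy', '20150921(1)_0999.sgy']))
# ['20150921(1)_0999.sgy', '20150921(1)_1001.sgy', '20150921(1)_10000.sgy']
```

Against the directory from the question it would be called as sort_sgy_names(os.listdir('full data')).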

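They are being sorted alphabetically: the names are compared as strings, character by character, so 20150921(1)_10000.sgy comes before 20150921(1)_1001.sgy because at the first position where they differ, '0' is less than '1'. If you want to sort them numerically, you need to parse the numbers out first: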

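import os
import re

def filename_to_tuple(name):
    # Split e.g. '20150921(1)_0001.sgy' into its three numeric parts
    match = re.match(r'(\d+)\((\d+)\)_(\d+)\.sgy', name)
    if not match:
        raise ValueError("Filename doesn't match expected pattern: " + name)
    # Convert every captured group to int so the parts compare numerically
    return tuple(int(i) for i in match.groups())

sorted_files = sorted(os.listdir('full data'), key=filename_to_tuple)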
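As an in-memory check (no directory needed), this sketch repeats filename_to_tuple so it runs on its own. The key maps each name to a tuple of integers, and tuples compare element by element, so the date and the number in parentheses are compared before the file counter. The extra name 20150921(2)_0001.sgy is a made-up example to show that ordering.

```python
import re

def filename_to_tuple(name):
    # Split e.g. '20150921(1)_0001.sgy' into its three numeric parts
    match = re.match(r'(\d+)\((\d+)\)_(\d+)\.sgy', name)
    if not match:
        raise ValueError("Filename doesn't match expected pattern: " + name)
    return tuple(int(i) for i in match.groups())

print(filename_to_tuple('20150921(1)_0001.sgy'))   # (20150921, 1, 1)
print(filename_to_tuple('20150921(1)_10000.sgy'))  # (20150921, 1, 10000)

names = ['20150921(2)_0001.sgy', '20150921(1)_10000.sgy',
         '20150921(1)_1001.sgy', '20150921(1)_1000.sgy']
print(sorted(names, key=filename_to_tuple))
# ['20150921(1)_1000.sgy', '20150921(1)_1001.sgy', '20150921(1)_10000.sgy', '20150921(2)_0001.sgy']

try:
    filename_to_tuple('notes.txt')
except ValueError as err:
    print(err)  # Filename doesn't match expected pattern: notes.txt
```

Raising ValueError on unexpected names makes a stray file fail loudly instead of being silently misordered.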


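Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.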

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM