How to extract numbers from filename in Python?
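I need to extract just the numbers from file names such as:
GapPoints1.shp
GapPoints23.shp
GapPoints109.shp
How can I extract only the numbers from these file names using Python? I need to incorporate this into a for loop.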
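You can use regular expressions:
import re
regex = re.compile(r'\d+')
Then get the matching strings:
regex.findall(filename)
This returns a list of strings containing the numbers. If you actually want integers, you can use int:
[int(x) for x in regex.findall(filename)]
If there is only one number in each file name, you can use regex.search(filename).group(0) (if you are sure it will produce a match). If no match is found, that line raises an AttributeError saying that NoneType has no attribute group.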
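The AttributeError described above can be avoided by checking the result of search before calling group. A minimal sketch, assuming a hypothetical helper named first_number (it is not part of the answer) that returns None when a name contains no digits:

```python
import re

# Compiled once and reused for every file name.
_DIGITS = re.compile(r'\d+')

def first_number(filename):
    """Return the first run of digits in filename as an int, or None.

    Hypothetical helper for illustration only.
    """
    match = _DIGITS.search(filename)
    if match is None:
        return None
    return int(match.group(0))

print(first_number('GapPoints23.shp'))  # 23
print(first_number('readme.txt'))       # None
```

Returning None leaves it to the caller to decide whether a name without digits is an error or should simply be skipped.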
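You didn't describe where these files are or how you are getting them, but I assume you would get the file names using the os module.
As for getting the numbers out of the names, you are best off using regular expressions with re, something like this:
import os
import re

def get_numbers_from_filename(filename):
    return re.search(r'\d+', filename).group(0)
Then, to include that in a for loop, you would run the function on each file name:
for filename in os.listdir(myfiledirectory):
    print(get_numbers_from_filename(filename))
Or something along those lines.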
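A variant of this loop that skips names without digits and visits the files in numeric order might look like the sketch below. The function name numbered_files is an assumption made for illustration, and the demonstration creates empty placeholder files in a temporary directory rather than touching real data.

```python
import os
import re
import tempfile

def numbered_files(directory):
    """Return sorted (number, filename) pairs for names that contain digits.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for filename in os.listdir(directory):
        match = re.search(r'\d+', filename)
        if match:  # skip names with no digits instead of raising
            pairs.append((int(match.group(0)), filename))
    return sorted(pairs)

# Demonstration in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('GapPoints1.shp', 'GapPoints23.shp',
                 'GapPoints109.shp', 'notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    for number, filename in numbered_files(demo_dir):
        print(number, filename)
```

Converting to int before sorting matters here: sorted as strings, '109' would come before '23'.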
If there is only one number:
''.join(filter(str.isdigit, filename))  # join is needed on Python 3, where filter() returns an iterator
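One caveat with keeping every digit character: digits from anywhere in the name are concatenated, including any in the extension. A small sketch, assuming a hypothetical helper digits_in_stem that strips the extension with os.path.splitext first:

```python
import os

def digits_in_stem(filename):
    """Join every digit character found before the file extension.

    Hypothetical helper for illustration only.
    """
    stem, _extension = os.path.splitext(filename)
    return ''.join(ch for ch in stem if ch.isdigit())

print(digits_in_stem('GapPoints23.shp'))  # 23
print(digits_in_stem('track7.mp3'))       # 7 (the 3 in .mp3 is ignored)
print(digits_in_stem('a1b2.shp'))         # 12 (separate numbers are merged)
```

If a name can contain more than one number, a regular expression such as \d+ keeps them apart, which this approach cannot do.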
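Here is my code, which I use to put a paper's publication year at the start of its file name after downloading the file from Google Scholar. The downloaded files are usually named like this: Author+publishedYear.pdf. After running this code, the file name becomes: PublishedYear+Author.pdf.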
# Renaming PDFs according to number extraction
# You want to rename a PDF file so that the digits of the publication year come first.
# Uses regular expressions.
# After running this file, every matching file name follows the year-first pattern.
# Import libraries
import re
import os
# Work in the current directory
address = os.getcwd()
os.chdir(address)
# Define a class with two functions
class file_name:
    def __init__(self, filename):
        self.filename = filename
    # Because we have two patterns, we must define two functions.
    # First function, for names such as: schrodinger1990.pdf
    @staticmethod
    def number_extraction_pattern_non_digits_first(filename):
        pattern = r'(\D+)(\d+)(\.pdf)'
        match = re.search(pattern, filename, re.IGNORECASE)
        # Returns (digits, non_digits)
        return match.group(2), match.group(1)
    # Second function, for names such as: 1993schrodinger.pdf
    @staticmethod
    def number_extraction_pattern_digits_first(filename):
        pattern = r'(\d+)(\D+)(\.pdf)'
        match = re.search(pattern, filename, re.IGNORECASE)
        # Returns (digits, non_digits)
        return match.group(1), match.group(2)
if __name__ == '__main__':
    # Patterns used to check which form the file name has
    pattern_check1 = r'(\D+)(\d+)(\.pdf)'
    pattern_check2 = r'(\d+)(\D+)(\.pdf)'
    # Go through each file in the directory.
    for filename in os.listdir(address):
        if filename.endswith('.pdf'):
            if re.search(pattern_check1, filename, re.IGNORECASE):
                digits, non_digits = file_name.number_extraction_pattern_non_digits_first(filename)
                os.rename(filename, digits + non_digits + '.pdf')
            # Otherwise, check for the other pattern; names matching neither are left alone.
            elif re.search(pattern_check2, filename, re.IGNORECASE):
                digits, non_digits = file_name.number_extraction_pattern_digits_first(filename)
                os.rename(filename, digits + non_digits + '.pdf')
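Before renaming anything on disk, it can help to compute the new names first and inspect them. The following is a sketch rather than the answer's own code: year_first is a hypothetical helper that returns the year-first name for names like schrodinger1990.pdf and None for anything else. It uses fullmatch, so the whole name has to fit the pattern.

```python
import re

# Non-digits, then digits, then a .pdf extension (case-insensitive).
_AUTHOR_YEAR = re.compile(r'(\D+)(\d+)(\.pdf)', re.IGNORECASE)

def year_first(filename):
    """Return filename with the year moved to the front, or None.

    Hypothetical helper for illustration only; it does not touch the disk.
    """
    match = _AUTHOR_YEAR.fullmatch(filename)
    if match is None:
        return None
    author, year, extension = match.groups()
    return year + author + extension

for name in ('schrodinger1990.pdf', '1993schrodinger.pdf', 'notes.pdf'):
    print(name, '->', year_first(name))
```

Names that are already year-first, or that contain no year at all, come back as None, so a caller can skip them and only pass the remaining pairs to os.rename.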