
How to extract image name using python regexp?

I want to extract the image names from their paths, to use them later as labels in a classification task. These are the paths:

[PosixPath('/content/drive/My Drive/Logo/adidas10.jpg'),
 PosixPath('/content/drive/My Drive/Logo/adidas11.jpg'),
 PosixPath('/content/drive/My Drive/Logo/adidas13.jpg'),
 . . .]

and it goes on for 600 images.

What I want to get is the brand name, in this case adidas.

Here's the regex I used:

r'([\w\s.-]).[jpg]'

But what I get after checking the image labels is this:

 print(data.classes)

 ['L']

Any suggestions? Thanks.

([\w\s.-]) (any one of [A-Za-z0-9_], whitespace, a literal . or a literal -) captures the L in Logo because that L is followed by:

  • . : any single character; o in this case
  • [jpg] : any one of j, p, g; g here

You need: 你需要:

/([^/]+)\.jpg$

Now the only capturing group will contain the image name.

  • / matches a literal /
  • ([^/]+) matches one or more characters that are not /, i.e. the file name
  • \.jpg matches .jpg at the end ($)
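
As a rough sketch, the pattern above could be applied like this. The three sample paths come from the question; `PurePosixPath` stands in for `PosixPath` so the snippet runs on any platform, and the helper name `image_name` is made up for illustration.

```python
import re
from pathlib import PurePosixPath

# Sample paths copied from the question (PurePosixPath used as a stand-in).
paths = [
    PurePosixPath('/content/drive/My Drive/Logo/adidas10.jpg'),
    PurePosixPath('/content/drive/My Drive/Logo/adidas11.jpg'),
    PurePosixPath('/content/drive/My Drive/Logo/adidas13.jpg'),
]

# Last path component, without the trailing ".jpg".
FILE_NAME = re.compile(r'/([^/]+)\.jpg$')

def image_name(path):
    """Hypothetical helper: return the name before '.jpg', or None if no match."""
    match = FILE_NAME.search(str(path))
    return match.group(1) if match else None

names = [image_name(p) for p in paths]
print(names)  # ['adidas10', 'adidas11', 'adidas13']
```

Note that this yields the image name including the trailing number, not the bare brand.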

A great resource for trying out your regex is Regex101.

You tried to group the file extension using square brackets [], which instead creates a character class matching any one of the three characters j, p or g. The dot is not escaped and thus, in regex syntax, matches any character. Since you never add any quantifiers (like + for one or more characters, * for zero or more characters, or ? for an optional character), you only match a few characters in total.

If you want to read a bit more about regex operators, modifiers and similar concepts, I recommend reading the documentation of Python's re module.

You can either rewrite the regular expression to something like this (extract the first group to retrieve the filename) or use the path-processing functions in os.path:

^.*\/([^\/]+\.jpg).*$

You can see this regex in action here.
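
A minimal sketch of how that regex might be used from Python, with one of the paths from the question as sample input. Here the first group includes the `.jpg` extension.

```python
import re

path = '/content/drive/My Drive/Logo/adidas10.jpg'

# Greedy ".*" consumes everything up to the last "/", so group 1 is the file name.
match = re.match(r'^.*\/([^\/]+\.jpg).*$', path)
filename = match.group(1) if match else None
print(filename)  # adidas10.jpg
```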

Since you seem to already have path objects available, you could instead just extract the basename of the path, which in your case is the filename:

from os.path import basename
a = '/content/drive/My Drive/Logo/adidas10.jpg'
filename = basename(a)

filename would now be adidas10.jpg.
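
Because the question shows `PosixPath` objects, the same information should also be available through `pathlib` attributes without any regex. The sketch below uses `PurePosixPath` as a portable stand-in, and it assumes (this is not stated in the question) that the brand is simply the file stem with its trailing digits stripped.

```python
from pathlib import PurePosixPath

path = PurePosixPath('/content/drive/My Drive/Logo/adidas10.jpg')

file_name = path.name  # last path component, with extension
stem = path.stem       # last path component, without extension

# Assumption: the brand is the stem minus any trailing digits.
brand = stem.rstrip('0123456789')

print(file_name, stem, brand)  # adidas10.jpg adidas10 adidas
```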

You are using a character class, which matches one of the listed characters. So your pattern ([\w\s.-]).[jpg] captures in a group one character matching [\w\s.-], then matches any character except a newline because of the dot ., and then matches one of [jpg].

For your example data, that gives you a capturing group for L followed by a match for og, as well as a capturing group for 0, 1 or 3 (the last digit of each file name) followed by a match for .j
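
To see this behaviour directly, the original pattern can be run against one of the paths from the question. This is only a small illustration of the matches described above.

```python
import re

pattern = re.compile(r'([\w\s.-]).[jpg]')
path = '/content/drive/My Drive/Logo/adidas10.jpg'

# The first match is "Log": "L" is captured, "o" is the any-character dot, "g" is from [jpg].
first = pattern.search(path)
print(first.group(0), first.group(1))  # Log L

# findall returns the captured group of every non-overlapping match.
captures = pattern.findall(path)
print(captures)  # ['L', '0']
```

Since the first match captures L for every path under the Logo directory, this is consistent with the single label ['L'] reported in the question.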

If you want to get the brand name, like adidas in your examples, as you state in your question, you could use a capturing group.

/([^/]*[^/\d])\d*\.jpg

Regex demo | Python demo

That will match:

  • / Match a literal /
  • ( Capturing group (this will contain the brand name)
    • [^/]* Match any character except / 0+ times, using a negated character class
    • [^/\d] Match any character that is not a / or a digit
  • ) Close group
  • \d* Match a digit 0+ times
  • \.jpg Match .jpg
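
A short sketch of this pattern applied to the sample paths from the question. The helper name `brand_name` is hypothetical, and the paths are written as plain strings here (with `PosixPath` objects you would pass `str(path)`).

```python
import re

# Brand = last path component up to, but not including, its trailing digits.
BRAND = re.compile(r'/([^/]*[^/\d])\d*\.jpg')

paths = [
    '/content/drive/My Drive/Logo/adidas10.jpg',
    '/content/drive/My Drive/Logo/adidas11.jpg',
    '/content/drive/My Drive/Logo/adidas13.jpg',
]

def brand_name(path):
    """Hypothetical helper: return the brand part of the file name, or None."""
    match = BRAND.search(path)
    return match.group(1) if match else None

brands = [brand_name(p) for p in paths]
print(brands)               # ['adidas', 'adidas', 'adidas']
print(sorted(set(brands)))  # ['adidas']
```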
