Extract required directory name from a file path

I have the path name of a file as follows:

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"

How can I extract "TKC_" as the required information?

I tried print(os.path.basename(fi)), but it printed only the file name:

dc_2008_141_0706+0900_TKC__y30_u.jpg

"http://pen.jamstec.go.jp/**TKC_**/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"

Try:

from urllib.parse import urlparse

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
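# The path starts with '/', so index 0 of the split is an empty string and index 1 is the first directory.
print(urlparse(fi).path.split('/')[1])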
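
If the URL might have no path or a doubled slash, indexing with [1] can raise IndexError or return an empty string. A minimal sketch of a more defensive version follows; the helper name first_path_segment is hypothetical and not part of the original answer.

```python
from urllib.parse import urlparse


def first_path_segment(url):
    """Return the first non-empty path component of url, or None if there is none."""
    # Drop empty strings produced by the leading '/' or by doubled slashes.
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else None


fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
print(first_path_segment(fi))  # TKC_
```

Returning None instead of raising is a design choice for this sketch; raising an exception may suit callers that treat a missing directory as an error.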

Try using regex:

import re
fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
result = re.search(r'http://.*?/(.*?)/', fi)
print(result.group(1))
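
Note that re.search returns None when nothing matches, so result.group(1) would raise AttributeError for, say, an https URL. A hedged sketch of a variant that accepts both schemes and handles the no-match case; the function name first_dir_by_regex is hypothetical.

```python
import re


def first_dir_by_regex(url):
    """Return the first directory after the host, or None if the URL has none."""
    # [^/]+ matches the host, then the captured group is the first path
    # component, which must be followed by another '/' to count as a directory.
    match = re.search(r"^https?://[^/]+/([^/]+)/", url)
    return match.group(1) if match else None


fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
print(first_dir_by_regex(fi))  # TKC_
```

Because the pattern requires a trailing slash after the captured group, a file sitting directly under the host is not mistaken for a directory.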

You can use a regex to define a pattern to search for in a string, which is TKC_ in this case.

import re
fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
httpObj = re.compile(r'TKC_') # pattern to search for in the string
print(httpObj.findall(fi)[0])

If you remove [0] in the print() statement, you will get a list of all occurrences of TKC_ in the string.
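
To illustrate the point about dropping [0]: in this particular URL the literal TKC_ appears twice, once as the directory and once inside the file name, so findall returns two items. A minimal sketch reusing the URL from the question; keep in mind that a literal pattern only confirms the text is present and assumes the name is already known.

```python
import re

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
pattern = re.compile(r"TKC_")  # literal text, so the name must be known in advance

matches = pattern.findall(fi)  # every non-overlapping occurrence, left to right
print(matches)       # ['TKC_', 'TKC_']
print(len(matches))  # 2
```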
