Extract the required directory name from a file path

Question

I have the following file path (a URL):

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"

How can I extract "TKC_" from it?

I tried print(os.path.basename(fi)), but it printed:

dc_2008_141_0706+0900_TKC__y30_u.jpg

The part I want is the first directory after the host, highlighted here:

"http://pen.jamstec.go.jp/**TKC_**/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"

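os.path.basename keeps only what follows the last "/", which is why that attempt returned the file name. One alternative, sketched here under the assumption that the URL always has the form scheme://host/directory/..., is to parse the URL first and then treat its path as a POSIX path, so that each component can be addressed individually:

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urlparse

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"

# basename returns only the final component, i.e. the file name
print(os.path.basename(fi))  # dc_2008_141_0706+0900_TKC__y30_u.jpg

# Strip the scheme and host first; parts[0] of an absolute path is the root "/"
parts = PurePosixPath(urlparse(fi).path).parts
print(parts[1])  # TKC_
```

Parsing with urlparse before splitting matters: passing the raw URL to PurePosixPath would turn "http:" and the host into path components of their own.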
Answer 1

Try:

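from urllib.parse import urlparse

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
# .path is "/TKC_/public_html/...", so after splitting on "/" index 0 is "" and index 1 is the first directory
print(urlparse(fi).path.split('/')[1])  # TKC_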
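urlparse separates the URL into scheme, netloc (the host) and path, so the host can never be mistaken for a directory. The sketch below wraps the same idea in a small helper; the name first_directory, and the choice to return None when the path has no directory, are illustrative assumptions rather than part of the answer above:

```python
from urllib.parse import urlparse

def first_directory(url):
    """Hypothetical helper: return the first directory in a URL's path, or None."""
    path = urlparse(url).path             # e.g. "/TKC_/public_html/.../file.jpg"
    dir_part = path.rpartition('/')[0]    # drop whatever follows the last "/" (the file name)
    # Ignore empty pieces produced by the leading "/" or by "//"
    segments = [s for s in dir_part.split('/') if s]
    return segments[0] if segments else None

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
print(urlparse(fi).netloc)                             # pen.jamstec.go.jp
print(first_directory(fi))                             # TKC_
print(first_directory("http://example.com/file.jpg"))  # None
```

Unlike a fixed split('/')[1], the helper does not return the file name when the file sits directly under the host.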

Answer 2

Try using a regex:

import re

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
# Lazily skip the host after "http://", then capture everything up to the next "/"
result = re.search(r'http://.*?/(.*?)/', fi)
print(result.group(1))  # TKC_
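The pattern above only matches URLs that start with http://, and re.search returns None when nothing matches, so calling .group(1) would then raise AttributeError. A slightly more defensive sketch, assuming the scheme is http or https and that the directory is always followed by a "/" (first_directory_re is a hypothetical name):

```python
import re

# ^https?://   scheme (http or https)
# [^/]+/       the host, up to the first "/"
# ([^/]+)/     capture the first path segment, which must be followed by "/"
PATTERN = re.compile(r'^https?://[^/]+/([^/]+)/')

def first_directory_re(url):
    """Hypothetical helper: regex version that returns None instead of raising."""
    match = PATTERN.search(url)
    return match.group(1) if match else None

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
print(first_directory_re(fi))                              # TKC_
print(first_directory_re("https://example.com/file.jpg"))  # None
```

Using [^/]+ instead of lazy .*? makes it explicit that neither the host nor the captured segment may contain a "/".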

Answer 3

You can use a regex to define a pattern to search for in the string; in this case the pattern is the literal TKC_.

import re

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"
pattern = re.compile(r'TKC_')  # pattern to search for in the string
# findall returns every non-overlapping match; [0] takes the first one
print(pattern.findall(fi)[0])  # TKC_

If you remove [0] from the print() call, it prints a list of all occurrences of TKC_ in the string (here ['TKC_', 'TKC_'], because the file name also contains TKC_).
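findall returns only the matched strings. If you also need to know where each occurrence is, for example to tell the directory apart from the copy inside the file name, re.finditer yields match objects that carry their positions. A minimal sketch; note that it still assumes the literal TKC_ is known in advance:

```python
import re

fi = "http://pen.jamstec.go.jp/TKC_/public_html/original/dc/dc_2008/dc_2008_141/dc_2008_141_0706+0900_TKC__y30_u.jpg"

# finditer yields Match objects, so each occurrence comes with its start index
starts = [m.start() for m in re.finditer(r'TKC_', fi)]
print(starts)

# Requiring a "/" on both sides keeps only the directory occurrence
dirs = re.findall(r'/(TKC_)/', fi)
print(dirs)  # ['TKC_']
```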

3 solutions

Solution 1: 2 votes, accepted, 2019-12-29 07:50:34

Solution 2: 0 votes, 2019-12-29 07:46:05

Solution 3: 0 votes, 2019-12-29 07:56:14