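How to extract strings using a regex pattern in Python?
I am trying to extract file names using a regular expression with `str.extract` in Python. The file name is the text that comes after the time and before the `.filetype` extension.
You can try: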
# skip 5 word fields, then capture everything up to the last '.' (the extension is dropped)
df['filename'] = df['text'].str.extract(r'(?:\w+ ){5}(.*)\.[^.]*$')
Output:
   index  text                                                           filename
0  1      sample 1 root root 349802 Nov 1 2000 introduction.json*        Nov 1 2000 introduction
1  2      sample 1 root root 1234 Oct 1 10:26 test_housing.csv           Oct 1 10:26 test_housing
2  3      sample 1 root root 5983025 Nov 1 10:32 test_train_housing.csv  Nov 1 10:32 test_train_housing
3  4      sample 1 root root 1252 Oct 1 10:32 _test.csv                  Oct 1 10:32 _test
4  5      sample 1 root root 938 Oct 1 10:32 _train_small.csv            Oct 1 10:32 _train_small
5  6      sample 1 root root 9909303 Oct 5 2000 README.md*               Oct 5 2000 README
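The `filename` column in this output still carries the date fields, because only five fields are skipped before the capture group. If only the bare name is wanted, one possible variant (my assumption, not part of the answer) skips eight whitespace-separated fields and uses `\S+` instead of `\w+`, so that a time such as `10:26` counts as a field. A minimal sketch with the standard `re` module, whose first-match behaviour `str.extract` follows; the `lines` list and the name `NAME_ONLY` are illustrative only:

```python
import re

# Sample rows copied from the `text` column above.
lines = [
    "sample 1 root root 349802 Nov 1 2000 introduction.json*",
    "sample 1 root root 1234 Oct 1 10:26 test_housing.csv",
    "sample 1 root root 9909303 Oct 5 2000 README.md*",
]

# Hypothetical variant: skip 8 whitespace-separated fields,
# then capture everything up to the last dot.
NAME_ONLY = re.compile(r'(?:\S+\s+){8}(.*)\.[^.]*$')

names = [NAME_ONLY.search(line).group(1) for line in lines]
print(names)  # ['introduction', 'test_housing', 'README']
```

The same pattern string can be passed to `df['text'].str.extract(...)`. It assumes every row has exactly eight fields before the file name and that the name contains a dot.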
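# capture the word characters immediately before the first dot (raw string avoids escape warnings)
df['filename'] = df['text'].str.extract(r'(\w+)[.].*$')
Result: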
['introduction', 'test_housing', 'test_train_housing', '_test', '_train_small', 'README']
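For reference, a self-contained sketch of this second approach. The DataFrame is rebuilt by hand from three of the sample rows (an assumption about what `df` looks like), and `expand=False` is added here so that `str.extract` returns a Series rather than a one-column DataFrame:

```python
import pandas as pd

# Hand-built stand-in for the question's DataFrame (assumed layout).
df = pd.DataFrame({
    "text": [
        "sample 1 root root 349802 Nov 1 2000 introduction.json*",
        "sample 1 root root 1252 Oct 1 10:32 _test.csv",
        "sample 1 root root 9909303 Oct 5 2000 README.md*",
    ]
})

# Capture the first run of word characters that is directly followed by a dot.
df["filename"] = df["text"].str.extract(r"(\w+)[.].*$", expand=False)

print(df["filename"].tolist())  # ['introduction', '_test', 'README']
```

Because `\w` matches only letters, digits and underscores, a name containing a hyphen or a space would be cut short, and a name such as `archive.tar.gz` would come back as `archive`.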