
Extract ID and date from a file name in Python

I have this file name as the source of the data in my dataframe:

file_name = '2900-ABC Project-20210525-Data 1'

I want to get the first four digits as a new column called ID, and the date in the file name as a new column called event_date.

The expected result would be:

id     event_date
2900   2021-05-25

How can I get this in Python?

Thanks in advance.

Without resorting to regular expressions, this can be done with str.split():

import datetime as dt
import pandas as pd

file_name = '2900-ABC Project-20210525-Data 1'

file_split = file_name.split('-')
id_value = int(file_split[0])
date = dt.datetime.strptime(file_split[2], '%Y%m%d').date()

df = pd.DataFrame(data={'id': [id_value], 'event_date': [date]})
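If the file names already sit in a DataFrame column, the same split can be applied to the whole column at once. The following is a minimal sketch, assuming a column named file_name, that the project name itself never contains a hyphen, and a second file name that is made up for illustration:

```python
import pandas as pd

# The second file name is hypothetical, added only to show a multi-row case.
df = pd.DataFrame({
    "file_name": [
        "2900-ABC Project-20210525-Data 1",
        "3100-XYZ Project-20220101-Data 2",
    ]
})

# Split every name on '-' into separate columns (0, 1, 2, 3).
parts = df["file_name"].str.split("-", expand=True)

# Column 0 holds the ID, column 2 holds the date as YYYYMMDD.
df["id"] = parts[0].astype(int)
df["event_date"] = pd.to_datetime(parts[2], format="%Y%m%d").dt.date
```

This relies on the ID always being the first field and the date always being the third; a name with an extra hyphen would shift the fields.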

Using str.extract and str.replace:

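# Assumes df already has a "file_name" column; both results are strings.
df["id"] = df["file_name"].str.extract(r'^(\d+)', expand=False)
# regex=True is required in pandas 2.0+, where str.replace matches literally by default.
df["event_date"] = df["file_name"].str.replace(r'^.*-(\d{4})(\d{2})(\d{2})-.*$', r'\1-\2-\3', regex=True)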
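As a possible variation, both values can be pulled out with a single str.extract call using named groups, then converted to proper types. This is only a sketch: the pattern assumes a four-digit ID at the start and an eight-digit date enclosed by hyphens, and the column name file_name is taken from the answer above.

```python
import pandas as pd

df = pd.DataFrame({"file_name": ["2900-ABC Project-20210525-Data 1"]})

# Named groups become the column names of the extracted DataFrame.
pattern = r"^(?P<id>\d{4})-.*-(?P<event_date>\d{8})-"
extracted = df["file_name"].str.extract(pattern)

# Convert the captured strings to an integer ID and a datetime column.
df["id"] = extracted["id"].astype(int)
df["event_date"] = pd.to_datetime(extracted["event_date"], format="%Y%m%d")
```

Rows that do not match the pattern would yield NaN in the extracted columns, so the astype(int) step would need adjusting if such names can occur.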
