How to extract specific parts of specific lines from a file?
I have a .yml file, and I need to extract a specific part of certain lines in it.
This is what part of the file looks like (the file is 1200+ lines long, but the structure is the same throughout):
training:
  trainings:
  - workout: Rec 016
    performed_at: 2020-06-25 09:04:16.295000076 Z
    star: false
    time: '00:04:00'
  - workout: Hanging knee raises endurance 10
    performed_at: 2020-06-25 08:59:11.871999979 Z
    star: true
    time: '00:00:28'
    repetitions: 10
  - workout: Str 700
    performed_at: 2020-06-25 08:57:51.039999961 Z
    star: true
    time: '00:15:30'
  - workout: Supermans technical 30
    performed_at: 2020-06-25 08:38:45.894000053 Z
    star: true
    time: '00:01:02'
  - workout: Toe touch crunches technical 20
    performed_at: 2020-06-25 08:37:05.439000129 Z
    star: true
    time: '00:00:54'
  - workout: Pre 028
    performed_at: 2020-06-25 08:35:33.243999958 Z
    star: false
    time: '00:06:30'
  - workout: Rec 001
    performed_at: 2020-06-22 22:51:38.947000026 Z
    star: false
    time: '00:05:01'
  - workout: Burpees standard 10
    performed_at: 2020-06-22 22:46:00.807000160 Z
    star: true
    time: '00:00:38'
Extra info: I load the file with the following code:
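import pandas as pd

# The lines contain no commas, so read_csv loads each one as a row of a single column
df = pd.read_csv(r'text_data.yml')
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    print(df)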
This is what the file turns into:
---
0 training:
1 trainings:
2 - workout: Rec 016
3 performed_at: 2020-06-25 09:04:16.295000076 Z
4 star: false
5 time: '00:04:00'
6 - workout: Hanging knee raises endurance 10
7 performed_at: 2020-06-25 08:59:11.871999979 Z
8 star: true
9 time: '00:00:28'
10 repetitions: 10
11 - workout: Str 700
12 performed_at: 2020-06-25 08:57:51.039999961 Z
13 star: true
14 time: '00:15:30'
15 - workout: Supermans technical 30
16 performed_at: 2020-06-25 08:38:45.894000053 Z
17 star: true
18 time: '00:01:02'
19 - workout: Toe touch crunches technical 20
20 performed_at: 2020-06-25 08:37:05.439000129 Z
21 star: true
22 time: '00:00:54'
23 - workout: Pre 028
24 performed_at: 2020-06-25 08:35:33.243999958 Z
25 star: false
26 time: '00:06:30'
27 - workout: Rec 001
28 performed_at: 2020-06-22 22:51:38.947000026 Z
29 star: false
30 time: '00:05:01'
What I am trying to do is extract the dates (only the dates, nothing else) from the lines that start with "performed_at:" and put them into a list or DataFrame.
How would I go about doing this as efficiently as possible with Pandas?
For the solution, please read the comments made under the question. Thanks to @dm2.
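The comments are not reproduced here, so what follows is only a minimal sketch of one possible approach, not necessarily the one @dm2 suggested. It reads the file as plain lines into a pandas Series and uses `Series.str.extract` with a regular expression to pull the `YYYY-MM-DD` part out of every `performed_at:` line. The names `SAMPLE`, `PATTERN` and `extract_performed_dates` are made up for illustration, and the indentation of the sample text is an assumption. It also assumes that "only the dates" means the calendar date without the time; the capture group could be widened if the time is wanted too.

```python
import io

import pandas as pd

# Hypothetical stand-in for text_data.yml; the indentation is assumed.
SAMPLE = """\
---
training:
  trainings:
  - workout: Rec 016
    performed_at: 2020-06-25 09:04:16.295000076 Z
    star: false
    time: '00:04:00'
  - workout: Rec 001
    performed_at: 2020-06-22 22:51:38.947000026 Z
    star: false
    time: '00:05:01'
"""

# Capture only the YYYY-MM-DD part of lines that start with "performed_at:".
PATTERN = r"^\s*performed_at:\s*(\d{4}-\d{2}-\d{2})"


def extract_performed_dates(handle):
    """Return the date part of every 'performed_at:' line as a Series of strings."""
    # One element per line of the file, no CSV parsing involved.
    lines = pd.Series(handle.read().splitlines(), dtype="string")
    # Lines that do not match the pattern become missing values and are dropped.
    dates = lines.str.extract(PATTERN, expand=False).dropna()
    return dates.reset_index(drop=True)


dates = extract_performed_dates(io.StringIO(SAMPLE))
date_list = dates.tolist()  # plain Python list of date strings
date_frame = pd.to_datetime(dates).to_frame(name="performed_at")  # one-column DataFrame
print(date_list)
print(date_frame)
```

For the real file, the same function can be given an open file handle instead of the `io.StringIO` stand-in, for example `with open('text_data.yml', encoding='utf-8') as handle:`. Reading the lines directly rather than through `pd.read_csv` avoids parsing problems if a workout name ever contains a comma.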