
Changing the date format of an entire DataFrame column when multiple date formats already exist in the column?

bond_df['Maturity']

0     2022-07-15 00:00:00
1     2024-07-18 00:00:00
2     2027-07-16 00:00:00
3     2020-07-28 00:00:00
4     2019-10-09 00:00:00
5     2022-04-08 00:00:00
6     2020-12-15 00:00:00
7     2022-12-15 00:00:00
8     2026-04-08 00:00:00
9     2023-04-11 00:00:00
10    2024-12-15 00:00:00
11                   2019
12    2020-10-25 00:00:00
13    2024-04-22 00:00:00
14    2047-12-15 00:00:00
15    2020-07-08 00:00:00
17    2043-04-11 00:00:00
18                   2021
19                   2022
20                   2023
21                   2025
22                   2026
23                   2027
24                   2029
25    2021-04-15 00:00:00
26    2044-04-22 00:00:00
27    2043-10-02 00:00:00
28    2039-01-19 00:00:00
29    2040-07-09 00:00:00
30    2029-09-21 00:00:00
31    2040-10-25 00:00:00
32                   2019
33    2035-09-04 00:00:00
34    2035-09-28 00:00:00
35    2041-04-15 00:00:00
36    2040-04-02 00:00:00
37    2034-03-27 00:00:00
38                   2030
39    2027-04-05 00:00:00
40    2038-04-15 00:00:00
41    2037-08-17 00:00:00
42    2023-10-16 00:00:00
43                      -
45    2019-10-09 00:00:00
46                      -
47    2021-06-23 00:00:00
48    2021-06-23 00:00:00
49    2023-06-26 00:00:00
50    2025-06-26 00:00:00
51    2028-06-26 00:00:00
52    2038-06-28 00:00:00
53    2020-06-23 00:00:00
54    2020-06-23 00:00:00
55    2048-06-29 00:00:00
56                      -
57                      -
58    2029-07-08 00:00:00
59    2026-07-08 00:00:00
60    2024-07-08 00:00:00
61    2020-07-31 00:00:00
Name: Maturity, dtype: object

This is a column of maturity dates for various Walmart bonds that I imported from Excel. I only care about the year portion of these dates. How can I convert the entire column so that it holds just the year values?

I tried dt.strftime, but it didn't work.

Thanks in advance

I wrote this little script for you. It writes the years to a years.txt file, assuming your data is saved in data.txt and contains only the rows you posted above.

The script also lets you toggle whether to include the dash placeholders and the bare years (the rows that hold only a year, with no full date).

The data.txt I tested with contains the same rows as the output in the question (index and value on each line), without the final "Name: Maturity, dtype: object" line.

and the script I wrote:

#!/usr/bin/python3
# Extract the year from each row of data.txt and write one year per line to years.txt.
include_dash = False            # keep the "-" placeholder rows in the output
include_years_on_right = True   # keep rows that hold only a bare year, e.g. "2019"

all_years = []
with open("data.txt", "r") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines
        value = parts[1]  # parts[0] is the row index
        if len(parts) > 2:
            # Full timestamp such as "2022-07-15 00:00:00": the year comes before the first "-"
            all_years.append(value.split("-")[0])
        elif not include_years_on_right:
            continue
        elif value != "-" or include_dash:
            all_years.append(value)

with open("years.txt", "w") as f:
    for year in all_years:
        f.write(year + "\n")

and the output to the years.txt:

2022
2024
2027
2020
2019
2022
2020
2022
2026
2023
2024
2019
2020
2024
2047
2020
2043
2021
2022
2023
2025
2026
2027
2029
2021
2044
2043
2039
2040
2029
2040
2019
2035
2035
2041
2040
2034
2030
2027
2038
2037
2023
2019
2021
2021
2023
2025
2028
2038
2020
2020
2048
2029
2026
2024
2020
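
If the values are still in the DataFrame, the round trip through a text file may not be needed. Below is a minimal sketch of the same idea applied per cell. It assumes each cell is a datetime object, a bare year, or a placeholder such as "-"; the helper name extract_year and the sample cells list are made up for illustration and are not part of the original script.

```python
import datetime as dt
import re


def extract_year(value):
    """Return the year of a mixed-format cell as an int, or None if it has none."""
    # datetime objects (pandas Timestamps subclass datetime) expose .year directly
    if isinstance(value, (dt.datetime, dt.date)):
        return value.year
    # Otherwise look for a leading four-digit year in the text form of the cell
    match = re.match(r"\s*(\d{4})\b", str(value))
    return int(match.group(1)) if match else None


# Hypothetical sample covering the kinds of cells shown in the question
cells = [dt.datetime(2022, 7, 15), 2019, "2047-12-15 00:00:00", "-"]
years = [extract_year(cell) for cell in cells]
print(years)
```

With pandas, the helper could be applied as bond_df['Year'] = bond_df['Maturity'].map(extract_year); rows holding "-" would then come back as missing values.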

Contact me if you have any issues, and I hope I can help you!
