I have a data frame that looks like this:
>>> df
Date Name 1st Column 2nd Column 3rd Column
0 2021/05/01 A 0.0 0.0 1.573127e+06
1 2021/07/01 A 0.0 0.0 1.507486e+06
2 2023/05/01 A 0.0 0.0 1.317854e+06
3 2016/08/01 A 0.0 0.0 0.000000e+00
4 2016/11/01 A 0.0 0.0 0.000000e+00
.. ... ... ... ... ...
160 2019/08/01 A 0.0 0.0 1.621895e+06
161 2021/01/01 A 0.0 0.0 1.693617e+06
162 2021/10/01 A 0.0 0.0 1.479616e+06
163 2025/02/01 A 0.0 0.0 1.296158e+06
164 2025/06/01 A 0.0 0.0 1.325505e
[165 rows x 5 columns]
and I want to replace some of the zeros with these sorted tab-separated values from a text file:
Date 1/2019 2/2019 3/2019 4/2019 5/2019 6/2019 7/2019 8/2019 9/2019 10/2019 11/2019 12/2019 1/2020 2/2020 3/2020 4/2020 5/2020 6/2020 7/2020 8/2020 9/2020 10/2020 11/2020 12/2020 1/2021 2/2021 3/2021 4/2021 5/2021 6/2021 7/2021 8/2021 9/2021 10/2021 11/2021 12/2021 1/2022 2/2022 3/2022 4/2022 5/2022 6/2022 7/2022 8/2022 9/2022 10/2022 11/2022 12/2022 1/2023 2/2023 3/2023 4/2023 5/2023 6/2023 7/2023 8/2023 9/2023 10/2023 11/2023 12/2023 1/2024 2/2024 3/2024 4/2024 5/2024 6/2024 7/2024 8/2024 9/2024 10/2024 11/2024 12/2024 1/2025 2/2025 3/2025 4/2025 5/2025 6/2025 7/2025 8/2025 9/2025 10/2025 11/2025 12/2025 1/2026
1st Column 3,197423109 3,199271438 3,201119768 3,205836429 3,210549655 3,139294108 3,044097425 2,948900742 2,855464295 2,842043348 2,849479962 2,856916577 2,864353191 2,87182361 2,879294029 2,883960121 2,888617532 2,969237657 3,073817093 3,172887513 3,270197696 3,24771483 3,196074523 3,149663844 3,079303752 3,035528235 2,995261308 2,919925073 2,860230995 2,800496835 2,740882913 2,710733322 2,680583731 2,690211691 2,708139529 2,72083774 2,757485364 2,768058092 2,775122231 2,81794729 2,844999222 2,872025753 2,899128487 2,81798565 2,713110718 2,608235786 2,50512109 2,474120803 2,463978077 2,453760035 2,443541992 2,419794799 2,396097461 2,372425051 2,348677859 2,424831471 2,524717178 2,619093869 2,711710323 2,704387968 2,67253926 2,645701347 2,60987964 2,598712878 2,591032768 2,55817305 2,525288737 2,491362789 2,457436841 2,415412747 2,373388653 2,359188961 2,34865806 2,356650046 2,373625828 2,379772984 2,382327375 2,410008316 2,421711325 2,41884542 2,415979515 2,350753715 2,277362479 2,203971243 2,132340243
2nd Column -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000
without messing up the dates, e.g.:
>>> df_new.sort_values(["Date"])
Date Name 1st Column 2nd Column 3rd Column
100 2012/04/01 A 0.0 0.0 0.000000e+00
139 2012/05/01 A 0.0 0.0 0.000000e+00
105 2012/06/01 A 0.0 0.0 0.000000e+00
78 2012/07/01 A 0.0 0.0 0.000000e+00
16 2012/08/01 A 0.0 0.0 0.000000e+00
.. ... ... ... ... ...
45 2025/08/01 A 2,41884542 -550000 1.330365e+06
46 2025/09/01 A 2,415979515 -550000 1.328789e+06
32 2025/10/01 A 2,350753715 -550000 1.292915e+06
152 2025/11/01 A 2,277362479 -550000 1.252549e+06
8 2025/12/01 A 2,203971243 -550000 1.212184e+06
[165 rows x 5 columns]
Note that the dates are in different formats: YYYY/MM/DD vs. m/YYYY.
How can this be done? Thanks!
I can easily extend the tab-separated values to cover the 166 months and also add leading zeros to the months so that the format is MM/YYYY, if that makes the solution easier.
The idea is to create a DataFrame with a DatetimeIndex in the columns and then transpose it with DataFrame.T:
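import pandas as pd

# file is the path to the tab-separated text file
df1 = pd.read_csv(file, sep="\t", index_col=[0])
# header labels such as "1/2019" become a DatetimeIndex (explicit m/YYYY format)
df1.columns = pd.to_datetime(df1.columns, format="%m/%Y")
df1 = df1.T  # dates become the row index
print(df1)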
Date 1stColumn 2ndColumn
2019-01-01 3,197423109 -550000
2019-02-01 3,199271438 -550000
2019-03-01 3,201119768 -550000
2019-04-01 3,205836429 -550000
2019-05-01 3,210549655 -550000
... ...
2025-09-01 2,415979515 -550000
2025-10-01 2,350753715 -550000
2025-11-01 2,277362479 -550000
2025-12-01 2,203971243 -550000
2026-01-01 2,132340243 -550000
[85 rows x 2 columns]
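Note that the values printed above are still strings such as "3,197423109". If the comma is a decimal separator (an assumption, though the varying number of digits after the comma suggests it), passing decimal="," to read_csv should parse them as floats. A minimal sketch on a hypothetical three-month excerpt of the file, read from an in-memory string instead of a path:

```python
import io

import pandas as pd

# Hypothetical three-month excerpt of the tab-separated file.
tsv = (
    "Date\t1/2019\t2/2019\t3/2019\n"
    "1st Column\t3,197423109\t3,199271438\t3,201119768\n"
    "2nd Column\t-550000\t-550000\t-550000\n"
)

# decimal="," assumes the comma is a decimal separator, not a thousands separator.
wide = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0, decimal=",")

# Parse the m/YYYY header labels explicitly, then move the dates to the rows.
wide.columns = pd.to_datetime(wide.columns, format="%m/%Y")
lookup = wide.T

print(lookup.dtypes)
```

With numeric columns in the lookup table, the replaced values stay floats instead of turning the target columns into a mix of strings and numbers.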
Then convert the Date column of the original data to a DatetimeIndex, replace the 0 values with the matching values from the second DataFrame, and use DataFrame.fillna to set the values that found no match back to 0:
df = df.set_index('Date')
df.index = pd.to_datetime(df.index)
# mask aligns df1 on index and columns; zeros without a match become NaN, then 0 again
df = df.mask(df.eq(0), df1).fillna(0)
print(df)
Name 1stColumn 2ndColumn 3rdColumn
Date
2021-05-01 A 2,860230995 -550000 1.573127e+06
2021-07-01 A 2,740882913 -550000 1.507486e+06
2023-05-01 A 2,396097461 -550000 1.317854e+06
2016-08-01 A 0 0 0.000000e+00
2016-11-01 A 0 0 0.000000e+00
2019-08-01 A 2,948900742 -550000 1.621895e+06
2021-01-01 A 3,079303752 -550000 1.693617e+06
2021-10-01 A 2,690211691 -550000 1.479616e+06
2025-02-01 A 2,356650046 -550000 1.296158e+06
2025-06-01 A 2,410008316 -550000 1.325505e
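The mask step relies on pandas aligning the replacement frame to the original by both index and column labels, so the row order of either frame does not matter. A small self-contained sketch of that behaviour follows; the three-row frame and the two-row lookup table are made-up miniatures, not the real data:

```python
import pandas as pd

# Hypothetical miniature of the original frame (string dates, zeros as placeholders).
df = pd.DataFrame({
    "Date": ["2019/02/01", "2016/08/01", "2019/01/01"],
    "Name": ["A", "A", "A"],
    "1st Column": [0.0, 0.0, 0.0],
    "2nd Column": [0.0, 0.0, 0.0],
    "3rd Column": [1.5e6, 0.0, 1.6e6],
})

# Hypothetical lookup table, already transposed and indexed by month start.
lookup = pd.DataFrame(
    {"1st Column": [3.197423109, 3.199271438],
     "2nd Column": [-550000.0, -550000.0]},
    index=pd.to_datetime(["2019-01-01", "2019-02-01"]),
)

out = df.set_index("Date")
out.index = pd.to_datetime(out.index, format="%Y/%m/%d")

# Where a cell equals 0, take the value with the same date and column label
# from lookup; cells without a match become NaN and are reset to 0.
out = out.mask(out.eq(0), lookup).fillna(0)
print(out)
```

The 2016 row has no counterpart in the lookup table, so it keeps its zeros, and the zero in "3rd Column" is untouched because the lookup table has no column with that label.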
EDIT: If you want the same format as the data in df, first convert the columns of df1 to datetimes and then use DatetimeIndex.strftime to format them as YYYY/MM/DD:
df1 = pd.read_csv(file, sep="\t", index_col=[0])
# parse the m/YYYY labels, then write them back as YYYY/MM/DD strings
df1.columns = pd.to_datetime(df1.columns, format="%m/%Y").strftime('%Y/%m/%d')
df1 = df1.T
print(df1)
Date 1stColumn 2ndColumn
2019/01/01 3,197423109 -550000
2019/02/01 3,199271438 -550000
2019/03/01 3,201119768 -550000
2019/04/01 3,205836429 -550000
2019/05/01 3,210549655 -550000
... ...
2025/09/01 2,415979515 -550000
2025/10/01 2,350753715 -550000
2025/11/01 2,277362479 -550000
2025/12/01 2,203971243 -550000
2026/01/01 2,132340243 -550000
[85 rows x 2 columns]
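# start from the original df, with Date still a column of YYYY/MM/DD strings
df = df.set_index('Date')
df = df.mask(df.eq(0), df1).fillna(0).reset_index()
print(df)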
Date Name 1stColumn 2ndColumn 3rdColumn
0 2021/05/01 A 2,860230995 -550000 1.573127e+06
1 2021/07/01 A 2,740882913 -550000 1.507486e+06
2 2023/05/01 A 2,396097461 -550000 1.317854e+06
3 2016/08/01 A 0 0 0.000000e+00
4 2016/11/01 A 0 0 0.000000e+00
5 2019/08/01 A 2,948900742 -550000 1.621895e+06
6 2021/01/01 A 3,079303752 -550000 1.693617e+06
7 2021/10/01 A 2,690211691 -550000 1.479616e+06
8 2025/02/01 A 2,356650046 -550000 1.296158e+06
9 2025/06/01 A 2,410008316 -550000 1.325505e
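For the string-keyed variant, the original frame has to be indexed by Date before the mask, because alignment happens on the index. A sketch under the same assumptions as before (made-up two-row frames, numeric lookup values):

```python
import pandas as pd

# Hypothetical miniature of the original frame; Date stays a plain string.
df = pd.DataFrame({
    "Date": ["2019/02/01", "2016/08/01"],
    "Name": ["A", "A"],
    "1st Column": [0.0, 0.0],
    "2nd Column": [0.0, 0.0],
    "3rd Column": [1.5e6, 0.0],
})

# Hypothetical lookup table whose m/YYYY labels are rewritten as YYYY/MM/DD strings.
lookup = pd.DataFrame(
    {"1st Column": [3.197423109, 3.199271438],
     "2nd Column": [-550000.0, -550000.0]},
    index=pd.to_datetime(["1/2019", "2/2019"], format="%m/%Y").strftime("%Y/%m/%d"),
)

# Index by the string dates so that mask can align the two frames,
# then restore Date as an ordinary column.
keyed = df.set_index("Date")
result = keyed.mask(keyed.eq(0), lookup).fillna(0).reset_index()
print(result)
```

This keeps the Date column byte-for-byte as it was, at the cost of matching on strings, so both sides must use exactly the same zero-padded format.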