
Python: how to automatically find a line and replace a value in it?

I have a data frame that looks like this:

>>> df
           Date  Name    1st Column                 2nd Column           3rd Column
0    2021/05/01  A              0.0                        0.0         1.573127e+06
1    2021/07/01  A              0.0                        0.0         1.507486e+06
2    2023/05/01  A              0.0                        0.0         1.317854e+06
3    2016/08/01  A              0.0                        0.0         0.000000e+00
4    2016/11/01  A              0.0                        0.0         0.000000e+00
..          ...               ...           ...                        ...                  ...
160  2019/08/01  A              0.0                        0.0         1.621895e+06
161  2021/01/01  A              0.0                        0.0         1.693617e+06
162  2021/10/01  A              0.0                        0.0         1.479616e+06
163  2025/02/01  A              0.0                        0.0         1.296158e+06
164  2025/06/01  A              0.0                        0.0         1.325505e

[165 rows x 5 columns]

and I want to replace some of the zeros with these sorted, tab-separated values from a text file:

Date 1/2019 2/2019  3/2019  4/2019  5/2019  6/2019  7/2019  8/2019  9/2019  10/2019 11/2019 12/2019 1/2020  2/2020  3/2020  4/2020  5/2020  6/2020  7/2020  8/2020  9/2020  10/2020 11/2020 12/2020 1/2021  2/2021  3/2021  4/2021  5/2021  6/2021  7/2021  8/2021  9/2021  10/2021 11/2021 12/2021 1/2022  2/2022  3/2022  4/2022  5/2022  6/2022  7/2022  8/2022  9/2022  10/2022 11/2022 12/2022 1/2023  2/2023  3/2023  4/2023  5/2023  6/2023  7/2023  8/2023  9/2023  10/2023 11/2023 12/2023 1/2024  2/2024  3/2024  4/2024  5/2024  6/2024  7/2024  8/2024  9/2024  10/2024 11/2024 12/2024 1/2025  2/2025  3/2025  4/2025  5/2025  6/2025  7/2025  8/2025  9/2025  10/2025 11/2025 12/2025 1/2026
1st Column 3,197423109  3,199271438 3,201119768 3,205836429 3,210549655 3,139294108 3,044097425 2,948900742 2,855464295 2,842043348 2,849479962 2,856916577 2,864353191 2,87182361  2,879294029 2,883960121 2,888617532 2,969237657 3,073817093 3,172887513 3,270197696 3,24771483  3,196074523 3,149663844 3,079303752 3,035528235 2,995261308 2,919925073 2,860230995 2,800496835 2,740882913 2,710733322 2,680583731 2,690211691 2,708139529 2,72083774  2,757485364 2,768058092 2,775122231 2,81794729  2,844999222 2,872025753 2,899128487 2,81798565  2,713110718 2,608235786 2,50512109  2,474120803 2,463978077 2,453760035 2,443541992 2,419794799 2,396097461 2,372425051 2,348677859 2,424831471 2,524717178 2,619093869 2,711710323 2,704387968 2,67253926  2,645701347 2,60987964  2,598712878 2,591032768 2,55817305  2,525288737 2,491362789 2,457436841 2,415412747 2,373388653 2,359188961 2,34865806  2,356650046 2,373625828 2,379772984 2,382327375 2,410008316 2,421711325 2,41884542  2,415979515 2,350753715 2,277362479 2,203971243 2,132340243
2nd Column -550000  -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000 -550000

without messing up the dates, e.g.:

>>> df_new.sort_values(["Date"])
           Date  Name    1st Column                 2nd Column           3rd Column
100  2012/04/01  A              0.0                        0.0         0.000000e+00
139  2012/05/01  A              0.0                        0.0         0.000000e+00
105  2012/06/01  A              0.0                        0.0         0.000000e+00
78   2012/07/01  A              0.0                        0.0         0.000000e+00
16   2012/08/01  A              0.0                        0.0         0.000000e+00
..          ...               ...           ...                        ...                  ...
45   2025/08/01  A       2,41884542                    -550000         1.330365e+06
46   2025/09/01  A      2,415979515                    -550000         1.328789e+06
32   2025/10/01  A      2,350753715                    -550000         1.292915e+06
152  2025/11/01  A      2,277362479                    -550000         1.252549e+06
8    2025/12/01  A      2,203971243                    -550000         1.212184e+06

[165 rows x 5 columns]

Note that the dates are in different formats: YYYY/MM/DD vs. m/YYYY.

How can this be done? Thanks!

I can easily extend the tab-separated-values to cover the 166 months and also add the leading zeros to the months for it to be MM/YYYY ... if that makes the solution easier.

The idea is to create a DataFrame with a DatetimeIndex in the columns and then transpose it with DataFrame.T:

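import pandas as pd

# file: path to the tab-separated text file; its first column holds the row labels
df1 = pd.read_csv(file, sep="\t", index_col=[0])

# Parse the m/YYYY headers into datetimes, then transpose so the dates become the index
df1.columns = pd.to_datetime(df1.columns, format="%m/%Y")
df1 = df1.T
print(df1)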
Date          1stColumn 2ndColumn
2019-01-01  3,197423109   -550000
2019-02-01  3,199271438   -550000
2019-03-01  3,201119768   -550000
2019-04-01  3,205836429   -550000
2019-05-01  3,210549655   -550000
                ...       ...
2025-09-01  2,415979515   -550000
2025-10-01  2,350753715   -550000
2025-11-01  2,277362479   -550000
2025-12-01  2,203971243   -550000
2026-01-01  2,132340243   -550000

[85 rows x 2 columns]
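
To try the reshaping step without the real file, here is a minimal sketch on a three-month excerpt held in memory. The `tsv` string and the names `wide` and `lookup` are made up for illustration. Passing `format="%m/%Y"` is an assumption about the header layout; it should accept months with or without a leading zero, so padding the file is probably not needed.

```python
import io

import pandas as pd

# Hypothetical three-month excerpt of the wide, tab-separated file
tsv = (
    "Date\t1/2019\t2/2019\t3/2019\n"
    "1st Column\t3,197423109\t3,199271438\t3,201119768\n"
    "2nd Column\t-550000\t-550000\t-550000\n"
)

# First column ("Date") becomes the row labels; the month headers stay as columns
wide = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)

# Turn the m/YYYY headers into timestamps (day defaults to the 1st of the month)
wide.columns = pd.to_datetime(wide.columns, format="%m/%Y")

# Transpose: one row per month, one column per series
lookup = wide.T
print(lookup)
```

The values are still text at this point (decimal commas included), which matches the output shown in the question.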

Then create a DatetimeIndex in the original data, replace the 0 values with the matching values from the second DataFrame, and use DataFrame.fillna to set the non-matched values back to 0:

df = df.set_index('Date')
df.index = pd.to_datetime(df.index)

df = df.mask(df.eq(0), df1).fillna(0)
print (df)
           Name    1stColumn 2ndColumn     3rdColumn
Date                                                
2021-05-01    A  2,860230995   -550000  1.573127e+06
2021-07-01    A  2,740882913   -550000  1.507486e+06
2023-05-01    A  2,396097461   -550000  1.317854e+06
2016-08-01    A            0         0  0.000000e+00
2016-11-01    A            0         0  0.000000e+00
2019-08-01    A  2,948900742   -550000  1.621895e+06
2021-01-01    A  3,079303752   -550000  1.693617e+06
2021-10-01    A  2,690211691   -550000  1.479616e+06
2025-02-01    A  2,356650046   -550000  1.296158e+06
2025-06-01    A  2,410008316   -550000     1.325505e
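
The replacement relies on `mask` aligning the second DataFrame on both index and column labels, so the row order of the original data does not matter. A small self-contained sketch of that behaviour follows; the three-row `df` and two-row `lookup` are invented stand-ins, not the real data.

```python
import pandas as pd

# Hypothetical miniature of the original frame, deliberately not sorted by date
df = pd.DataFrame({
    "Date": ["2019/02/01", "2016/08/01", "2019/01/01"],
    "Name": ["A", "A", "A"],
    "1st Column": [0.0, 0.0, 0.0],
    "2nd Column": [0.0, 0.0, 0.0],
    "3rd Column": [1.5e6, 0.0, 1.6e6],
})

# Hypothetical lookup table: dates in the index, text values as read from the file
lookup = pd.DataFrame(
    {
        "1st Column": ["3,197423109", "3,199271438"],
        "2nd Column": ["-550000", "-550000"],
    },
    index=pd.to_datetime(["2019-01-01", "2019-02-01"]),
)

df = df.set_index("Date")
df.index = pd.to_datetime(df.index, format="%Y/%m/%d")

# Where a cell equals 0, take the cell with the same date and column from lookup.
# Cells with no counterpart in lookup become NaN, and fillna(0) restores the zero.
result = df.mask(df.eq(0), lookup).fillna(0)
print(result)
```

Dates outside the lookup range (2016/08/01 here) and columns the lookup does not contain (the 3rd one) keep their zeros. Because the inserted values are strings, the affected columns may come back with object dtype.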

EDIT: If you want the same date format as in df, first convert the column labels of df1 to datetimes and then use DatetimeIndex.strftime to format them as YYYY/MM/DD:

df1 = pd.read_csv(file, sep="\t", index_col=[0])

df1.columns = pd.to_datetime(df1.columns, format='%m/%Y').strftime('%Y/%m/%d')
df1 = df1.T
print (df1)
Date          1stColumn 2ndColumn
2019/01/01  3,197423109   -550000
2019/02/01  3,199271438   -550000
2019/03/01  3,201119768   -550000
2019/04/01  3,205836429   -550000
2019/05/01  3,210549655   -550000
                ...       ...
2025/09/01  2,415979515   -550000
2025/10/01  2,350753715   -550000
2025/11/01  2,277362479   -550000
2025/12/01  2,203971243   -550000
2026/01/01  2,132340243   -550000

[85 rows x 2 columns]

# Start from the original df and keep the Date strings as the index
df = df.set_index('Date')
df = df.mask(df.eq(0), df1).fillna(0).reset_index()
print(df)
         Date Name    1stColumn 2ndColumn     3rdColumn
0  2021/05/01    A  2,860230995   -550000  1.573127e+06
1  2021/07/01    A  2,740882913   -550000  1.507486e+06
2  2023/05/01    A  2,396097461   -550000  1.317854e+06
3  2016/08/01    A            0         0  0.000000e+00
4  2016/11/01    A            0         0  0.000000e+00
5  2019/08/01    A  2,948900742   -550000  1.621895e+06
6  2021/01/01    A  3,079303752   -550000  1.693617e+06
7  2021/10/01    A  2,690211691   -550000  1.479616e+06
8  2025/02/01    A  2,356650046   -550000  1.296158e+06
9  2025/06/01    A  2,410008316   -550000     1.325505e
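
For the string-date variant, the same steps can be sketched end to end on a tiny made-up sample. The `tsv` text, the two-row `df` and the names `lookup` and `out` are assumptions for illustration. The point to note is that `Date` has to be the index (as plain strings) before `mask` is called, so that the labels line up.

```python
import io

import pandas as pd

# Hypothetical two-month excerpt of the tab-separated file
tsv = (
    "Date\t1/2019\t2/2019\n"
    "1st Column\t3,197423109\t3,199271438\n"
    "2nd Column\t-550000\t-550000\n"
)
lookup = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)

# Reformat the m/YYYY headers as YYYY/MM/DD strings so they match df["Date"]
lookup.columns = pd.to_datetime(lookup.columns, format="%m/%Y").strftime("%Y/%m/%d")
lookup = lookup.T

# Hypothetical miniature of the original frame
df = pd.DataFrame({
    "Date": ["2019/02/01", "2016/08/01"],
    "Name": ["A", "A"],
    "1st Column": [0.0, 0.0],
    "2nd Column": [0.0, 0.0],
    "3rd Column": [1.5e6, 0.0],
})

# Align on the Date strings, fill the zeros, then restore Date as a column
out = df.set_index("Date")
out = out.mask(out.eq(0), lookup).fillna(0).reset_index()
print(out)
```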


 