I have three boolean series with a date (daily interval), which need to be merged into one column with a certain string as a value in a dataframe. The value can be for now the naqme of the series, like "bool_series_1"
, "bool_series_2"
... The dataframe has a loose time grid, so the values need also a normalization to map on.
Let's say this is bool_series_1:
2021-07-06 False
2021-07-07 True
2021-07-08 False
bool_series_2:
2021-07-06 True
2021-07-07 False
2021-07-08 False
bool_series_3:
2021-07-06 False
2021-07-07 False
2021-07-08 True
Dataframe:
datetime somevalues
2021-07-06 15:30:19 17445
2021-07-06 15:33:26 5327
2021-07-06 15:35:00 2222
2021-07-06 15:36:00 10104
2021-07-06 15:38:15 5444
2021-07-07 15:30:43 83524
2021-07-07 15:33:22 5327
2021-07-07 15:35:01 4918
2021-07-07 15:36:04 10104
2021-07-07 15:38:15 5214
2021-07-08 15:31:13 55543
2021-07-08 15:33:22 4234
2021-07-08 15:34:31 1321
2021-07-08 15:35:44 19876
2021-07-08 15:37:55 6453
I need this output:
datetime somevalues new_col
2021-07-06 15:30:19 17445 bool_series_2
2021-07-06 15:33:26 5327 bool_series_2
2021-07-06 15:35:00 2222 bool_series_2
2021-07-06 15:36:00 10104 bool_series_2
2021-07-06 15:38:15 5444 bool_series_2
2021-07-07 15:30:43 83524 bool_series_1
2021-07-07 15:33:22 5327 bool_series_1
2021-07-07 15:35:01 4918 bool_series_1
2021-07-07 15:36:04 10104 bool_series_1
2021-07-07 15:38:15 5214 bool_series_1
2021-07-08 15:31:13 55543 bool_series_3
2021-07-08 15:33:22 4234 bool_series_3
2021-07-08 15:34:31 1321 bool_series_3
2021-07-08 15:35:44 19876 bool_series_3
2021-07-08 15:37:55 6453 bool_series_3
If it would be only one series I'd do it like this:
df['new_col'] = np.where(df.index.normalize().map(bool_series_1 == True), 'bool_series_1', False)
But this won't work repeatedly. Any ideas how to solve this?
You can do it like this:
mapdate = pd.concat(
[
bool_series_1.map({True: "bool_series_1"}),
bool_series_2.map({True: "bool_series_2"}),
bool_series_3.map({True: "bool_series_3"}),
]
)
mapdate = mapdate.dropna()
mapdate.index = pd.to_datetime(mapdate.index)
df['new_col'] = df.index.normalize().map(mapdate)
Output:
somevalues new_col
datetime
2021-07-06 15:30:19 17445 bool_series_2
2021-07-06 15:33:26 5327 bool_series_2
2021-07-06 15:35:00 2222 bool_series_2
2021-07-06 15:36:00 10104 bool_series_2
2021-07-06 15:38:15 5444 bool_series_2
2021-07-07 15:30:43 83524 bool_series_1
2021-07-07 15:33:22 5327 bool_series_1
2021-07-07 15:35:01 4918 bool_series_1
2021-07-07 15:36:04 10104 bool_series_1
2021-07-07 15:38:15 5214 bool_series_1
2021-07-08 15:31:13 55543 bool_series_3
2021-07-08 15:33:22 4234 bool_series_3
2021-07-08 15:34:31 1321 bool_series_3
2021-07-08 15:35:44 19876 bool_series_3
2021-07-08 15:37:55 6453 bool_series_3
Here's a way to do what your question asks:
from functools import reduce
from operator import add
df = df.assign(norm=df.index.normalize()).reset_index().set_index('norm', drop=True)
df['new_col'] = (
reduce(add, (ser.map({True:name, False:''}) for ser, name in zip(
[bool_series_1, bool_series_2, bool_series_3],
['bool_series_1', 'bool_series_2', 'bool_series_3'])
)))
df = df.set_index('datetime')
Explanation:
DatetimeIndex.normalize()
to convert times to midnight in the large dataframe for matching with the dates in each boolean seriesboolean series, string value
tuples by calling zip()
for parallel lists of the boolean series and their corresponding string values ( "bool_series_1", "bool_series_2"...
)Series.map()
to create a generator that yields a transformed version of each series with True
replaced by the string value corresponding to the series and False
replaced by the empty string ''
reduce()
and add()
to concatenate the string values across all transformed series resulting in a series containing the desired string value for each rownew_col
set_index()
to restore the original index.Input:
df
somevalues
datetime
2021-07-06 15:30:19 17445
2021-07-06 15:33:26 5327
2021-07-06 15:35:00 2222
2021-07-06 15:36:00 10104
2021-07-06 15:38:15 5444
2021-07-07 15:30:43 83524
2021-07-07 15:33:22 5327
2021-07-07 15:35:01 4918
2021-07-07 15:36:04 10104
2021-07-07 15:38:15 5214
2021-07-08 15:31:13 55543
2021-07-08 15:33:22 4234
2021-07-08 15:34:31 1321
2021-07-08 15:35:44 19876
2021-07-08 15:37:55 6453
bool_series_1
2021-07-06 False
2021-07-07 True
2021-07-08 False
Name: 1, dtype: bool
bool_series_2
2021-07-06 True
2021-07-07 False
2021-07-08 False
Name: 1, dtype: bool
bool_series_3
2021-07-06 False
2021-07-07 False
2021-07-08 True
Name: 1, dtype: bool
Output:
somevalues new_col
datetime
2021-07-06 15:30:19 17445 bool_series_2
2021-07-06 15:33:26 5327 bool_series_2
2021-07-06 15:35:00 2222 bool_series_2
2021-07-06 15:36:00 10104 bool_series_2
2021-07-06 15:38:15 5444 bool_series_2
2021-07-07 15:30:43 83524 bool_series_1
2021-07-07 15:33:22 5327 bool_series_1
2021-07-07 15:35:01 4918 bool_series_1
2021-07-07 15:36:04 10104 bool_series_1
2021-07-07 15:38:15 5214 bool_series_1
2021-07-08 15:31:13 55543 bool_series_3
2021-07-08 15:33:22 4234 bool_series_3
2021-07-08 15:34:31 1321 bool_series_3
2021-07-08 15:35:44 19876 bool_series_3
2021-07-08 15:37:55 6453 bool_series_3
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.