
Exploratory Data Analysis with Pandas

How can I group continuous data (like a tenure column showing different months for each row) into categories shown in a separate column using pandas?

Are you looking for something like this?

```python
import datetime

import pandas as pd

# Make fake data
dates = {"tenure": [datetime.date(2020, 1, 31), datetime.date(2020, 1, 24), datetime.date(2020, 5, 13),
                    datetime.date(2021, 5, 23), datetime.date(2022, 5, 5), datetime.date(2020, 3, 16),
                    datetime.date(2020, 5, 28), datetime.date(2020, 9, 23), datetime.date(2020, 12, 28),
                    datetime.date(2021, 10, 12)]}
df = pd.DataFrame(data=dates)
```

```
tenure
2020-01-31
2020-01-24
2020-05-13
2021-05-23
2022-05-05
2020-03-16
2020-05-28
2020-09-23
2020-12-28
2021-10-12
```
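The fake data holds Python `datetime.date` objects, which pandas keeps in an `object` column, so the `.dt` accessor is not available until the column is converted with `pd.to_datetime`. A small sketch of that difference, using just two of the dates:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"tenure": [datetime.date(2020, 1, 31), datetime.date(2020, 5, 13)]})

# Python date objects are stored with object dtype, so .dt raises AttributeError
print(df.tenure.dtype)  # object
try:
    df.tenure.dt.month_name()
except AttributeError as err:
    print("before conversion:", err)

# After pd.to_datetime the column has a datetime64 dtype and .dt works
df["tenure"] = pd.to_datetime(df.tenure)
print(df.tenure.dt.month_name().tolist())  # ['January', 'May']
```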
```python
# Make months to group by
df["tenure"] = pd.to_datetime(df.tenure)  # datetime.date objects -> datetime64
df["month"] = df.tenure.dt.month_name()   # month name as a string, e.g. "January"
```

```
tenure               month
2020-01-31 00:00:00  January
2020-01-24 00:00:00  January
2020-05-13 00:00:00  May
2021-05-23 00:00:00  May
2022-05-05 00:00:00  May
2020-03-16 00:00:00  March
2020-05-28 00:00:00  May
2020-09-23 00:00:00  September
2020-12-28 00:00:00  December
2021-10-12 00:00:00  October
```
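`month_name()` returns plain strings, so `groupby` sorts the months alphabetically (December, January, March, ...), as the tables below show. If calendar order is preferred, one option is an ordered `Categorical`. A minimal sketch on a few of the same dates, assuming English month names (the pandas default):

```python
import datetime

import pandas as pd

# A few of the fake dates from above
df = pd.DataFrame({"tenure": pd.to_datetime([
    datetime.date(2020, 1, 31), datetime.date(2020, 5, 13),
    datetime.date(2020, 3, 16), datetime.date(2020, 12, 28),
    datetime.date(2020, 9, 23),
])})

# "January" ... "December", generated by pandas itself so the names match month_name()
month_order = pd.date_range("2000-01-01", periods=12, freq="MS").month_name()
df["month"] = pd.Categorical(df.tenure.dt.month_name(), categories=month_order, ordered=True)

# Grouping on an ordered categorical sorts by category order;
# observed=True keeps only the months that actually occur
counts = df.groupby("month", observed=True)["tenure"].size()
print(counts.index.tolist())
# ['January', 'March', 'May', 'September', 'December']
```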
```python
# Group by month and spread each month's dates over numbered columns (0, 1, 2, ...).
# The result gets its own name so that df keeps its "tenure" and "month" columns.
by_month = (df
            .sort_values("tenure")
            .groupby("month")["tenure"]
            .apply(lambda x: x.reset_index(drop=True))  # number the dates within each month
            .unstack()                                   # positions become columns
            .reset_index())                              # "month" back to a regular column
```

```
month      0                    1                    2                    3
December   2020-12-28 00:00:00  NaT                  NaT                  NaT
January    2020-01-24 00:00:00  2020-01-31 00:00:00  NaT                  NaT
March      2020-03-16 00:00:00  NaT                  NaT                  NaT
May        2020-05-13 00:00:00  2020-05-28 00:00:00  2021-05-23 00:00:00  2022-05-05 00:00:00
October    2021-10-12 00:00:00  NaT                  NaT                  NaT
September  2020-09-23 00:00:00  NaT                  NaT                  NaT
```
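The `apply`/`reset_index` step only numbers each date within its month. An alternative sketch does that numbering explicitly with `groupby().cumcount()` and reshapes with `pivot`, which should give the same month-by-position table (the `pos` column name is just for illustration):

```python
import datetime

import pandas as pd

# Rebuild the example frame: the fake dates plus a "month" name column
dates = [datetime.date(2020, 1, 31), datetime.date(2020, 1, 24), datetime.date(2020, 5, 13),
         datetime.date(2021, 5, 23), datetime.date(2022, 5, 5), datetime.date(2020, 3, 16),
         datetime.date(2020, 5, 28), datetime.date(2020, 9, 23), datetime.date(2020, 12, 28),
         datetime.date(2021, 10, 12)]
df = pd.DataFrame({"tenure": pd.to_datetime(dates)})
df["month"] = df.tenure.dt.month_name()

# Number the dates within each month (0, 1, 2, ...) in chronological order
ordered = df.sort_values("tenure")
ordered = ordered.assign(pos=ordered.groupby("month").cumcount())

# One row per month, one column per position; empty slots become NaT
wide = ordered.pivot(index="month", columns="pos", values="tenure")
print(wide)
```

Swapping the arguments, `ordered.pivot(index="pos", columns="month", values="tenure")`, gives the months-as-columns layout of the second option without the transpose and rename.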

Or perhaps something like this?

```python
# Same grouping, transposed so that each month becomes a column
# (this starts from df with its "tenure" and "month" columns)
by_column = (df.sort_values("tenure")
             .groupby("month")["tenure"]
             .apply(lambda x: x.reset_index(drop=True))
             .unstack()
             .reset_index()
             .T)
# The first row now holds the month names: use it as the header, then drop it
by_column = (by_column.rename(columns=by_column.iloc[0])
             .drop(by_column.index[0])
             .reset_index(drop=True))
```

```
December             January              March                May                  October              September
2020-12-28 00:00:00  2020-01-24 00:00:00  2020-03-16 00:00:00  2020-05-13 00:00:00  2021-10-12 00:00:00  2020-09-23 00:00:00
NaT                  2020-01-31 00:00:00  NaT                  2020-05-28 00:00:00  NaT                  NaT
NaT                  NaT                  NaT                  2021-05-23 00:00:00  NaT                  NaT
NaT                  NaT                  NaT                  2022-05-05 00:00:00  NaT                  NaT
```
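If tenure is instead a numeric count of months (one number per row), the usual pandas tool for putting a continuous column into categories in a separate column is `pd.cut`. A minimal sketch; the tenure values, bin edges and labels below are made up for illustration:

```python
import pandas as pd

# Hypothetical data: tenure as a whole number of months per customer
df = pd.DataFrame({"tenure": [1, 5, 12, 13, 24, 30, 48, 60, 72]})

# Hypothetical bin edges and labels; bins are right-closed by default,
# so (0, 12] covers 1..12 months, (12, 24] covers 13..24, and so on
bins = [0, 12, 24, 48, 72]
labels = ["0-12 months", "13-24 months", "25-48 months", "49-72 months"]
df["tenure_group"] = pd.cut(df["tenure"], bins=bins, labels=labels)

print(df)
```

A tenure of 0 would fall outside (0, 12] and become NaN unless `include_lowest=True` is passed; `pd.qcut` is the quantile-based counterpart when roughly equal-sized groups are wanted.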

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM