
pandas .loc returns inconsistent types

Say I have two dataframes:

import pandas as pd

df1 = pd.DataFrame(["2020-12-31", "2021-01-01"], columns=["date"], index=["23845940781720275", "23845940781720275"])

and

df2 = pd.DataFrame(["2020-12-31"], columns=["date"], index=["23845940781720275"])

I want a way to enumerate the items in the "date" column that works in both cases:

  • An index label spans several dates (df1)
  • An index label maps to a single date (df2)

When I try the following, I get inconsistent results:

> type(df1.loc["23845940781720275"]["date"])

<class 'pandas.core.series.Series'>

> df1.loc["23845940781720275"]["date"]

23845940781720275    2020-12-31
23845940781720275    2021-01-01
Name: date, dtype: object

> type(df2.loc["23845940781720275"]["date"])

<class 'str'>

> df2.loc["23845940781720275"]["date"]

'2020-12-31'

I found some posts suggesting df.loc[x][['column']] to always get a DataFrame, but when I use it I get the same kind of inconsistency:

> type(df1.loc["23845940781720275"][["date"]])

<class 'pandas.core.frame.DataFrame'>

> type(df2.loc["23845940781720275"][["date"]])

<class 'pandas.core.series.Series'>

My real-life use case is easier and more readable with pandas. Is there a fix?

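I guess it's because the label matches only one row in the second dataframe. With a scalar label, .loc returns a DataFrame when several rows match (df1) but returns the row itself as a Series when exactly one row matches (df2).

In the second case, type(df2.loc["23845940781720275"][["date"]]), the row is already a Series, so selecting [["date"]] from it gives a one-element Series. It is not a DataFrame. Likewise, ["date"] on that Series gives the plain str.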
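The difference can be checked directly. This is a minimal sketch that rebuilds the two dataframes from the question (the variable names dup_scalar, uniq_scalar, dup_list and uniq_list are only illustrative) and compares a scalar label with a one-element list of labels:

```python
import pandas as pd

label = "23845940781720275"
df1 = pd.DataFrame({"date": ["2020-12-31", "2021-01-01"]}, index=[label, label])
df2 = pd.DataFrame({"date": ["2020-12-31"]}, index=[label])

# Scalar label: the result type depends on how many rows match.
dup_scalar = df1.loc[label]    # label appears twice -> DataFrame
uniq_scalar = df2.loc[label]   # label appears once -> Series (the row itself)

# List of labels: the result is a DataFrame in both cases.
dup_list = df1.loc[[label]]
uniq_list = df2.loc[[label]]

print(type(dup_scalar).__name__, type(uniq_scalar).__name__)
print(type(dup_list).__name__, type(uniq_list).__name__)
```

The first print shows two different types, the second shows the same type twice, which is why wrapping the label in a list removes the inconsistency.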

If you want to remove the inconsistency, pass the label as a list:

type(df2.loc[["23845940781720275"]][["date"]]) # pandas.core.frame.DataFrame

To fetch the list of dates for an index label, use:

df2.loc[["23845940781720275"], "date"].tolist()
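If this lookup is needed in several places, it could be wrapped in a small helper. The sketch below is only a suggestion: dates_for and its column parameter are hypothetical names, not part of pandas. It assumes the label exists in the index, since .loc raises KeyError otherwise.

```python
import pandas as pd

def dates_for(df, label, column="date"):
    """Return the values of `column` for every row indexed by `label`, as a list."""
    # Passing [label] (a list) keeps the row selection two-dimensional,
    # so selecting one column gives a Series even when a single row matches.
    return df.loc[[label], column].tolist()

label = "23845940781720275"
df1 = pd.DataFrame({"date": ["2020-12-31", "2021-01-01"]}, index=[label, label])
df2 = pd.DataFrame({"date": ["2020-12-31"]}, index=[label])

dates1 = dates_for(df1, label)
dates2 = dates_for(df2, label)
print(dates1)
print(dates2)
```

Both calls return a plain Python list, so the calling code can loop over the result without checking its type first.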

Here's a (dirty) fix to extract the dates from the dataframes:

[date_str.strip() for date_str in df1.loc[['23845940781720275']][['date']].to_string().strip().split(f"\n{'23845940781720275'}")[1:]]

returns: ['2020-12-31', '2021-01-01']

[date_str.strip() for date_str in df2.loc[['23845940781720275']][['date']].to_string().strip().split(f"\n{'23845940781720275'}")[1:]]

returns: ['2020-12-31']
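For comparison, the same lists can be obtained without going through to_string(). The sketch below uses a boolean mask on the index. It is a different technique from the string parsing above, and the function name dates_by_mask is hypothetical. One difference in behaviour: a label that is absent from the index gives an empty list rather than an error.

```python
import pandas as pd

def dates_by_mask(df, label, column="date"):
    """Return the values of `column` for rows whose index equals `label`."""
    # Comparing the index with a scalar gives a boolean array, one entry per row.
    mask = df.index == label
    # Boolean row selection plus a single column name yields a Series.
    return df.loc[mask, column].tolist()

label = "23845940781720275"
df1 = pd.DataFrame({"date": ["2020-12-31", "2021-01-01"]}, index=[label, label])
df2 = pd.DataFrame({"date": ["2020-12-31"]}, index=[label])

masked1 = dates_by_mask(df1, label)
masked2 = dates_by_mask(df2, label)
missing = dates_by_mask(df2, "no-such-label")
print(masked1, masked2, missing)
```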
