Python pandas DataFrame - Column selection

I have a pandas DataFrame object train_df with, say, a column called "ColA" and a column called "ColB". It was loaded from a CSV file with a header row using read_csv.

I get the same result whether I write:

pd.crosstab(train_df['ColA'], train_df['ColB'])

or

pd.crosstab(train_df.ColA, train_df.ColB)

Is there any difference between these two ways of selecting columns?

When I print the type, it is the same in both cases: pandas.core.series.Series

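There is no difference in the result here.

The bracket form, pd.crosstab(train_df['ColA'], train_df['ColB']), is recommended because it avoids possible errors.

For example, if you have a column named count, train_df.count does not return that column. It returns the DataFrame's built-in count method, so code that expects a Series will typically fail. train_df['count'] always returns the column.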
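A minimal sketch of that name clash, assuming a small made-up frame (the data and the variable names df, dot_result and bracket_result are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: one column happens to be named "count",
# which is also the name of a DataFrame method.
df = pd.DataFrame({"count": [3, 1, 2], "ColB": ["a", "b", "a"]})

# Attribute access resolves to the method, not the column.
dot_result = df.count

# Bracket access always looks the name up among the columns.
bracket_result = df["count"]

print(callable(dot_result))                    # True: it is a method
print(isinstance(dot_result, pd.Series))       # False
print(isinstance(bracket_result, pd.Series))   # True
```

The same applies to any column whose name matches an existing DataFrame attribute or method, which is why the bracket form is the safer default.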

If you only want to select a single column, there is no difference between the two ways.

However, dot notation cannot select multiple columns, whereas bracket notation can: dataframe[['col1', 'col2']] returns a pandas.core.frame.DataFrame instead of a pandas.core.series.Series.
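The two points above can be sketched as follows, assuming a small made-up frame (the column names col1, col2, col3 and the variable names are placeholders for illustration):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# Single column: both notations give an equal Series.
single_dot = df.col1
single_bracket = df["col1"]

# Multiple columns: pass a list of names inside the brackets.
subset = df[["col1", "col2"]]

print(single_dot.equals(single_bracket))  # True
print(type(single_bracket).__name__)      # Series
print(type(subset).__name__)              # DataFrame
print(list(subset.columns))               # ['col1', 'col2']
```

Note the double brackets: the outer pair is the indexing operation and the inner pair is an ordinary Python list of column names.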


 