简体   繁体   中英

How do I retrieve the number of columns in a Pandas data frame?

How do you programmatically retrieve the number of columns in a pandas dataframe? I was hoping for something like:

df.num_columns

Like so:

import pandas as pd
df = pd.DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

len(df.columns)
3

Alternative:

df.shape[1]

( df.shape[0] is the number of rows)

If the variable holding the dataframe is called df, then:

len(df.columns)

gives the number of columns.

And for those who want the number of rows:

len(df.index)

For a tuple containing the number of both rows and columns:

df.shape

Surprised I haven't seen this yet, so without further ado, here is:

df.columns.size

df.info() function will give you result something like as below. If you are using read_csv method of Pandas without sep parameter or sep with ",".

raw_data = pd.read_csv("a1:\aa2/aaa3/data.csv")
raw_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5144 entries, 0 to 5143
Columns: 145 entries, R_fighter to R_age

There are multiple option to get column number and column information such as:
let's check them.

local_df = pd.DataFrame(np.random.randint(1,12,size=(2,6)),columns =['a','b','c','d','e','f']) 1. local_df.shape[1] --> Shape attribute return tuple as (row & columns) (0,1).

  1. local_df.info() --> info Method will return detailed information about data frame and it's columns such column count, data type of columns, Not null value count, memory usage by Data Frame

  2. len(local_df.columns) --> columns attribute will return index object of data frame columns & len function will return total available columns.

  3. local_df.head(0) --> head method with parameter 0 will return 1st row of df which actually nothing but header.

Assuming number of columns are not more than 10. For loop fun: li_count =0 for x in local_df: li_count =li_count + 1 print(li_count)

这对我有用 len(list(df))。

In order to include the number of row index "columns" in your total shape I would personally add together the number of columns df.columns.size with the attribute pd.Index.nlevels / pd.MultiIndex.nlevels :

Set up dummy data

import pandas as pd

flat_index = pd.Index([0, 1, 2])
multi_index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), names=["letter", "id"])

columns = ["cat", "dog", "fish"]

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_df = pd.DataFrame(data, index=flat_index, columns=columns)
multi_df = pd.DataFrame(data, index=multi_index, columns=columns)

# Show data
# -----------------
# 3 columns, 4 including the index
print(flat_df)
    cat  dog  fish
id                
0     1    2     3
1     4    5     6
2     7    8     9

# -----------------
# 3 columns, 5 including the index
print(multi_df)
           cat  dog  fish
letter id                
a      1     1    2     3
       2     4    5     6
b      1     7    8     9

Writing our process as a function:

def total_ncols(df, include_index=False):
    ncols = df.columns.size
    if include_index is True:
        ncols += df.index.nlevels
    return ncols

print("Ignore the index:")
print(total_ncols(flat_df), total_ncols(multi_df))

print("Include the index:")
print(total_ncols(flat_df, include_index=True), total_ncols(multi_df, include_index=True))

This prints:

Ignore the index:
3 3

Include the index:
4 5

If you want to only include the number of indices if the index is a pd.MultiIndex , then you can throw in an isinstance check in the defined function.

As an alternative, you could use df.reset_index().columns.size to achieve the same result, but this won't be as performant since we're temporarily inserting new columns into the index and making a new index before getting the number of columns.

#use a regular expression to parse the column count
#https://docs.python.org/3/library/re.html

buffer = io.StringIO()
df.info(buf=buffer)
s = buffer.getvalue()
pat=re.search(r"total\s{1}[0-9]\s{1}column",s)
print(s)
phrase=pat.group(0)
value=re.findall(r'[0-9]+',phrase)[0]
print(int(value))
import pandas as pd
df = pd.DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})


print(len(list(df.iterrows())))

gives length of rows

3

[Program finished]

here is:

  • pandas
    • excel engine: xlsxwriter

several method to get column count :

  • len(df.columns) -> 28
    • 在此处输入图片说明
  • df.shape[1] -> 28
    • here: df.shape = (592, 28)
    • related
      • rows count: df.shape[0] -> 592
  • df.columns.shape[0] -> 28
    • here: df.columns.shape = (28,)
      • 在此处输入图片说明
  • df.columns.size -> 28

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM