How can one count the rows and columns of multiple data-frames?

I have 3 data-frames:

import pandas as pd

d1 = {'col1': [1, 2], 'col2': [3, 4]}
d2 = {'col1': [1,2,3], 'col2': [3,4,5]}
d3 = {'col1': [1,2,3,4,5], 'col2': [3,4,5,6,7]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
df3 = pd.DataFrame(data=d3)

Now I'm trying to count the number of rows and columns of these 3 data-frames and place the counts in a new data-frame named my_dataframe. This is the code I used:

dataframes = [df1, df2, df3]
number_rows = [df.shape[0] for df in dataframes]
number_columns = [df.shape[1] for df in dataframes]

my_data = {'df': dataframes, 'rows': number_rows, 'columns': number_columns}

my_dataframe = pd.DataFrame(my_data)

print(my_dataframe)

This is my output:

[image: screenshot of the output, not available]

This is my expected output:

    df   -   rows   -   columns      
0   df1  -   2      -   2
1   df2  -   3      -   2
2   df3  -   5      -   2

Can someone explain to me what went wrong and how I can fix this? Thank you all.

In the line where you define my_data, you are inadvertently inserting the original dataframes themselves rather than their names:

my_data = {'df': dataframes, 'rows': number_rows, 'columns': number_columns}

Instead, define df_names = ['df1', 'df2', 'df3'] and use it as the value for 'df' in my_data in place of dataframes.

I don't think there is a nice, built-in way in pandas to get the name of a dataframe. (I could be wrong, though.)
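A minimal sketch of the fix described in this answer, reusing the question's sample data. The name df_names comes from the answer; the names still have to be typed by hand, since the list of dataframes does not record which variable each object came from.

```python
import pandas as pd

# Same sample data as in the question.
df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5]})
df3 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7]})

dataframes = [df1, df2, df3]
# Name strings, in the same order as the list above.
df_names = ['df1', 'df2', 'df3']

my_data = {
    'df': df_names,  # strings, not the DataFrame objects
    'rows': [df.shape[0] for df in dataframes],
    'columns': [df.shape[1] for df in dataframes],
}

my_dataframe = pd.DataFrame(my_data)
print(my_dataframe)
```

The two lists must be kept in the same order by hand, which is the main weakness of this approach.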

It is better to use a dict that maps each name to its dataframe:

dataframes = {'df1': df1, 'df2':df2, 'df3':df3}

number_rows = [df.shape[0] for k, df in dataframes.items()]
number_columns = [df.shape[1] for k, df in dataframes.items()]
names = list(dataframes.keys())


my_data = {'df': names, 'rows': number_rows, 'columns': number_columns}

my_dataframe = pd.DataFrame(my_data)

print(my_dataframe)
    df  rows  columns
0  df1     2        2
1  df2     3        2
2  df3     5        2

Or:

dataframes = {'df1': df1, 'df2':df2, 'df3':df3}

my_dataframe = pd.DataFrame([(k, df.shape[0], df.shape[1]) for k, df in dataframes.items()],
                            columns=['df','rows','columns'])

print(my_dataframe)
    df  rows  columns
0  df1     2        2
1  df2     3        2
2  df3     5        2
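Another possible variant of the dict approach, offered only as a sketch: since df.shape is already a (rows, columns) tuple, the dict of shapes can be passed to DataFrame.from_dict with orient='index'. The variable name shapes is an assumption made for this illustration.

```python
import pandas as pd

dataframes = {
    'df1': pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}),
    'df2': pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5]}),
    'df3': pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7]}),
}

# Each value is a (rows, columns) tuple, so each dict entry becomes one row.
shapes = {name: df.shape for name, df in dataframes.items()}

my_dataframe = (
    pd.DataFrame.from_dict(shapes, orient='index', columns=['rows', 'columns'])
    .rename_axis('df')  # name the index so reset_index() produces a 'df' column
    .reset_index()
)

print(my_dataframe)
```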

It is possible to recover the names from a list of dataframes, but this needs the inspect module:

dataframes = [df1, df2, df3]

import inspect

# https://stackoverflow.com/a/40536047
def retrieve_name(var):
    """
    Gets the name of var. Does it from the outermost frame inwards.
    :param var: variable to get name from.
    :return: string
    """
    for fi in reversed(inspect.stack()):
        names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
        if len(names) > 0:
            return names[0]

number_rows = [df.shape[0] for df in dataframes]
number_columns = [df.shape[1] for df in dataframes]
names = [retrieve_name(x) for x in dataframes]

my_data = {'df': names, 'rows': number_rows, 'columns': number_columns}

my_dataframe = pd.DataFrame(my_data)
print(my_dataframe)
    df  rows  columns
0  df1     2        2
1  df2     3        2
2  df3     5        2
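One caveat with the inspect approach, shown as a small sketch: retrieve_name returns the first name it finds bound to the object, so when several variables refer to the same dataframe, the result need not be the name used at the call site. The function demo and the variables original and alias are made up for this illustration.

```python
import inspect

import pandas as pd


def retrieve_name(var):
    """Return the first variable name bound to var, searching the outermost frame first."""
    for fi in reversed(inspect.stack()):
        names = [name for name, val in fi.frame.f_locals.items() if val is var]
        if names:
            return names[0]


def demo():
    original = pd.DataFrame({'col1': [1, 2]})
    alias = original  # a second name for the very same object
    # We pass `alias`, but the lookup only sees the object, not the name we used.
    return retrieve_name(alias)


found = demo()
print(found)
```

Because of this, the dict-based versions above are usually the safer choice when the names matter.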

You can try:

d = pd.DataFrame([{'df': k, 'rows': v.shape[0], 'cols': v.shape[1]}
                  for k, v in zip(('df1', 'df2', 'df3'), (df1, df2, df3))])

print(d)

    df  rows  cols
0  df1     2     2
1  df2     3     2
2  df3     5     2
