I have 3 data-frames:
d1 = {'col1': [1, 2], 'col2': [3, 4]}
d2 = {'col1': [1,2,3], 'col2': [3,4,5]}
d3 = {'col1': [1,2,3,4,5], 'col2': [3,4,5,6,7]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
df3 = pd.DataFrame(data=d3)
Now I'm trying to count the number of rows and columns of these 3 data-frames and place the counts in a new data-frame named my_dataframe. This is the code I used:
dataframes = [df1, df2, df3]
number_rows = [df.shape[0] for df in dataframes]
number_columns = [df.shape[1] for df in dataframes]
my_data = {'df': dataframes, 'rows': number_rows, 'columns': number_columns}
my_dataframe = pd.DataFrame(my_data)
print(my_dataframe)
This is my output:
This is my expected output:
df - rows - columns
0 df1 - 2 - 2
1 df2 - 3 - 2
2 df3 - 5 - 2
Can someone explain to me what went wrong and how I can fix this? Thank you all.
In the line where you define the data to be inserted into my_data, you are inadvertently inserting the original dataframes themselves rather than their names.
my_data = {'df': dataframes, 'rows': number_rows, 'columns': number_columns}
Instead, define df_names = ['df1', 'df2', 'df3'] and use it as the value for the 'df' key in my_data, in place of dataframes.
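Putting that suggestion together with the question's data, a minimal sketch might look like the following. The name df_names comes from the answer above; the list is typed by hand and is assumed to be in the same order as dataframes.

```python
import pandas as pd

# Same example data as in the question.
df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5]})
df3 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7]})

dataframes = [df1, df2, df3]
# Names are written out manually; they must match the order of `dataframes`.
df_names = ['df1', 'df2', 'df3']

my_data = {
    'df': df_names,                                  # strings, not DataFrame objects
    'rows': [df.shape[0] for df in dataframes],      # shape[0] is the row count
    'columns': [df.shape[1] for df in dataframes],   # shape[1] is the column count
}
my_dataframe = pd.DataFrame(my_data)
print(my_dataframe)
```

Because the names and the frames live in two separate lists, they can drift out of sync if one list is edited without the other.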
I don't think there is a nice, in-built way in Pandas to get the name of a dataframe. (I could be wrong, though.)
It is better to use a dict that maps each name to its dataframe:
dataframes = {'df1': df1, 'df2':df2, 'df3':df3}
number_rows = [df.shape[0] for k, df in dataframes.items()]
number_columns = [df.shape[1] for k, df in dataframes.items()]
names = list(dataframes.keys())
my_data = {'df': names, 'rows': number_rows, 'columns': number_columns}
my_dataframe = pd.DataFrame(my_data)
print(my_dataframe)
df rows columns
0 df1 2 2
1 df2 3 2
2 df3 5 2
Or:
dataframes = {'df1': df1, 'df2':df2, 'df3':df3}
my_dataframe = pd.DataFrame(
    [(k, df.shape[0], df.shape[1]) for k, df in dataframes.items()],
    columns=['df', 'rows', 'columns'])
print(my_dataframe)
df rows columns
0 df1 2 2
1 df2 3 2
2 df3 5 2
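As a further variation on the dict approach (not part of the answer above, only a sketch), each dataframe's shape tuple can be handed directly to pd.DataFrame.from_dict with orient='index', which treats the dict keys as row labels. The rename_axis and reset_index calls then turn those labels into a regular 'df' column.

```python
import pandas as pd

# Same example data as in the question.
df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5]})
df3 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7]})

dataframes = {'df1': df1, 'df2': df2, 'df3': df3}

# Map each name to its (rows, columns) shape tuple.
shapes = {name: df.shape for name, df in dataframes.items()}

my_dataframe = (
    pd.DataFrame.from_dict(shapes, orient='index', columns=['rows', 'columns'])
    .rename_axis('df')   # name the index so reset_index produces a 'df' column
    .reset_index()
)
print(my_dataframe)
```

Whether this reads better than the list comprehension is a matter of taste; it mainly avoids indexing shape twice.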
It is possible to get the variable names, but you need the inspect module for this:
dataframes = [df1, df2, df3]
import inspect
#https://stackoverflow.com/a/40536047
def retrieve_name(var):
    """
    Gets the name of var. Does it from the outermost frame inwards.
    :param var: variable to get name from.
    :return: string
    """
    for fi in reversed(inspect.stack()):
        names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
        if len(names) > 0:
            return names[0]
number_rows = [df.shape[0] for df in dataframes]
number_columns = [df.shape[1] for df in dataframes]
names = [retrieve_name(x) for x in dataframes]
my_data = {'df': names, 'rows': number_rows, 'columns': number_columns}
my_dataframe = pd.DataFrame(my_data)
print(my_dataframe)
df rows columns
0 df1 2 2
1 df2 3 2
2 df3 5 2
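One caveat with this kind of lookup: it matches variables by identity (is), so if the same dataframe is bound to more than one name, several names match and only the first is returned. The sketch below illustrates this with a plain dict standing in for a frame's f_locals; the helper names_bound_to and the name alias are hypothetical and exist only for this illustration.

```python
import pandas as pd

def names_bound_to(obj, namespace):
    """Return every name in `namespace` whose value is exactly `obj`."""
    return [name for name, value in namespace.items() if value is obj]

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Hypothetical namespace standing in for a frame's f_locals:
# two different names refer to the same DataFrame object.
namespace = {'df1': df1, 'alias': df1}

matches = names_bound_to(df1, namespace)
print(matches)  # ['df1', 'alias']
```

With an explicit dict of names to dataframes, as in the other answers, this ambiguity does not arise.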
You can try:
d = pd.DataFrame([{'df': k, 'rows': v.shape[0], 'cols': v.shape[1]}
                  for k, v in zip(('df1', 'df2', 'df3'), (df1, df2, df3))])
print(d)
df rows cols
0 df1 2 2
1 df2 3 2
2 df3 5 2