I have a class like the one below. What is the most Pythonic way to declare and initialize multiple empty dataframes?
import pandas as pd

class ReadData:
    def __init__(self, input_dir):
        self.df1 = pd.DataFrame(data=None)
        self.df2 = pd.DataFrame(data=None)
        self.df3 = pd.DataFrame(data=None)
        self.input_dir = input_dir

    def read_inputs(self):
        self.df1 = pd.read_csv(self.input_dir + "/file1.csv")
        self.df2 = pd.read_csv(self.input_dir + "/file2.csv")
        self.df3 = pd.read_csv(self.input_dir + "/file3.csv")

ReadData("./").read_inputs()
In general, dataframes are not supposed to be initialized empty and appended to: appending to a dataframe is a slow, memory-intensive operation. You'll be better off storing your data in a structure that supports fast appends, such as a list, and building the dataframe once at the end.
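As a minimal sketch of that advice, the snippet below collects rows in a plain list and constructs the dataframe a single time. The function name `build_frame` and the sample records are hypothetical, chosen only for illustration.

```python
import pandas as pd

def build_frame(records):
    """Collect rows in a list first, then build the DataFrame once."""
    rows = []
    for record in records:
        # list.append is cheap; no DataFrame is copied inside the loop
        rows.append(record)
    # A list of dicts becomes one row per dict, one column per key
    return pd.DataFrame(rows)

# Hypothetical sample data
records = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]
df = build_frame(records)
```

The same pattern applies when each row is computed inside the loop: accumulate in the list, then call `pd.DataFrame` once.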
However, to answer your question, you can use a dictionary comprehension and keep your dataframes in a dictionary. Or you can do the same with a list.
import pandas as pd

class Data:
    def __init__(self):
        self.dfs = {
            "df{}".format(i): pd.DataFrame(data=None)
            for i in range(3)
        }
Then you can access your data like so:
data = Data()
data.dfs["df1"]
Though the power of using a dictionary is that you can explicitly name your data. So a structure like this may be more intuitive:
class Data:
    def __init__(self, df_names):
        self.dfs = {
            name: pd.DataFrame(data=None) for name in df_names
        }

data = Data(df_names=["df1", "better_named_df", "averages"])
# accessing underlying frames
data.dfs["df1"]
data.dfs["better_named_df"]
Another approach using a list-comprehension instead of a dictionary:
import pandas as pd

class Data:
    def __init__(self):
        self.dfs = [pd.DataFrame(data=None) for _ in range(3)]

data = Data()
data.dfs[0]
data.dfs[1]
Since you specified that you're just reading in these dataframes to run different queries against them, I wouldn't recommend a class at all. This is because there is no common functionality that you're going to run against each dataframe, aside from reading them into memory. A function that returns a dictionary should suffice:
import pathlib
import pandas as pd

def read_data(base_dir, file_names):
    dataframes = {}
    base_dir = pathlib.Path(base_dir)
    for fname in file_names:
        fpath = base_dir / fname
        dataframes[fpath.stem] = pd.read_csv(fpath)
    return dataframes

# you can call this function like so:
dfs = read_data("./", ["file1.csv", "file2.csv", "file3.csv"])
# dfs is a dictionary with this structure:
# {"file1": dataframe from file1.csv,
# "file2": dataframe from file2.csv,
# "file3": dataframe from file3.csv}
# access data like this
dfs["file1"]
If you are intent on having each DataFrame be an attribute, you can take advantage of setattr.
import pandas as pd

class Data:
    def __init__(self, n):
        for num in range(1, n + 1):
            setattr(self, f"df{num}", pd.DataFrame())
Then whatever number you supply to the constructor, you would have that many DataFrame attributes on the object.
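For instance, here is a short sketch of how that behaves in practice. It repeats the `Data` class so it runs on its own; the variable `names` is only there for illustration.

```python
import pandas as pd

class Data:
    def __init__(self, n):
        # Create attributes df1 ... dfn, each an empty DataFrame
        for num in range(1, n + 1):
            setattr(self, f"df{num}", pd.DataFrame())

data = Data(3)

# vars() returns the instance's attribute dictionary
names = sorted(vars(data))

# Attributes can be read directly or looked up by name with getattr
first = data.df1
second = getattr(data, "df2")
```

One trade-off with this approach is that the attribute names are generated at runtime, so editors and linters cannot see them; the dictionary-based versions keep the names explicit.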