I have multiple csv files with similar names in numeric order (nba_1, nba_2, etc.). They are all formatted the same in terms of column names and dtypes. Instead of manually pulling each one into a dataframe individually (nba_1 = pd.read_csv('/nba_1.csv')), is there a way to write a for loop or something like it to pull them in and name them? I think the basic framework would be something like:
for i in range(1, 6):
nba_i = pd.read_csv('../nba_i.csv')
Beyond that, I do not know the particulars. Once I pull them in, I will be performing the same actions on each of them (deleting and formatting the same columns), so I would also want to iterate through them there.
Thank you in advance for your help.
Since all of the csv files are the same, as you stated in the question, it would be more efficient to combine them all into a single dataframe and then clean the data all at once.
from pathlib import Path
import pandas as pd
p = Path(r'c:\some_path_to_files') # set your path
files = p.glob('nba*.csv') # find your files
# It was stated, all the files are the same format, so create one dataframe
df = pd.concat([pd.read_csv(file) for file in files])
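If you also want each row tagged with the file it came from, a small variation records the file stem while reading (the `source` column name is my own choice, and sample files are created in a temporary directory here just to keep the sketch self-contained):

```python
import tempfile
from pathlib import Path
import pandas as pd

# create two sample csv files in a temporary directory (stand-ins for nba_1.csv, nba_2.csv)
tmp = Path(tempfile.mkdtemp())
(tmp / 'nba_1.csv').write_text('player,points\nA,10\nB,20\n')
(tmp / 'nba_2.csv').write_text('player,points\nC,30\n')

files = sorted(tmp.glob('nba*.csv'))  # find the files

# read each file, tag its rows with the file stem, and combine into one dataframe
df = pd.concat(
    [pd.read_csv(f).assign(source=f.stem) for f in files],
    ignore_index=True,  # give the combined frame a clean 0..n-1 index
)
print(df)
```

The `source` column then lets you group or filter by original file after the combine.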
[pd.read_csv(file) for file in files] is a list comprehension, which creates a dataframe from each file, and pd.concat combines all the dataframes in the list into one.
If you need to clean each file separately first, load them into a dict of dataframes instead, where each key of the dict will be a filename stem:
p = Path(r'c:\some_path_to_files') # set your path
files = p.glob('nba*.csv') # find your files
df_dict = dict()
for file in files:
df_dict[file.stem] = pd.read_csv(file)
To access the dataframes:
df_dict.keys() # to show all the keys
df_dict['nba_1'] # to access a specific dataframe by its filename stem
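Since the question mentions performing the same actions on every file (deleting and formatting the same columns), the dict makes that one loop. The column names below are made up purely for illustration:

```python
import pandas as pd

# stand-in dict of dataframes, as produced by the loop above
df_dict = {
    'nba_1': pd.DataFrame({'player': ['a'], 'points': [10], 'junk': [0]}),
    'nba_2': pd.DataFrame({'player': ['b'], 'points': [20], 'junk': [1]}),
}

# apply the same cleaning to every dataframe
for name, frame in df_dict.items():
    frame = frame.drop(columns=['junk'])           # delete an unwanted column
    frame['player'] = frame['player'].str.upper()  # format a column
    df_dict[name] = frame                          # store the cleaned frame back
```

Each cleaned frame stays addressable by its filename key, so you can still inspect them individually before combining.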
# after cleaning the individual dataframes in df_dict, they can be combined
df_final = pd.concat(df_dict.values())
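pd.concat also accepts the dict directly, in which case the dict keys become the outer level of the resulting index, so the filename survives the combine. A sketch with stand-in data:

```python
import pandas as pd

# stand-in dict of dataframes keyed by filename stem
df_dict = {
    'nba_1': pd.DataFrame({'points': [10, 20]}),
    'nba_2': pd.DataFrame({'points': [30]}),
}

df_final = pd.concat(df_dict)   # dict keys become the outer index level
print(df_final.loc['nba_1'])    # rows that came from nba_1.csv
```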
The Dask library, which is built on top of pandas, also has a method for loading multiple csv files into a single dataframe in one call.