Creating differently named data frames in a for loop - Python?

Question

I am building a data frame from the data I have for each year (1971-2017). My for loop creates the data frame, but everything ends up in a single df. How can I make it create a separate df for each year? Below is what I currently have.

for years in range(1971,2017):
        df = pd.read_csv('gene_%4.4d.txt'%years, sep='|', header=None, names=['PubMed ID','Title','Abstract','Affiliations','Pub Year','Pub Month','Pub Day','Journal'])

Answer 1

You are overwriting the df variable each time you read in a new file. To avoid this, I'd suggest initializing a list outside of the loop, and storing each new DataFrame in it:

import pandas as pd

columns = ['PubMed ID', 'Title', 'Abstract', 'Affiliations',
           'Pub Year', 'Pub Month', 'Pub Day', 'Journal']
all_dfs = []

# range() excludes its stop value, so stop at 2018 to include the 2017 file
for year in range(1971, 2018):
    df = pd.read_csv('gene_%4.4d.txt' % year, sep='|', header=None, names=columns)
    all_dfs.append(df)

Now all_dfs is a list holding one DataFrame per file, in the order the files were read. (A common next step is to combine them into one large DataFrame, e.g. pd.concat(all_dfs).)
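If you want to refer to each year's DataFrame by name rather than by list position, one option is a dictionary keyed by year. The following is only a sketch of that idea, not part of the approach above: the helper load_by_year, the names dfs_by_year and demo_dir, and the two tiny demo files are all made up here so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ['PubMed ID', 'Title', 'Abstract', 'Affiliations',
           'Pub Year', 'Pub Month', 'Pub Day', 'Journal']


def load_by_year(data_dir, years):
    """Hypothetical helper: map each year to the DataFrame read from its file."""
    dfs_by_year = {}
    for year in years:
        path = os.path.join(data_dir, 'gene_%4.4d.txt' % year)
        dfs_by_year[year] = pd.read_csv(path, sep='|', header=None, names=COLUMNS)
    return dfs_by_year


# Demo only: write two one-row files with made-up content into a temp directory.
demo_dir = tempfile.mkdtemp()
for year in (1971, 1972):
    with open(os.path.join(demo_dir, 'gene_%4.4d.txt' % year), 'w') as f:
        f.write('1|Some title|Some abstract|Some lab|%d|Jan|1|Some journal\n' % year)

dfs_by_year = load_by_year(demo_dir, range(1971, 1973))

df_1971 = dfs_by_year[1971]        # look up a single year's DataFrame by its key
combined = pd.concat(dfs_by_year)  # the dict keys become the outer index level
```

With real data you would pass the directory holding the gene_YYYY.txt files and range(1971, 2018). A dictionary is usually preferable to generating variable names such as df_1971 dynamically, since the keys can be looped over, and pd.concat still works when you want one combined DataFrame.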

solution1
1 ACCEPTED 2020-09-07 18:52:58