
How to read multiple files from different folders in Python

I have yearly data files in different folders. Each file contains daily data ranging from Jan 1 to Dec 31. A data file name looks like AS060419.67, where the last four digits represent the year (i.e. 1967) and 0604 is the folder name.
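For illustration, that naming scheme can be unpacked with plain string handling. This is only a minimal sketch: the helper name `parse_name` is hypothetical, and it assumes every file follows the pattern `AS` + folder name + first two year digits + `.` + last two year digits, as in the example above.

```python
import os


def parse_name(path):
    """Split a name like 'AS060419.67' into (folder name, year)."""
    base = os.path.basename(path)        # 'AS060419.67'
    digits = base[2:].replace('.', '')   # drop the 'AS' prefix and the dot -> '06041967'
    station = digits[:-4]                # everything before the year -> '0604'
    year = int(digits[-4:])              # last four digits -> 1967
    return station, year


print(parse_name('0604/AS060419.67'))
```

Taking the year from each file name inside the loop keeps it tied to the file being read, rather than to whatever file happened to be processed last.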

I tried to read these files using the code below, but it only reads the data for the last year in the last folder:

def date_parser(doy, year):    
    return dt.datetime.strptime(doy.zfill(3)+year, '%j%Y')

files = glob.glob('????/AS*')
files.sort()
files
STNS = {}
for f in files:
    stn_id, info = f.split('/')
    year = "".join(info[-5:].split('.'))
    #print (f,stn_id)
    with open(f) as fo:                  
        data = fo.readlines()[:-1]
        data = [d.strip() for d in data]
        data = '\n'.join(data)
        with open('data.dump', 'w') as dump:
            dump.write(data)

parser = lambda date: date_parser(date, year=year)
df = pd.read_table('data.dump', delim_whitespace=True,names=['date','prec'], 
                   na_values='DNA', parse_dates=[0], date_parser=parser, index_col='date' ) 

df.replace({'T': 0})
df = df.apply(pd.to_numeric, args=('coerce',))
df.name = stn_name
df.sid = stn_id

if stn_id not in STNS.keys():
    STNS[stn_name] = df

else:
    STNS[stn_id] = STNS[stn_id].append(df)
    STNS[stn_id].name = df.name
    STNS[stn_id].sid = df.sid
    #outfile.write(line)

To make the plot:

for stn in STNS:
    STNS[stn_id].plot()
    plt.title('Precipitation for {0}'.format(STNS[stn].name))

The problem is that it reads only the last year's data in the last folder. Can anyone help me figure out this problem? Your help will be highly appreciated.

You overwrite the same file (data.dump) on every pass through the loop, so only the contents of the last file survive. Derive your target file name from your source file name, or open the file in append mode if you want it all in the same file.

How do you append to a file?
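Appending means opening the file with mode 'a' instead of 'w', so each write lands after the existing content rather than replacing it. Here is a minimal sketch that writes into a temporary directory; the name data.dump is borrowed from the question, and the two sample strings are made-up stand-ins for two yearly files.

```python
import os
import tempfile

# made-up stand-ins for the contents of two yearly files
chunks = ['1 0.0\n2 1.5\n', '1 0.2\n2 T\n']

with tempfile.TemporaryDirectory() as tmp:
    dump_path = os.path.join(tmp, 'data.dump')
    for chunk in chunks:
        # 'a' appends (creating the file if needed); 'w' would truncate it each time
        with open(dump_path, 'a') as dump:
            dump.write(chunk)
    with open(dump_path) as dump:
        combined = dump.read()

print(combined)
```

Note that once several years sit in one file, the day-of-year values repeat and the year of each row is no longer recoverable from the file alone, so for this data a separate target file per source file (or reading each file directly) is probably the safer choice.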

You can also skip the intermediate dump file and read each file directly with pandas, like this:

import os
import glob
import pandas as pd
import matplotlib.pyplot as plt

# file mask
fmask = r'./data/????/AS*.??'

# all RegEx replacements
replacements = {
  r'T': 0
}

# list of data files
flist = glob.glob(fmask)


def read_data(flist, date_col='date', **kwargs):
    dfs = []
    for f in flist:
        # parse year from the file name
        y = os.path.basename(f).replace('.', '')[-4:]
        df = pd.read_table(f, **kwargs)
        # replace day of year with a date
        df[date_col] = pd.to_datetime(y + df[date_col].astype(str).str.zfill(3), format='%Y%j')
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)


df = read_data(flist,
               date_col='date',
               sep=r'\s+',
               header=None,
               names=['date','prec'],
               engine='python',
               skipfooter=1,
              ) \
     .replace(replacements, regex=True) \
     .set_index('date') \
     .apply(pd.to_numeric, args=('coerce',))


df.plot()

plt.show()

I downloaded only four files, so the plot shows only the data from those files.
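If each station should stay separate, as the STNS dictionary in the question intends, the file list could be grouped by folder before reading. This is a sketch, assuming the folder name is the station id; `group_by_station` is a hypothetical helper and the paths are made up.

```python
import os
from collections import defaultdict


def group_by_station(paths):
    """Map each station folder name to its sorted list of file paths."""
    groups = defaultdict(list)
    for p in paths:
        station = os.path.basename(os.path.dirname(p))  # e.g. '0604'
        groups[station].append(p)
    # sort each station's files so the years come out in order
    return {stn: sorted(files) for stn, files in groups.items()}


# made-up example paths
paths = [
    './data/0604/AS060419.68',
    './data/0604/AS060419.67',
    './data/0701/AS070119.67',
]
groups = group_by_station(paths)
print(groups)
```

Each list could then be passed to read_data above to get one DataFrame per station, which can be plotted with its own title.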

