Read files from an S3 bucket that match a pattern in Python

I am reading a file from S3 with pandas.

aws_credentials = { 
                    "key": "xxxx", 
                    "secret": "xxxx" 
                  }

# Read data from S3 
df_aln = pd.read_csv("s3://dir/ABC/fname_0521.csv", storage_options=aws_credentials, encoding='latin-1')

However, I have several files with the same shape and a similar naming convention, fname_mmyy. How do I read all the files that match the naming pattern and combine them into one pandas DataFrame?

I'd prefer not to call pd.read_csv on each file separately.

According to this answer (https://stackoverflow.com/a/69568591/687896), you can glob on S3 via s3fs. Your pattern would be something like fname_*.csv:

# get the list of CSV files (adapted from the cited answer)
import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem(key=aws_credentials["key"], secret=aws_credentials["secret"])
csvs = s3.glob('your/s3/path/to/fname_*.csv')

# read them into pandas and concatenate the DataFrames;
# s3fs.glob returns bucket-relative paths, so prepend the "s3://" scheme
dfs = []
for csv in csvs:
    df = pd.read_csv(f"s3://{csv}", storage_options=aws_credentials, encoding='latin-1')
    dfs.append(df)

df = pd.concat(dfs, ignore_index=True)

That (or something along those lines) should work.
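
If you'd rather not pass the credentials twice (once to s3fs for the glob and again to read_csv via storage_options), another option is to hand pd.read_csv the open file handle from s3fs directly. The snippet below is just a sketch of that variant, assuming the same fname_mmyy files; the source_file column is a hypothetical extra for tracking which monthly file each row came from, not part of the cited answer:

import os

import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem(key=aws_credentials["key"], secret=aws_credentials["secret"])

dfs = []
for path in s3.glob('your/s3/path/to/fname_*.csv'):
    # s3.open returns a file-like object, so no storage_options are needed here
    with s3.open(path, 'rb') as f:
        part = pd.read_csv(f, encoding='latin-1')
    # optional: record which source file each row came from
    part['source_file'] = os.path.basename(path)
    dfs.append(part)

df = pd.concat(dfs, ignore_index=True)

Since the open file handle already carries the authenticated connection, this avoids repeating the key/secret pair in every read_csv call.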
