Read files from an S3 bucket that match a pattern in Python

I am reading a file from S3 with pandas.

aws_credentials = { 
                    "key": "xxxx", 
                    "secret": "xxxx" 
                  }

# Read data from S3 
df_aln = pd.read_csv("s3://dir/ABC/fname_0521.csv", storage_options=aws_credentials, encoding='latin-1')

However, I have several files with the same shape and a similar naming convention, fname_mmyy. How do I read all the files that match the naming pattern and combine them into one pandas DataFrame?

I'd prefer not to call pd.read_csv on each file separately.

According to this answer (https://stackoverflow.com/a/69568591/687896), you can glob on S3 via s3fs. Your pattern would be something like fname_*.csv:

# get the list of CSV files (adapted from the cited answer)
import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem(key=aws_credentials["key"], secret=aws_credentials["secret"])
csvs = s3.glob('your/s3/path/to/fname_*.csv')

# read them into pandas and concatenate the DataFrames;
# s3fs.glob returns bucket-relative paths, so prepend the "s3://" scheme
dfs = []
for csv in csvs:
    df = pd.read_csv(f"s3://{csv}", storage_options=aws_credentials, encoding='latin-1')
    dfs.append(df)

df = pd.concat(dfs, ignore_index=True)

That (or something along those lines) should work.
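
If you'd rather not pass the credentials twice (once to s3fs for the glob and again to read_csv via storage_options), another option is to hand pd.read_csv the open file handle from s3fs directly. The snippet below is just a sketch of that variant, assuming the same fname_mmyy files; the source_file column is a hypothetical extra for tracking which monthly file each row came from, not part of the cited answer:

import os

import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem(key=aws_credentials["key"], secret=aws_credentials["secret"])

dfs = []
for path in s3.glob('your/s3/path/to/fname_*.csv'):
    # s3.open returns a file-like object, so no storage_options are needed here
    with s3.open(path, 'rb') as f:
        part = pd.read_csv(f, encoding='latin-1')
    # optional: record which source file each row came from
    part['source_file'] = os.path.basename(path)
    dfs.append(part)

df = pd.concat(dfs, ignore_index=True)

Since the open file handle already carries the authenticated connection, this avoids repeating the key/secret pair in every read_csv call.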
