Writing the contents of S3 files to CSV

I am in the process of creating a script that downloads my S3 data to my local machine. The data I am receiving is typically a Hive partition. I am getting a "No such file or directory" error even though the file does exist. Can someone explain what I am doing wrong and how I should approach this differently? Here is the piece of code that the error references:

    import csv
    import re

    bucket = conn.get_bucket(bucket_name)
    for sub in bucket.list(prefix = 'some_prefix'):
        matched = re.search(re.compile(read_key_pattern), sub.name)
        if matched:
            with open(sub.name, 'rb') as fin:
                reader = csv.reader(fin, delimiter = '\x01')
                contents = [line for line in reader]
            with open('output.csv', 'wb') as fout:
                writer = csv.writer(fout, quotechar = '', quoting = csv.QUOTE_NONE, escapechar = '\\')
                writer.writerows(contents)

    IOError: [Errno 2] No such file or directory: 'my_prefix/54c91e35-4dd0-4da6-a7b7-283dff0f4483-000000'

The file exists and that is the correct folder and file that I am trying to retrieve.

As @roganjosh said, it looks like you never download the file after testing for the name match: `open(sub.name, 'rb')` looks for a local file at that path, and no such local file exists. I've added comments below to show how to process the file in memory in Python 2:

    import csv
    import re
    import contextlib
    # boto writes raw bytes into the buffer, so in Python 2 use BytesIO, not io.StringIO
    from io import BytesIO

    bucket = conn.get_bucket(bucket_name)
    # use re.compile outside of the for loop
    # it has slightly better performance characteristics
    matcher = re.compile(read_key_pattern)

    # open the output once, so each matching key is appended instead of overwriting the file
    with open('output.csv', 'wb') as fout:
        writer = csv.writer(fout, quotechar = '', quoting = csv.QUOTE_NONE, escapechar = '\\')
        for sub in bucket.list(prefix = 'some_prefix'):
            # bucket.list returns an iterator over s3.Key objects
            # so we can use `sub` directly as the Key object
            matched = matcher.search(sub.name)
            if matched:
                # download the key into an in-memory buffer instead of opening a local path
                with contextlib.closing(BytesIO()) as fp:
                    sub.get_contents_to_file(fp)
                    # rewind to the start of the buffer before reading it
                    fp.seek(0)
                    # read straight from the memory buffer
                    reader = csv.reader(fp, delimiter = '\x01')
                    writer.writerows(reader)
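
For reference, here is a minimal Python 3 sketch of the same in-memory idea that runs without S3: a `BytesIO` filled by hand stands in for the downloaded key, and the function name `hive_buffer_to_csv`, the sample rows and the UTF-8 encoding are assumptions for illustration. It also shows what the writer options do: with `csv.QUOTE_NONE` and `escapechar='\\'`, a comma inside a field is backslash-escaped rather than quoted.

```python
import csv
import io


def hive_buffer_to_csv(raw, out):
    """Write the Ctrl-A (\\x01) delimited bytes held in `raw` to `out` as CSV."""
    # A buffer that has just been written to is positioned at its end,
    # so rewind it before reading.
    raw.seek(0)
    # Python 3's csv module reads text, so decode the bytes (UTF-8 is an assumption).
    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
    reader = csv.reader(text, delimiter="\x01")
    # QUOTE_NONE never adds quotes; a delimiter inside a field is escaped
    # with a backslash instead.
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar="\\")
    writer.writerows(reader)
    # Detach so the wrapper does not close the caller's buffer when it is discarded.
    text.detach()


if __name__ == "__main__":
    raw = io.BytesIO()
    # Hypothetical sample rows standing in for a downloaded Hive file.
    raw.write(b"1\x01alice\x01NYC, NY\n2\x01bob\x01LA\n")
    out = io.StringIO()
    hive_buffer_to_csv(raw, out)
    print(repr(out.getvalue()))  # '1,alice,NYC\\, NY\r\n2,bob,LA\r\n'
```

If you would rather keep embedded commas unescaped, the default `csv.QUOTE_MINIMAL` quotes such fields instead.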

For Python 3, where the `csv` module reads and writes text rather than bytes, you will need to change the `with` statement as discussed in the comments on the answer to this question.
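
One possible Python 3 version, sketched with boto3 rather than the boto 2 API used above: the `list_objects_v2` paginator lists the keys, `download_fileobj` fills a `BytesIO`, and `io.TextIOWrapper` decodes it so the `csv` module can read it. The function name `export_matching_keys`, the bucket name `my-bucket`, the key pattern and the UTF-8 encoding are all assumptions. The S3 client is passed in as a parameter, so the logic can be exercised with a stand-in object; only `main()` needs boto3 and AWS credentials.

```python
import csv
import io
import re


def export_matching_keys(s3, bucket, prefix, pattern, out):
    """Append every object under `prefix` whose key matches `pattern` to `out` as CSV.

    Objects are assumed to be UTF-8 text with Ctrl-A (\\x01) field delimiters.
    """
    matcher = re.compile(pattern)  # compile once, outside the loop
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar="\\")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no keys match the prefix.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not matcher.search(key):
                continue
            buf = io.BytesIO()
            # Download into memory; no local file with the key's name is needed.
            s3.download_fileobj(bucket, key, buf)
            buf.seek(0)
            text = io.TextIOWrapper(buf, encoding="utf-8", newline="")
            writer.writerows(csv.reader(text, delimiter="\x01"))


def main():
    # Requires boto3 and configured AWS credentials.
    # "my-bucket" and the key pattern are hypothetical placeholders.
    import boto3

    # Open the output once, in text mode with newline="" as the csv docs recommend.
    with open("output.csv", "w", newline="") as fout:
        export_matching_keys(boto3.client("s3"), "my-bucket", "some_prefix", r"-\d{6}$", fout)
```

Opening `output.csv` once outside the loop means every matching key ends up in the same file instead of each one overwriting the last.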
