
Writing csv files to local host from a docker container

I am trying to set up a very basic data processing project. I use Docker to create an Ubuntu environment on an EC2 instance, install Python, read an input CSV, perform some simple data manipulation, then write the result to a new CSV in the same folder as the input. My Python code runs successfully locally and directly on the EC2 instance. When I run it in the Docker container, the data appears to be processed (my script prints it out), but the output file is not saved at the end of the run. Is there a command missing from my Dockerfile that is causing the results not to be saved? Alternatively, is there a way I can save the output directly to an S3 bucket?

EDIT: The path to the input files is "/home/ec2-user/docker_test/data" and the path to the code is "/home/ec2-user/docker_test/code". After the data is processed, I want the result to be written as a new file in the "/home/ec2-user/docker_test/data" directory on the host.

Dockerfile:

FROM ubuntu:latest

RUN apt-get update \
    && apt-get install -y --no-install-recommends software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update \
    && apt-get install -q -y --no-install-recommends python3.6 python3.6-dev python3-pip python3-setuptools \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

VOLUME /home/ec2-user/docker_test/data
VOLUME /home/ec2-user/docker_test/code

WORKDIR /home/ec2-user/docker_test/

COPY requirements.txt ./

RUN cat requirements.txt | xargs -n 1 -L 1 python3.6 -m pip install --no-cache-dir

COPY . .

ENV LC_ALL C.UTF-8
ENV LANG=C.UTF-8

CMD python3.6 main.py

Python Script:

import pandas as pd
import os
from code import processing

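path = os.getcwd()  # inside the container this is the WORKDIR, /home/ec2-user/docker_test

def main():
    # Read the input CSV from the data directory under the working directory
    df = pd.read_csv(path + '/data/table.csv')
    print('input df: \n{}'.format(df))
    df_out = processing.processing(df)
    # Write the result next to the input file
    df_out.to_csv(path + '/data/updated_table.csv', index=False)
    print('\noutput df: \n{}'.format(df_out))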


if __name__ == '__main__':
    main()

EDIT: I have been running the image with "docker run docker_test"

You could use s3fs-fuse to mount the S3 bucket as a drive in your Docker container. This creates a folder on your filesystem that is backed by the S3 bucket: anything you save or modify in that folder is reflected in the bucket.

If you delete the Docker container or unmount the drive, your S3 bucket stays intact, so you don't need to worry much about erasing files in the bucket through normal Docker use.

OK, gotcha. Given the edit saying you expect the CSV to be written to the host, there is a problem with how this is set up.

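You've got two VOLUMEs declared in your Dockerfile, which is fine. A VOLUME instruction makes Docker create an anonymous, Docker-managed volume at that path whenever a container starts (unless you mount something there yourself). Docker-managed volumes are great for persisting data on a single host, but they live in Docker's own storage area rather than in your project folder, so you can't easily browse them from the host like a normal directory.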

If you want the file to show up on your host, create a bind mount at runtime. A bind mount maps a path in your host filesystem to a path in the container's filesystem.

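docker run -v /home/ec2-user/docker_test/data:/home/ec2-user/docker_test/data docker_test will do this: it mounts the host's data directory over the container's data directory. docker run -v $(pwd):/home/ec2-user/docker_test/data docker_test does the same thing, but only if you run it from inside /home/ec2-user/docker_test/data on the host, because $(pwd) is shell command substitution that expands to your current working directory on a *nix system. Take care with that and adjust as needed (for example if your host runs Windows).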
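Whichever way you write it, the host side of -v has to be the directory that actually holds the CSVs, given as an absolute path (a bare name such as data is treated as the name of a Docker-managed volume, not a folder). A sketch, assuming you want to launch the container from Python; build_run_args is a hypothetical helper that only builds the argument list and does not run anything:

```python
import os
import shlex


def build_run_args(host_data_dir, image="docker_test",
                   container_data_dir="/home/ec2-user/docker_test/data"):
    """Build the 'docker run' argument list that bind-mounts host_data_dir.

    The host path is made absolute first, so a relative path such as
    "data" is not mistaken for the name of a Docker-managed volume.
    """
    host_dir = os.path.abspath(host_data_dir)
    return ["docker", "run", "-v", f"{host_dir}:{container_data_dir}", image]


if __name__ == "__main__":
    args = build_run_args("/home/ec2-user/docker_test/data")
    # Show the equivalent shell command; pass args to subprocess.run to execute it.
    print(shlex.join(args))
```

Passing the list to subprocess.run(args, check=True) avoids shell quoting problems if the host path ever contains spaces.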

With the volume set up this way, when the CSV is created in the container's filesystem at the location you intend, it will be accessible on your host at the corresponding mapped path.
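If you'd rather not tie the script to the container's WORKDIR, here is a sketch that takes the data directory from an environment variable and reports the absolute path it wrote, which is the path that has to sit inside the bind mount. DATA_DIR and process_rows are hypothetical (process_rows stands in for your processing.processing step), and it uses the stdlib csv module instead of pandas only so it runs without extra packages:

```python
import csv
import os


def process_rows(rows):
    # Hypothetical placeholder for processing.processing(df): uppercases every cell.
    return [[cell.upper() for cell in row] for row in rows]


def main(data_dir=None):
    # DATA_DIR is a hypothetical variable; fall back to ./data under the working directory.
    data_dir = data_dir or os.environ.get("DATA_DIR", os.path.join(os.getcwd(), "data"))
    in_path = os.path.join(data_dir, "table.csv")
    out_path = os.path.join(data_dir, "updated_table.csv")

    with open(in_path, newline="") as f:
        rows = list(csv.reader(f))

    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(process_rows(rows))

    # This file only shows up on the host if data_dir is inside a bind mount.
    print("wrote", os.path.abspath(out_path))
    return out_path
```

Keep the existing if __name__ == '__main__': main() guard in main.py, and run the container with the variable pointing at the mount target, e.g. docker run -v /home/ec2-user/docker_test/data:/data -e DATA_DIR=/data docker_test.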

Read up on volumes. They're vital to using Docker and not hard to grasp at first glance, but there are some gotchas in the details.


Regarding uploading to S3, I would recommend using the boto3 library and doing it in your Python script. You could also use something like s3cmd if you find that simpler.
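A minimal sketch of the boto3 route, assuming credentials are picked up from boto3's usual sources (an instance role on the EC2, environment variables, or ~/.aws) and that the bucket name and key prefix are yours to fill in. upload_csv is a hypothetical helper; upload_file(Filename, Bucket, Key) is the boto3 S3 client method it calls:

```python
import os


def upload_csv(local_path, bucket, prefix="", client=None):
    """Upload local_path to s3://bucket/<prefix>/<file name> and return the key."""
    if client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub client
        client = boto3.client("s3")
    # Build the object key from the optional prefix and the local file name.
    key = "/".join(p for p in (prefix.strip("/"), os.path.basename(local_path)) if p)
    client.upload_file(local_path, bucket, key)
    return key
```

Calling it at the end of main(), e.g. upload_csv(path + '/data/updated_table.csv', 'my-bucket', 'processed') with a placeholder bucket name, sends the result to S3 whether or not a volume is mounted.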
