
Save scraped data csv file from inside docker container to local host

I run a Python web scraper to collect articles from various websites, which I then save as CSV files. I have been running it manually, but recently I have been trying to run it in Google Cloud Shell. I had some trouble with the dependencies, so I decided to build a Docker image to run my Python scraper.

So far, I have managed to create a Dockerfile that I use to build an image with all the necessary dependencies.

FROM python:3
# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt
RUN pip install lxml
COPY Fin24 ./Fin24
COPY scraped_list.csv ./scraped_list.csv

# Run fin24.py when the container launches
CMD ["python3", "fin24.py"]

fin24.py contains my scraper. Fin24 is a text file that holds all the base URLs that my scraper crawls for article links, before going into each article and extracting its content. scraped_list.csv lists every article I have already scraped; my Python script checks it to make sure I don't scrape the same article again.

After building and running the above, I can see that it works: the Python script stops once all the articles it found have been scraped. However, I am guessing it saves the output CSV file inside the Docker container. How can I get it to save the file to the host directory from which I am running Docker?

Ultimately, I simply want to upload the Dockerfile to my Google Cloud Shell, run it as a cron job, and save all the output inside the shell. Any help would be much appreciated.

You need to mount a host directory into the container, so that files written to that path inside the container end up on the host. For that you need to do two things:

1. Add a volume in your Dockerfile

WORKDIR /path/in/container
VOLUME ["/path/in/container"]

2. Run your container with the -v option

docker run -i -t -v /path/on/host:/path/in/container:rw "image name"
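
On the Python side, the scraper then has to write its CSV into the mounted path rather than somewhere else in the container. Below is a minimal sketch of how that could look, not the asker's actual fin24.py: the OUTPUT_DIR environment variable, the save_articles helper, the articles.csv file name, and the url/title/content columns are all assumptions made up for illustration.

```python
import csv
import os
from pathlib import Path


def output_dir() -> Path:
    """Return the directory the CSV should be written to.

    OUTPUT_DIR is a hypothetical environment variable (not part of the
    original script). It would be set to the container side of the -v
    mount, i.e. /path/in/container. Falls back to ./output if unset.
    """
    out = Path(os.environ.get("OUTPUT_DIR", "output"))
    out.mkdir(parents=True, exist_ok=True)
    return out


def save_articles(rows, filename="articles.csv",
                  fieldnames=("url", "title", "content")):
    """Write scraped rows (a list of dicts) to a CSV file in the output directory.

    The column names are assumed for illustration; use whatever
    fields the scraper actually extracts.
    """
    path = output_dir() / filename
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

With this in place, the container could be started with -e OUTPUT_DIR=/path/in/container added to the docker run command above, and the file would appear under /path/on/host. One thing to watch: a bind mount hides whatever the image already has at the mount point, so mounting the host directory directly over /app would hide fin24.py and the other copied files. Mounting a separate directory such as /app/output avoids that.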

