
How to manipulate files stored in S3 without saving them to the server?

I have the following Python script that downloads two files from an S3-compatible service, merges them, and uploads the output to another bucket.

import time
import boto3
import pandas as pd

timestamp = int(time.time())

conn = boto3.client('s3')
conn.download_file('segment', 'segment.csv', 'segment.csv')
conn.download_file('payment', 'payments.csv', 'payments.csv')

paymentsfile = 'payments.csv'
segmentsfile = 'segment.csv'
outputfile = 'payments_merged_' + str(timestamp) + '.csv'

csv_payments = pd.read_csv(paymentsfile, dtype={'ID': float})
csv_segments = pd.read_csv(segmentsfile, dtype={'ID': float})
csv_payments = csv_payments.merge(csv_segments, on='ID')
open(outputfile, 'a').close()
csv_payments.to_csv(outputfile)

conn.upload_file(outputfile, 'backup', outputfile)

However, when I execute the script it stores the files in the folder the script lives in. For security reasons I would like to prevent this from happening. I could delete the files after the script has run, but suppose my script is located in the folder /app/script/. That means that for a short time, while the script is running, someone could open the URL example.com/app/script/payments.csv and download the file. What is a good solution for that?

The simplest way would be to modify the configuration of your web server so that it does not serve the directory you are writing to, or to write to a directory that isn't served at all. For example, a common practice is to use /scr for this type of thing. You would need to modify permissions for the user your web server runs under to ensure it has access to /scr.
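As a sketch of the second option (writing somewhere that is never served), the downloads can be pointed at a temporary directory outside the web root. The bucket and key names are the ones from the question and are assumed to exist, so the boto3 calls are shown commented out:

```python
import os
import tempfile

# boto3 client as in the question (assumed configured):
# import boto3
# conn = boto3.client('s3')

# Create a private scratch directory under the system temp dir,
# not under the web root; it is deleted automatically on exit.
with tempfile.TemporaryDirectory() as tmpdir:
    payments_path = os.path.join(tmpdir, 'payments.csv')
    segments_path = os.path.join(tmpdir, 'segment.csv')
    # conn.download_file('payment', 'payments.csv', payments_path)
    # conn.download_file('segment', 'segment.csv', segments_path)
    # ... merge with pandas and upload, exactly as in the original script ...
    assert not payments_path.startswith('/app/script/')
# Here the directory and everything in it have been removed.
```

Because the files only ever exist under the temporary directory, there is no window during which they are reachable through a public URL.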

To restrict web server access to the directory you write to, you can use the following in Nginx:

https://serverfault.com/questions/137907/how-to-restrict-access-to-directory-and-subdirs
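A minimal Nginx rule along those lines, assuming the script directory is /app/script/ as in the question, might look like this (a sketch, not a complete server block):

```nginx
# Deny all HTTP access to the directory the script writes to
# (clients receive a 403 Forbidden response).
location /app/script/ {
    deny all;
}
```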

For Apache you can use this example:

https://serverfault.com/questions/174708/apache2-how-do-i-restrict-access-to-a-directory-but-allow-access-to-one-file-w
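The Apache 2.4 equivalent, again assuming the /app/script directory from the question, could be sketched as:

```apache
# Block all HTTP access to the directory the script writes to.
<Directory "/app/script">
    Require all denied
</Directory>
```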

In fact, pandas.read_csv lets you read from a buffer or bytes object, so you can do everything in memory. You can run this script on an instance, or even better, as an AWS Lambda function if the files are small.

import io
import time

import boto3
import pandas as pd

timestamp = int(time.time())

paymentsfile = 'payments.csv'
segmentsfile = 'segment.csv'
outputfile = 'payments_merged_' + str(timestamp) + '.csv'

s3 = boto3.client('s3')
payment_obj = s3.get_object(Bucket='payment', Key=paymentsfile)
segment_obj = s3.get_object(Bucket='segment', Key=segmentsfile)

# read_csv accepts any file-like object, so the response bodies can be
# parsed directly without ever touching the local disk
csv_payments = pd.read_csv(payment_obj['Body'], dtype={'ID': float})
csv_segments = pd.read_csv(segment_obj['Body'], dtype={'ID': float})
csv_merge = csv_payments.merge(csv_segments, on='ID')

# write the merged CSV into an in-memory buffer instead of a file;
# upload_fileobj expects a binary stream, so encode the text first
buffer = io.StringIO()
csv_merge.to_csv(buffer, index=False)
body = io.BytesIO(buffer.getvalue().encode('utf-8'))

s3.upload_fileobj(body, 'bucket_name', outputfile)
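The in-memory round trip itself can be exercised without S3 at all. This small self-contained check uses invented sample data to stand in for the two downloaded CSV bodies:

```python
import io

import pandas as pd

# Simulate the two downloaded CSV bodies as in-memory text streams.
payments = io.StringIO("ID,amount\n1,10\n2,20\n")
segments = io.StringIO("ID,segment\n1,A\n2,B\n")

csv_payments = pd.read_csv(payments, dtype={'ID': float})
csv_segments = pd.read_csv(segments, dtype={'ID': float})
merged = csv_payments.merge(csv_segments, on='ID')

# Serialize the merge result into a buffer rather than a file on disk.
buffer = io.StringIO()
merged.to_csv(buffer, index=False)

# Columns come out as ID, amount, segment with one row per matched ID.
print(buffer.getvalue())
```

Nothing is ever written to the script's directory, so there is no window in which the data is exposed through the web server.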
