
Python 3: How to upload a pandas dataframe as a csv stream without saving on disc?


I want to upload a pandas DataFrame to a server as a CSV file without saving it to disk. Is there a way to create a more or less "fake csv" file which pretends to be a real file?

Here is some example code. First I get my data from a SQL query and store it as a DataFrame. In the upload_ga_data function I want something with this logic:

 media = MediaFileUpload('df',
                      mimetype='application/octet-stream',
                      resumable=False)

Full example:

from __future__ import print_function
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.errors import HttpError
from apiclient.http import MediaFileUpload
import pymysql
import pandas as pd
con = x

ga_query = """
    SELECT XXXXX
    """

df = pd.read_sql_query(ga_query,con)

df.to_csv('ga_export.csv', sep=',', encoding='utf-8', index = False)

def upload_ga_data():
    try:
        media = MediaFileUpload('ga_export.csv',
                          mimetype='application/octet-stream',
                          resumable=False)
        daily_upload = service.management().uploads().uploadData(
                accountId=accountId,
                webPropertyId=webPropertyId,
                customDataSourceId=customDataSourceId,
                media_body=media).execute()
        print("Upload was successful")
    except TypeError as error:
      # Handle errors in constructing a query.
      print ('There was an error in constructing your query : %s' % error)

The required behavior is possible using a stream:

to create a more or less "fake csv" file which pretends to be a real file

Python makes file objects (created with open) and in-memory streams (created with io.StringIO) behave similarly, so anywhere you can use a file object you can usually use a string stream as well.

The easiest way to create a text stream is with open(), optionally specifying an encoding:

 f = open("myfile.txt", "r", encoding="utf-8") 

In-memory text streams are also available as StringIO objects:

 f = io.StringIO("some initial text data") 

The text stream API is described in detail in the documentation of TextIOBase.
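As a small illustration of that interchangeability, here is a sketch using only the standard library. The helper name write_rows and the sample rows are made up for this example. The function only calls write(), so it accepts a real file opened with open() or an io.StringIO alike:

```python
import io


def write_rows(handle, rows):
    """Write rows as comma-separated lines to any text file-like object."""
    for row in rows:
        handle.write(",".join(str(value) for value in row) + "\n")


# An in-memory text stream stands in for a file on disk.
buffer = io.StringIO()
write_rows(buffer, [("a", 1), ("b", 2)])

# getvalue() returns everything written so far, regardless of stream position.
text = buffer.getvalue()
```

Passing a handle from open("some_file.csv", "w") instead of the StringIO would write the same lines to disk.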

In pandas you can do this with any function that has a path_or_buf argument in its signature, such as to_csv:

DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')

The following code exports a dummy DataFrame in CSV format into a string stream (not a physical file, but an in-memory text stream):

import io
import pandas as pd

df = pd.DataFrame(list(range(10)))

stream = io.StringIO()
df.to_csv(stream, sep=";")

When you want to access the stream content, just call:

>>> stream.getvalue()
';0\n0;0\n1;1\n2;2\n3;3\n4;4\n5;5\n6;6\n7;7\n8;8\n9;9\n'

It returns the content without needing a real file.
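One detail worth knowing before handing such a stream to other code: after writing, the stream position sits at the end, so a plain read() returns nothing until you rewind. The sketch below uses only the standard library and re-creates a shortened version of the text shown above by hand (three data rows instead of ten). Parsing it with the csv module is just one way to show the rewind:

```python
import csv
import io

# Shortened copy of the CSV text produced by df.to_csv(stream, sep=";") above.
stream = io.StringIO()
stream.write(";0\n0;0\n1;1\n2;2\n")

# The position is at the end after writing, so read() yields an empty string.
at_end = stream.read()

# Rewind, then any reader that expects a file can consume the stream.
stream.seek(0)
rows = list(csv.reader(stream, delimiter=";"))
```

getvalue() is unaffected by the position, which is why it worked in the example above without a seek.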

Though the other answer is an excellent start, some readers may be unsure how to complete OP's whole task. Here is a way to go from writing a DataFrame to a stream to preparing that stream for upload using Google's apiclient.http module. A key difference from OP's attempt is that I pass the stream itself to a MediaIOBaseUpload instead of a MediaFileUpload. The file is assumed to be UTF-8, like OP's file.

The code below ran fine for me until the media was actually being uploaded, at which point I got this error: "self._fp.write(s.encode('ascii', 'surrogateescape')) UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 2313: ordinal not in range(128)"

import io
import pandas as pd
from googleapiclient.errors import HttpError

from apiclient.http import MediaIOBaseUpload  # Changed this from MediaFileUpload

df = pd.DataFrame(list(range(10)))

stream = io.StringIO()
# writing df to the stream instead of a file:
df.to_csv(stream, sep=',', encoding='utf-8', index = False)
try:
    media = MediaIOBaseUpload(stream,
                          mimetype='application/octet-stream',
                          resumable=False)

    #### Your upload logic here using media just created ####

except HttpError as error:
    #### Handle your errors in uploading here ####
    print('Upload failed: %s' % error)

Because my data contains a non-ASCII character, I developed this alternative code, which accomplishes the same thing but handles Unicode characters.

import io
import pandas as pd
from googleapiclient.errors import HttpError

from apiclient.http import MediaIOBaseUpload  # Changed this from MediaFileUpload

df = pd.DataFrame(list(range(10)))

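# Render the CSV as text, then encode it explicitly so the upload receives bytes.
# pandas >= 1.5 spells this keyword lineterminator; older releases use line_terminator.
records = df.to_csv(lineterminator='\r\n', index=False).encode('utf-8')
stream = io.BytesIO(records)  # in-memory binary stream

try:
    media = MediaIOBaseUpload(stream,
                          mimetype='application/octet-stream',
                          resumable=False)

    #### Your upload logic here using media just created ####

except HttpError as error:
    #### Handle your errors in uploading here ####
    print('Upload failed: %s' % error)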
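The error quoted earlier suggests that somewhere in the upload path the text was being encoded as ASCII. The following standard-library sketch shows why encoding to UTF-8 up front and wrapping the result in io.BytesIO avoids that step. The sample text is invented for illustration, and no Google client code is involved:

```python
import io

# Made-up CSV text containing non-ASCII characters such as '\xe9' (é).
text = "name,city\r\nRené,Montréal\r\n"

# Encoding as ASCII fails, which mirrors the UnicodeEncodeError in the traceback.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding as UTF-8 succeeds; BytesIO then serves raw bytes, so a consumer
# has no reason to re-encode the content.
payload = text.encode("utf-8")
buffer = io.BytesIO(payload)
first_chunk = buffer.read(9)
```

Here first_chunk is a bytes object, which is what a binary upload would normally expect to read from the stream.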

I used:

from googleapiclient.http import MediaIoBaseUpload

versus @Katherine's:

from apiclient.http import MediaIOBaseUpload 

But other than that, @Katherine's alternative solution worked perfectly for me as I was developing a solution to write a DataFrame to a CSV file in Google Drive, running from a Google Cloud Function.
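For reuse, the conversion step from the answers above can be wrapped in a small helper. This is only a sketch: the function name dataframe_to_csv_buffer and the sample DataFrame are hypothetical, and the upload call itself is left out because it depends on your service object and credentials.

```python
import io

import pandas as pd


def dataframe_to_csv_buffer(df, encoding="utf-8"):
    """Serialize df to CSV entirely in memory and return a bytes buffer."""
    # With no path given, to_csv returns the CSV as a str.
    csv_text = df.to_csv(index=False)
    # Encode explicitly so the consumer reads bytes, never text.
    return io.BytesIO(csv_text.encode(encoding))


# Made-up data with non-ASCII characters to exercise the encoding step.
df = pd.DataFrame({"city": ["Montréal", "Köln"], "visits": [3, 5]})
buffer = dataframe_to_csv_buffer(df)

# The buffer would then be passed to the upload class in place of a file name,
# as in the answers above, e.g. MediaIoBaseUpload(buffer, mimetype=...).
```

Nothing is written to disk at any point. The buffer starts at position 0, so it can be read from the beginning immediately.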

