Python 3: How to upload a pandas dataframe as a csv stream without saving on disc?
I want to upload a pandas dataframe to a server as a csv file without saving it on disk. Is there a way to create a more or less "fake csv" file which pretends to be a real file?
Here is some example code: first I get my data from a SQL query and store it as a dataframe. In the upload_ga_data function I want to have something with this logic:
media = MediaFileUpload('df',
                        mimetype='application/octet-stream',
                        resumable=False)
Full example:
from __future__ import print_function
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.errors import HttpError
from apiclient.http import MediaFileUpload
import pymysql
import pandas as pd

con = x

ga_query = """
SELECT XXXXX
"""

df = pd.read_sql_query(ga_query, con)
df.to_csv('ga_export.csv', sep=',', encoding='utf-8', index=False)

def upload_ga_data():
    try:
        media = MediaFileUpload('ga_export.csv',
                                mimetype='application/octet-stream',
                                resumable=False)
        daily_upload = service.management().uploads().uploadData(
            accountId=accountId,
            webPropertyId=webPropertyId,
            customDataSourceId=customDataSourceId,
            media_body=media).execute()
        print("Upload was successful")
    except TypeError as error:
        # Handle errors in constructing a query.
        print('There was an error in constructing your query: %s' % error)
The required behavior, to create a more or less "fake csv" file which pretends to be a real file, is possible using a stream.
Python makes file descriptors (created with open) and streams (created with io.StringIO) behave similarly. Anywhere you can use a file descriptor, you can also use a string stream.
The easiest way to create a text stream is with open(), optionally specifying an encoding:
f = open("myfile.txt", "r", encoding="utf-8")
In-memory text streams are also available as StringIO objects:
f = io.StringIO("some initial text data")
The text stream API is described in detail in the documentation of TextIOBase.
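To illustrate that interchangeability, here is a minimal sketch: a function written against a file object works unchanged on an in-memory stream (the count_lines helper is hypothetical, for illustration only).

```python
import io

def count_lines(f):
    """Count lines from any readable text object, file or stream alike."""
    return sum(1 for _ in f)

# An in-memory stream works anywhere a file object does:
stream = io.StringIO("a\nb\nc\n")
print(count_lines(stream))  # 3
```

The same call would work with `open("myfile.txt", encoding="utf-8")` in place of the StringIO object.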
In pandas you can do this with any function that has a path_or_buf argument in its signature, such as to_csv:
DataFrame.to_csv(path_or_buf=None, sep=', ', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')
The following code exports a dummy DataFrame in CSV format into a string stream (not a physical file; an in-memory octet stream):
import io
import pandas as pd
df = pd.DataFrame(list(range(10)))
stream = io.StringIO()
df.to_csv(stream, sep=";")
When you want to access the stream content, just issue:
>>> stream.getvalue()
';0\n0;0\n1;1\n2;2\n3;3\n4;4\n5;5\n6;6\n7;7\n8;8\n9;9\n'
It returns the content without the need to use a real file.
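One detail worth knowing: to_csv leaves the stream's cursor at the end of the written data. getvalue() ignores the cursor, but a consumer that calls read() on the stream (as an upload helper might) sees nothing until the stream is rewound. A small sketch:

```python
import io
import pandas as pd

df = pd.DataFrame(list(range(3)))
stream = io.StringIO()
df.to_csv(stream, sep=";")

stream.seek(0)        # rewind: to_csv leaves the cursor at the end
print(stream.read())  # now returns the same content as getvalue()
```

Without the seek(0), the read() call would return an empty string.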
Though the other answer is an excellent start, some may be confused about how to complete the OP's whole task. Here is a way to go from writing a dataframe to a stream to preparing that stream for upload using Google's apiclient.http module. A key difference from the OP's attempt is that I pass the stream itself to a MediaIOBaseUpload instead of a MediaFileUpload. The file is assumed to be utf-8, like the OP's file. This runs fine for me until the media is being uploaded; then I get an error: "self._fp.write(s.encode('ascii', 'surrogateescape')) UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 2313: ordinal not in range(128)"
import io
import pandas as pd
from googleapiclient.errors import HttpError
from apiclient.http import MediaIOBaseUpload  # Changed this from MediaFileUpload

df = pd.DataFrame(list(range(10)))
stream = io.StringIO()
# Writing df to the stream instead of a file:
df.to_csv(stream, sep=',', encoding='utf-8', index=False)

try:
    media = MediaIOBaseUpload(stream,
                              mimetype='application/octet-stream',
                              resumable=False)
    # #### Your upload logic here using media just created ####
except HttpError as error:
    # #### Handle your errors in uploading here ####
    pass
Because I have a unicode character, I developed alternative code which accomplishes the same thing but can handle unicode characters.
import io
import pandas as pd
from googleapiclient.errors import HttpError
from apiclient.http import MediaIOBaseUpload  # Changed this from MediaFileUpload

df = pd.DataFrame(list(range(10)))
records = df.to_csv(line_terminator='\r\n', index=False).encode('utf-8')
byte_stream = io.BytesIO(records)  # a binary stream, to avoid shadowing the builtin 'bytes'

try:
    media = MediaIOBaseUpload(byte_stream,
                              mimetype='application/octet-stream',
                              resumable=False)
    # #### Your upload logic here using media just created ####
except HttpError as error:
    # #### Handle your errors in uploading here ####
    pass
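The reason this variant avoids the UnicodeEncodeError is that the frame is serialized to UTF-8 bytes before the upload machinery ever sees it, so the 'ascii' codec is never involved. A minimal sketch of the round trip with a non-ASCII character similar to the '\xe9' from the error above:

```python
import io
import pandas as pd

# 'café' contains '\xe9', the character that broke the ascii codec.
df = pd.DataFrame({"name": ["café"]})

# Serialize to text, encode to UTF-8 bytes, then wrap in a binary stream:
records = df.to_csv(index=False).encode("utf-8")
buf = io.BytesIO(records)

print(buf.read().decode("utf-8"))  # the CSV content round-trips intact
```

Decoding the bytes back with UTF-8 recovers the original CSV text, non-ASCII characters included.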
I used:
from googleapiclient.http import MediaIoBaseUpload
versus @Katherine's:
from apiclient.http import MediaIOBaseUpload
But other than that, @Katherine's alternative solution worked perfectly for me as I was developing a solution to write a dataframe to a csv file in Google Drive, running from a Google Cloud Function.