I am trying to read a CSV file saved in Google Cloud Storage (GCS) into a dataframe for analysis.
I followed these steps without success:
mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')
df = pd.read_csv(data_csv)
This doesn't work because data_csv is not a path, which is what pd.read_csv expects. I also tried:
%%gcs read --object $data_csv --variable data
#result: %gcs: error: unrecognized arguments: Cloud Storage Object gs://path/to/file.csv
How can I read my file for analysis?
Thanks
%%gcs returns a bytes object. To read it, use BytesIO from the io module (Python 3):
mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')
%%gcs read --object $data_csv --variable data
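from io import BytesIO  # run this in a separate cell from the %%gcs magic
df = pd.read_csv(BytesIO(data), sep=';')  # data is the variable filled by %%gcs read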
If your CSV file is comma-separated, there is no need to specify sep=',' because that is the default. Read more about the io library here: Core tools for working with streams
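To try the BytesIO step without a bucket, here is a minimal sketch. The `data` bytes below are a made-up stand-in for whatever `%%gcs read ... --variable data` would store; the column names and values are assumptions for illustration only.

```python
from io import BytesIO

import pandas as pd

# Hypothetical stand-in for the bytes that %%gcs read stores in `data`
data = b"name;score\nalice;1\nbob;2\n"

# BytesIO wraps the raw bytes in a file-like object that read_csv accepts
df = pd.read_csv(BytesIO(data), sep=";")
print(df)
```

The same call works with the real `data` variable, as long as `sep` matches the delimiter used in the file.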
You just need to use the object's uri property to get the actual path:
uri = data_csv.uri
%%gcs read --object $uri --variable data
The first part of your code doesn't work because pd.read_csv expects a file path or a file-like object, but data_csv is a handle to an object in a GCS bucket, which lives in the cloud rather than on the local file system.
This worked for me:
df = pd.read_csv(BytesIO(data), encoding='unicode_escape')
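An encoding argument is typically needed when the file is not valid UTF-8. Note that unicode_escape also interprets backslash sequences in the data, so if the real encoding is known it may be safer to name it directly. A small sketch, assuming a Latin-1 file (the sample bytes are invented for illustration):

```python
from io import BytesIO

import pandas as pd

# Hypothetical payload: Latin-1 encoded bytes, which are not valid UTF-8
data = "city;visits\nZürich;3\n".encode("latin-1")

# Naming the encoding explicitly lets pandas decode the non-ASCII character
df = pd.read_csv(BytesIO(data), sep=";", encoding="latin-1")
print(df)
```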