
Google Translate API - Reading and Writing to Cloud Storage - Python

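I am using the Google Translation API to translate a CSV file with multiple columns and rows. The target language is English, and the file contains text in several languages.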

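The code posted below uses local files for testing, but I would like to read (import) the files from a Cloud Storage bucket and export the translated files to a different Cloud Storage bucket.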

I tried running the script below with my sample files and got this error: "FileNotFoundError: [Errno 2] No such file or directory"

I came across the "Reading and Writing to Cloud Storage" documentation, but I could not work the suggested solution into the script below. https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/read-write-to-cloud-storage#reading_from_cloud_storage

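Could someone suggest how to modify the script so that it imports (and translates) the files from one Google Cloud Storage bucket and exports the translated files to a different Google Cloud Storage bucket? Thanks!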

The script in question:

from google.cloud import translate
import csv


def listToString(s):
    """Join a list of strings into one space-separated string."""
    return " ".join(s)

def detect_language(project_id,content):
    """Detecting the language of a text string."""

    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"

    response = client.detect_language(
        content=content,
        parent=parent,
        mime_type="text/plain",  # mime types: text/plain, text/html
    )

    # Return the first language in the response, or None if nothing was detected
    for language in response.languages:
        return language.language_code
    return None


def translate_text(text, project_id,source_lang):
    """Translating Text."""

    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"

    # Detail on supported types can be found here:
    # https://cloud.google.com/translate/docs/supported-formats
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",  # mime types: text/plain, text/html
            "source_language_code": source_lang,
            "target_language_code": "en-US",
        }
    )

    # Display each translation and return them so the caller can save the result
    translated = [t.translated_text for t in response.translations]
    for translated_text in translated:
        print("Translated text: {}".format(translated_text))
    return translated
        
def main():

    project_id="your-project-id"
    csv_files = ["sample1.csv","sample2.csv"]
    # Perform your content extraction here if you have a different file format #
    for csv_path in csv_files:
        # These are local paths: open() raises FileNotFoundError when the file
        # is not on the machine running the script. UTF-8 is assumed here.
        with open(csv_path, newline="", encoding="utf-8") as csv_file:
            content_csv = []
            for row in csv.reader(csv_file):
                content_csv.extend(row)
        content = listToString(content_csv) # convert list to string
        detect = detect_language(project_id=project_id,content=content)
        translate_text(text=content,project_id=project_id,source_lang=detect)

if __name__ == "__main__":
    main()

You can download the file from GCS, run your logic against the local (downloaded) copy, and then upload the result to another GCS bucket. Example:

Download the file from "my-bucket" to /tmp:

from google.cloud import storage

client = storage.Client()

bucket = client.get_bucket("my-bucket")
source_blob = bucket.blob("blob/path/file.csv")
new_file = "/tmp/file.csv"
# download_to_filename() writes the object to disk and returns None
source_blob.download_to_filename(new_file)
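
If the file is small enough to hold in memory, the temporary file can probably be skipped. The following is a minimal sketch rather than part of the original answer: `read_csv_rows` is a hypothetical helper name, and it assumes the object is UTF-8 encoded CSV and that the installed google-cloud-storage version provides `Blob.download_as_bytes()`. The storage client is passed in as an argument so the helper does not create credentials itself.

```python
import csv
import io


def read_csv_rows(storage_client, bucket_name, blob_name, encoding="utf-8"):
    """Return the CSV object gs://bucket_name/blob_name as a list of rows.

    storage_client is expected to be a google.cloud.storage.Client
    (hypothetical helper; the encoding default is an assumption).
    """
    blob = storage_client.bucket(bucket_name).blob(blob_name)
    # Keep the object in memory instead of writing it to /tmp first.
    text = blob.download_as_bytes().decode(encoding)
    # newline="" leaves line endings alone so the csv module can handle
    # line breaks inside quoted cells itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```

With the names from the example above, a call might look like `read_csv_rows(storage.Client(), "my-bucket", "blob/path/file.csv")`.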

After translating (running your code logic), upload the result to the destination bucket:

bucket = client.get_bucket('my-other-bucket')
blob = bucket.blob('myfile.csv')  # object name in the destination bucket
blob.upload_from_filename('myfile.csv')  # local path of the translated file
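
Note that the script in the question flattens each file into a single string and only prints the translation, so there is nothing CSV-shaped left to upload. One possible way around this, sketched below under stated assumptions, is to translate cell by cell and write the rows back as CSV directly to the second bucket. All helper names (`translate_rows`, `rows_to_csv_text`, `make_cloud_translator`, `upload_rows`) are hypothetical. The sketch omits `source_language_code` on the assumption that the API then detects the source language itself, and it skips empty cells rather than sending them to the API.

```python
import csv
import io


def translate_rows(rows, translate_batch):
    """Translate every non-empty cell while keeping the row/column layout.

    translate_batch takes a list of strings and returns their translations
    in the same order.
    """
    translated = []
    for row in rows:
        # Blank cells are copied through unchanged and never sent to the API.
        texts = [cell for cell in row if cell.strip()]
        results = iter(translate_batch(texts) if texts else [])
        translated.append(
            [next(results) if cell.strip() else cell for cell in row]
        )
    return translated


def rows_to_csv_text(rows):
    """Serialize rows to CSV text in memory."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def make_cloud_translator(project_id, target_language="en-US"):
    """Build a translate_batch function backed by the Cloud Translation API."""
    # Imported here so the CSV helpers work without the client library.
    from google.cloud import translate

    client = translate.TranslationServiceClient()
    parent = f"projects/{project_id}/locations/global"

    def translate_batch(texts):
        response = client.translate_text(
            request={
                "parent": parent,
                "contents": texts,
                "mime_type": "text/plain",
                "target_language_code": target_language,
            }
        )
        return [t.translated_text for t in response.translations]

    return translate_batch


def upload_rows(storage_client, bucket_name, blob_name, rows):
    """Write rows as a CSV object to gs://bucket_name/blob_name."""
    blob = storage_client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(rows_to_csv_text(rows), content_type="text/csv")
```

With real credentials, and `rows` read from the source bucket, the pieces might be wired together as `upload_rows(storage.Client(), "my-other-bucket", "myfile.csv", translate_rows(rows, make_cloud_translator("your-project-id")))`. Passing the translator in as a function also makes it easy to try the CSV handling with a stand-in before calling the paid API.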
