Writing data from Pub/Sub to Bigtable via Cloud Functions

Question

I am a beginner with Cloud Bigtable and am having serious trouble using Cloud Functions to write data from Pub/Sub to Bigtable.

The Cloud Function receives the messages from Pub/Sub; the problem is in the next step, writing them into Bigtable.

The message is created in a Python script and sent to Pub/Sub.

One example of a message:

b'{"eda":2.015176,"temperature":33.39,"bvp":-0.49,"x_acc":-36.0,"y_acc":-38.0,"z_acc":-128.0,"heart_rate":83.78,"iddevice":15.0,"timestamp":"2019-12-01T20:01:36.927Z"}'

To write it into Bigtable I created a table:

 from google.cloud import bigtable 
 from google.cloud.bigtable import column_family

 client = bigtable.Client(project="projectid", admin=True) 
 instance = client.instance("bigtableinstance")
 table = instance.table("bigtable1")
 print('Creating the {} table.'.format(table)) 
 print('Creating columnfamily cf1 with Max Version GC rule...')
 max_versions_rule = column_family.MaxVersionsGCRule(2)
 column_family_id = 'cf1'
 column_families = {column_family_id: max_versions_rule}
 if not table.exists():
     table.create(column_families=column_families)
     print("Table {} is created.".format(table)) 
 else:
     print("Table {} already exists.".format(table))

This works without problems.

Now I tried to write the message from Pub/Sub to Bigtable with the following Python code in Cloud Functions, using main as the entry point:

import json
import base64
import os
from google.cloud import bigtable
from google.cloud.bigtable import column_family, row_filters


project_id = os.environ.get('projetid', 'UNKNOWN')
INSTANCE = 'bigtableinstance'
TABLE = 'bigtable1'

client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(INSTANCE)

colFamily = "cf1"
def writeToBigTable(table, data):
#    Parameters row_key (bytes) – The key for the row being created.
#    Returns A row owned by this table.
        row_key = data[colFamily]['iddevice'].value.encode()
        row = table.row(row_key)
        for colFamily in data.keys():
            for key in data[colFamily].keys():
                row.set_cell(colFamily,
                                        key,
                                        data[colFamily][key])
        table.mutate_rows([row])
        return data

def selectTable():
    stage = os.environ.get('stage', 'dev')
    table_id = TABLE + stage
    table = instance.table(table_id)
    return table


def main(event, context):
    data = base64.b64decode(event['data']).decode('utf-8')
    print("DATA: {}".format(data))
    eda, temperature, bvp, x_acc, y_acc, z_acc, heart_rate, iddevice, timestamp = data.split(',')

    table = selectTable()

    data = {'eda': eda,
         'temperature': temperature,
         'bvp': bvp,
         'x_acc':x_acc,
         'y_acc':y_acc,
         'z_acc':z_acc,
         'heart_rate':heart_rate,
         'iddevice':iddevice,
         'timestamp':timestamp}
    writeToBigTable(table, data)
    print("Data Written: {}".format(data))

I tried different versions but cannot find a solution.

Thanks for the help.

All the best

Dominik

Answer 1

I think this line is wrong:

    row_key = data[colFamily]['iddevice'].value.encode()

You're passing in the data object, but it doesn't have a 'cf1' property. You also don't have to encode it. Give this a try:

    row_key = data['iddevice']

Your for loop will also have the same issue. I think this is what you want instead:

    for key in data.keys():
        row.set_cell(colFamily, key, data[key])

Also, I know you're just playing with it, but using a device id as the only value for a row key will end up poorly. The usual recommendation is to combine the device id with the date or one of your other properties (depending on your queries) and use that as your row key. There is a helpful document on Cloud Bigtable schema design, and a codelab that uses a more realistic sample dataset and walks through how to pick a schema for that example. It's in Java, but you can still import the data and run your own queries.
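To make the row key advice concrete, here is a minimal sketch that combines the device id and the timestamp from the sample message. The helper name build_row_key and the "device#<id>#<timestamp>" layout are assumptions chosen for illustration; the right layout depends on the queries you plan to run.

```python
def build_row_key(reading):
    """Build a row key of the form device#<id>#<timestamp>, as bytes.

    The layout is an illustrative assumption, not a Bigtable requirement.
    """
    device_id = int(reading["iddevice"])  # 15.0 in the sample message becomes 15
    key = "device#{}#{}".format(device_id, reading["timestamp"])
    return key.encode("utf-8")


sample = {"iddevice": 15.0, "timestamp": "2019-12-01T20:01:36.927Z"}
print(build_row_key(sample))  # b'device#15#2019-12-01T20:01:36.927Z'
```

Putting the device id first keeps all readings of one device next to each other, and the fixed-format timestamp orders them by time within that device.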

Answer 2

First, thanks a lot for the help.

I tried to fix it with your code recommendation, but unfortunately it still doesn't work, now because of a different error:

AttributeError: 'DirectRow' object has no attribute 'append'

I guess this comes from the following lines of code:

        row.set_cell(colFamily,
                     key,
                     data[key])

I could imagine that the error originates in the split of the string "data":

eda, temperature, bvp, x_acc, y_acc, z_acc, heart_rate, iddevice, timestamp = data.split(',')

E.g. eda would look like this:

"'eda':2.015176"

which looks pretty wrong to me, especially when I insert it into the following dict:

 data = {'eda': eda,....}
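The suspicion about the split looks well founded: the payload is JSON, so splitting on commas leaves the key name attached to every value. A minimal sketch of parsing it with json.loads instead, assuming the event carries the message as a base64 string under 'data' (as in the main function above); decode_pubsub_event is a hypothetical helper name:

```python
import base64
import json


def decode_pubsub_event(event):
    """Decode the base64 payload of a Pub/Sub-triggered event into a dict."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(payload)


# Simulate the event a Pub/Sub-triggered function would receive.
sample = (b'{"eda":2.015176,"temperature":33.39,"bvp":-0.49,"x_acc":-36.0,'
          b'"y_acc":-38.0,"z_acc":-128.0,"heart_rate":83.78,"iddevice":15.0,'
          b'"timestamp":"2019-12-01T20:01:36.927Z"}')
event = {"data": base64.b64encode(sample).decode("ascii")}

reading = decode_pubsub_event(event)
print(reading["eda"], reading["timestamp"])  # 2.015176 2019-12-01T20:01:36.927Z
```

The result is already a dict keyed by field name, so no manual unpacking into nine variables is needed.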

The error AttributeError: 'DirectRow' object has no attribute 'append' seems to say that there is a problem with the data I want to process with set_cell. The documentation mentions rows as a list or any other iterable of DirectRow instances. Shouldn't a dict work for that?
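The two calls take different things, which may be the source of the confusion: set_cell takes one column family, one column and one value, while mutate_rows takes a list of DirectRow objects. A minimal sketch of how they fit together, assuming `table` is a google-cloud-bigtable Table handle and each reading is a dict parsed from the JSON message. The helper names and the "<id>#<timestamp>" row key are illustrative assumptions, and values are encoded to bytes explicitly to avoid relying on implicit conversion.

```python
COLUMN_FAMILY = "cf1"  # the column family created in the question


def build_rows(table, readings):
    """Turn each reading dict into one DirectRow with one cell per field."""
    rows = []
    for reading in readings:
        row_key = "{}#{}".format(int(reading["iddevice"]), reading["timestamp"])
        row = table.direct_row(row_key.encode("utf-8"))
        for column, value in reading.items():
            # One call per cell: family, column qualifier, value (as bytes).
            row.set_cell(COLUMN_FAMILY,
                         column.encode("utf-8"),
                         str(value).encode("utf-8"))
        rows.append(row)
    return rows


def write_rows(table, readings):
    """Send all rows in one request; return indexes of rows that failed."""
    rows = build_rows(table, readings)
    statuses = table.mutate_rows(rows)  # one status per row, in order
    return [i for i, status in enumerate(statuses) if status.code != 0]
```

A dict is therefore the input for the loop that calls set_cell, not something to hand to mutate_rows directly.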

I tried a workaround with a list, but this seems to make it even worse.

client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(INSTANCE)

colFamily = "cf1"
def writeToBigTable(table, dat):

    row_key = "{}-{}".format(dat[16], dat[17])
    row = table.row(row_key)
    for n in range(len(dat)):
        row.set_cell(colFamily,
                     dat[n],
                     dat[n+9])
    table.mutate_rows([row])
    return dat

def selectTable():
    stage = os.environ.get('stage', 'dev')
    table_id = TABLE + stage
    table = instance.table(table_id)
    return table


def main(event, context):
    data = base64.b64decode(event['data']).decode('utf-8')
    print("DATA: {}".format(data))
    var_1, eda, var_2, temperature, var_3, bvp, var_4, x_acc, var_5, y_acc, var_6, z_acc, var_7, heart_rate, var_8, iddevice, var_9, timestamp = data.replace(':',',').split(',')

    table = selectTable()
    dat = [var_1, var_2, var_3, var_4, var_5, var_6, var_7, var_8, var_9, eda, temperature, bvp, x_acc, y_acc, z_acc, heart_rate, iddevice, timestamp]

#   data = {'eda': eda,
#         'temperature': temperature,
#         'bvp': bvp,
#         'x_acc':x_acc,
#         'y_acc':y_acc,
#         'z_acc':z_acc,
#         'heart_rate':heart_rate,
#         'iddevice':iddevice,
#         'timestamp':timestamp}
    writeToBigTable(table, dat)
    print("Data Written: {}".format(data))

I am really stuck on this problem and have no further ideas for how to solve it.
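For reference, a minimal sketch of how the whole function could look when the payload is parsed as JSON and each field is written as one cell. It is written under several assumptions, not as a confirmed fix: PROJECT_ID is a hypothetical environment variable name, the row key layout is illustrative, and the table id is the "bigtable1" created earlier. Note that selectTable in the posted code appends the stage, so with the default it would look for "bigtable1dev", which is worth checking.

```python
import base64
import json
import os

INSTANCE_ID = "bigtableinstance"
TABLE_ID = "bigtable1"
COLUMN_FAMILY = "cf1"

_table = None  # reused while the function instance stays warm


def get_table():
    """Create the table handle on first use."""
    global _table
    if _table is None:
        # Imported lazily so the pure helpers below work without the library.
        from google.cloud import bigtable
        # PROJECT_ID is a hypothetical variable name; set it on the function.
        client = bigtable.Client(project=os.environ.get("PROJECT_ID"))
        _table = client.instance(INSTANCE_ID).table(TABLE_ID)
    return _table


def write_reading(table, reading):
    """Write one reading as a single row, one cell per JSON field."""
    row_key = "{}#{}".format(int(reading["iddevice"]),
                             reading["timestamp"]).encode("utf-8")
    row = table.direct_row(row_key)
    for column, value in reading.items():
        row.set_cell(COLUMN_FAMILY,
                     column.encode("utf-8"),
                     str(value).encode("utf-8"))
    row.commit()  # sends the mutations for this single row
    return row_key


def main(event, context):
    """Entry point for the Pub/Sub-triggered function."""
    reading = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    row_key = write_reading(get_table(), reading)
    print("Wrote row {}".format(row_key))
```

Keeping write_reading separate from the client setup means it can be exercised locally with a stand-in table object before deploying.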

2 answers

Answer 1
2 2019-12-01 20:41:24

Answer 2
0 2019-12-02 10:26:12
