How to manipulate prediction response from Google AutoML in a Pandas DataFrame?

I have successfully trained a Google AutoML Natural Language model to do multi-label categorization of text using custom labels.

I'm also able to use the Python function generated for the trained dataset to generate predictions on text contained in a Pandas DataFrame in a Jupyter Notebook.

However, I'm not sure how to use the result, and especially how to manipulate it so that it's useful to me.

Here's what my code looks like currently:

import pandas as pd
from io import StringIO

r = ...  # API call to get the text (elided)
df = pd.read_csv(StringIO(r.text), usecols=['text_to_predict'])
df['Category_Predicted'] = df.apply(lambda row: get_prediction(row.review, 'xxx', 'xxxx'), axis=1)

The output of df['Category_Predicted'].head() is

0    payload {\n  classification {\n    score: 0.61...
Name: Category_Predicted, dtype: object

And a simple (more readable) print of one prediction returns

payload {
  classification {
    score: 0.6122230887413025
  }
  display_name: "Shopping"
}
payload {
  classification {
    score: 0.608892023563385
  }
  display_name: "Search"
}
payload {
  classification {
    score: 0.38840705156326294
  }
  display_name: "Usability"
}
payload {
  classification {
    score: 0.2736874222755432
  }
  display_name: "Stability"
}
payload {
  classification {
    score: 0.011237740516662598
  }
  display_name: "Profile"
}
#......................(continues on for all categories)

Now, my primary objective is for df['Category_Predicted'] to be a field where the topmost (most relevant) categories are comma-separated in a simple list. For the example above, that would be

Shopping, Search, Usability

(depending on how far down you want to keep labels, based on the score)

So I have a couple of questions on my hands:

  • How do I access this field with Python to get the category and its related score?

  • How do I manipulate it to create a single string?

Thanks!

EDIT

As requested in the comments, below are some examples representing 2 records in my dataframe with (non-complete) payloads; in the desired result I have kept only results with score > 0.3. Due to the large text fields I had to use a... "custom" solution for presentation instead of ASCII tables.

ROW 1 - TEXT TO PREDICT

Great app so far. Just a pity that you can not look in the old app what you still had in your shopping or what your favorites were. This fact is simply gone. Plus that you now have to enter everything in the new one !!!

ROW 1 - PREDICTION OUTPUT

payload {
  classification {
    score: 0.6122230887413025
  }
  display_name: "Shopping"
}
payload {
  classification {
    score: 0.608892023563385
  }
  display_name: "Search"
}
payload {
  classification {
    score: 0.38840705156326294
  }
  display_name: "Usability"
}
payload {
  classification {
    score: 0.2736874222755432
  }
  display_name: "Stability"
}

ROW 1 - DESIRED OUTPUT

Shopping, Search, Usability

ROW 2 - TEXT TO PREDICT

2nd time you make us the joke of a new app worse than the 1st. How long before raising the level with this one? Not intuitive at all, not so clear ... In short not at the level of the previous one

ROW 2 - PREDICTION OUTPUT

payload {
  classification {
    score: 0.9011210203170776
  }
  display_name: "Usability"
}
payload {
  classification {
    score: 0.8007309436798096
  }
  display_name: "Shopping"
}
payload {
  classification {
    score: 0.5114057660102844
  }
  display_name: "Stability"
}
payload {
  classification {
    score: 0.226901113986969
  }
  display_name: "Search"
}

ROW 2 - DESIRED OUTPUT

Usability, Shopping, Stability

What I understand from your question is that you want the most relevant categories according to the prediction score. I placed your prediction string output in a text file, e.g. out.txt:

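import pandas as pd

# Each line of out.txt splits into two whitespace-separated tokens; with a
# single column name given, pandas uses the first token as the index.
df = pd.read_csv('out.txt',
                 header=None,
                 sep=r'\s+',  # replaces delim_whitespace=True, deprecated in pandas 2.2
                 names=['data'])
score = df.loc['score:', 'data'].astype(float).values  # compare numerically, not as strings
category = df.loc['display_name:', 'data'].values
# Print the categories from highest to lowest score
for _, name in sorted(zip(score, category), key=lambda x: x[0], reverse=True):
    print(name, end=", ")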

For the prediction result shared above, I got the following result:

Shopping, Search, Usability, Stability, Profile,
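
If the score cutoff from the question is also needed, the same text dump could be parsed with a regular expression instead. This is only a sketch: `PAYLOAD_RE`, `parse_payloads`, `top_labels`, and the 0.3 default threshold are illustrative names and values, not part of the answer above, and it assumes the printed format shown in the question.

```python
import re

# Matches the score and label of one printed payload block, assuming the
# "score: X } display_name: "Y"" layout shown in the question.
PAYLOAD_RE = re.compile(
    r'score:\s*([0-9.eE+-]+)\s*\}\s*display_name:\s*"([^"]*)"'
)

def parse_payloads(text):
    """Return a list of (label, score) tuples found in the printed response."""
    return [(name, float(score)) for score, name in PAYLOAD_RE.findall(text)]

def top_labels(text, threshold=0.3):
    """Join the labels scoring above threshold, highest score first."""
    pairs = sorted(parse_payloads(text), key=lambda p: p[1], reverse=True)
    return ", ".join(name for name, score in pairs if score > threshold)

sample = '''payload {
  classification {
    score: 0.6122230887413025
  }
  display_name: "Shopping"
}
payload {
  classification {
    score: 0.2736874222755432
  }
  display_name: "Stability"
}
payload {
  classification {
    score: 0.38840705156326294
  }
  display_name: "Usability"
}
'''
print(top_labels(sample))  # Shopping, Usability
```

Applied to a DataFrame column holding the printed responses, something like `df['Category_Predicted'].astype(str).apply(top_labels)` would produce the comma-separated field.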

I know it's bad to answer my own question, but I figured that if somebody is looking into the same problem, they might find a solution here.

As google.cloud.automl_v1beta1 defines it, the return value of the method get_prediction is an object of type PredictResponse ( https://cloud.google.com/natural-language/automl/docs/reference/rpc/google.cloud.automl.v1beta1#predictresponse ).

Using the documentation and the available structure of such an object, I found that this code does the trick:

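# filterPrediction is a user-defined predicate (not shown) that decides which labels to keep
for index, row in df.iterrows():
    pred = get_prediction(row['review'], GCP_PROJ, AUTOML_DS)
    # pred.payload holds one entry per label, each with display_name and classification.score
    filteredCategories = filter(filterPrediction, pred.payload)
    df.at[index, 'Predicted_Categories'] = ",".join(str(categ.display_name) for categ in filteredCategories)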
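
The `filterPrediction` predicate itself is not shown in this answer. A minimal sketch of what it could look like, assuming the 0.3 score cutoff from the question's edit: `SCORE_THRESHOLD`, `join_labels`, and `fake_annotation` are illustrative names, and the `SimpleNamespace` objects are stand-ins so the snippet runs without calling the AutoML API. The only fields relied on are `display_name` and `classification.score`, as seen in the printed response.

```python
from types import SimpleNamespace

SCORE_THRESHOLD = 0.3  # assumed cutoff, taken from the question's edit

def filterPrediction(annotation):
    """Keep an annotation only if its classification score beats the cutoff."""
    return annotation.classification.score > SCORE_THRESHOLD

def join_labels(payload):
    """Comma-join the display names that pass the filter, best score first."""
    kept = sorted(filter(filterPrediction, payload),
                  key=lambda a: a.classification.score, reverse=True)
    return ", ".join(a.display_name for a in kept)

def fake_annotation(name, score):
    # Stand-in mimicking the two fields read from a real payload entry.
    return SimpleNamespace(display_name=name,
                           classification=SimpleNamespace(score=score))

payload = [
    fake_annotation("Usability", 0.9011210203170776),
    fake_annotation("Shopping", 0.8007309436798096),
    fake_annotation("Stability", 0.5114057660102844),
    fake_annotation("Search", 0.226901113986969),
]
print(join_labels(payload))  # Usability, Shopping, Stability
```

With a real response, `join_labels(pred.payload)` could take the place of the join expression inside the loop.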
