
Strip and filter strings in a ragged tensor

I would like to know whether there is a clean, TensorFlow-idiomatic way to do the following conversion. Each string (row) contains several comma-separated words, and each word carries a numeric suffix such as "%1". The goal is to strip the suffixes and keep only the words whose suffix value is <= a given target value.

This is not hard to do in plain Python. However, I want to add this step to a TensorFlow computational graph, so a solution built from TensorFlow ops is preferred.

# Input: shape (n,), where n is large
text = tf.constant(['a1%0,a2%0,a3%1,a4%2,a5%3,a6%4','a7%3,a8%4',...])

# If target = 1, the result should be
res = tf.ragged.constant([["a1", "a2", "a3"], [],...])
# If target = 3, the result should be
res = tf.ragged.constant([["a1", "a2", "a3", "a4", "a5"], ["a7"],...])
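For reference, the plain-Python version of this conversion might look like the sketch below. It is only a baseline to pin down the expected behavior; the function name `filter_words` is hypothetical and not part of the original question.

```python
def filter_words(rows, target):
    """Plain-Python baseline (hypothetical helper, for illustration only).

    Each row is a comma-separated string of items like "a1%0".
    Returns, per row, the words whose numeric suffix is <= target.
    """
    result = []
    for row in rows:
        kept = []
        for item in row.split(","):
            # Split on the last "%" into the word and its numeric suffix.
            word, _, suffix = item.rpartition("%")
            # Skip empty items (e.g. from an empty row).
            if word and int(suffix) <= target:
                kept.append(word)
        result.append(kept)
    return result


rows = ["a1%0,a2%0,a3%1,a4%2,a5%3,a6%4", "a7%3,a8%4"]
print(filter_words(rows, 1))  # [['a1', 'a2', 'a3'], []]
print(filter_words(rows, 3))  # [['a1', 'a2', 'a3', 'a4', 'a5'], ['a7']]
```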

You can do the following (tested in TensorFlow 2.9):

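import tensorflow as tf

text = tf.constant(['a1%0,a2%0,a3%1,a4%2,a5%3,a6%4','a7%3,a8%4',...])
target = 1

# Words: drop every "%<digits>" suffix, then split on commas.
# r"%\d+" also covers multi-digit suffixes such as "%12".
a = tf.strings.split(tf.strings.regex_replace(text, r"%\d+", ""), ",")
# Suffix values: drop everything up to and including "%" in each item, then split.
b = tf.strings.split(tf.strings.regex_replace(text, r"[^,]*%", ""), ",")
b = tf.strings.to_number(b)

# Keep only the words whose suffix value is <= target.
c = tf.ragged.boolean_mask(a, (b <= target))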
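Since the question asks for a step that can live inside a computational graph, the same ops can be wrapped in a `tf.function`. The following is a minimal sketch under a few assumptions: the helper name `filter_by_suffix` is hypothetical, suffixes are assumed to be non-negative integers, and every item is assumed to follow the `word%number` pattern.

```python
import tensorflow as tf


@tf.function
def filter_by_suffix(text, target):
    """Hypothetical helper: keep the words whose numeric suffix is <= target."""
    # Words: remove the "%<digits>" suffixes, then split each row on commas.
    words = tf.strings.split(tf.strings.regex_replace(text, r"%\d+", ""), ",")
    # Suffixes: remove each item's text up to and including "%", then split.
    suffixes = tf.strings.split(tf.strings.regex_replace(text, r"[^,]*%", ""), ",")
    # Assumption: suffixes are integers, so parse them as int32.
    suffixes = tf.strings.to_number(suffixes, out_type=tf.int32)
    # Both ragged tensors share the same row structure, so the mask lines up.
    return tf.ragged.boolean_mask(words, suffixes <= target)


text = tf.constant(["a1%0,a2%0,a3%1,a4%2,a5%3,a6%4", "a7%3,a8%4"])
print(filter_by_suffix(text, 1).to_list())  # [[b'a1', b'a2', b'a3'], []]
```

Note that passing `target` as a Python integer causes a retrace for each distinct value; passing it as a `tf.constant` of dtype `tf.int32` should avoid that if the target changes often.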
