Transforming dates in TensorFlow or TensorFlow Extended

I am working with TensorFlow Extended, preprocessing data, and among this data are date values (e.g. values of the form 16-04-2019). I need to apply some preprocessing to these, such as computing the difference between two dates and extracting the day, month and year.

For example, I might need the difference in days between 01-04-2019 and 16-04-2019, but this difference could also span days, months or years.

Using plain Python scripts this is easy to do, but I am wondering whether it is also possible with TensorFlow. It's important for my use case to do this within TensorFlow, because the transform needs to be part of the graph so that I can serve the model with the transformations inside the pipeline.

I am using TensorFlow 1.13.1, TensorFlow Extended and Python 2.7 for this.

Posting from a similar issue on the tft GitHub.

Here's a way to do it:

import tensorflow as tf
import tensorflow_addons as tfa

@tf.function(experimental_follow_type_hints=True)
def fn_seconds_since_1970(date_time: tf.string, date_format: str = "%Y-%m-%d %H:%M:%S %Z"):
    seconds_since_1970 = tfa.text.parse_time(date_time, date_format, output_unit='SECOND')
    seconds_since_1970 = tf.cast(seconds_since_1970, dtype=tf.int64)
    return seconds_since_1970

string_date_tensor = tf.constant("2022-04-01 11:12:13 UTC")

seconds_since_1970 = fn_seconds_since_1970(string_date_tensor)

seconds_in_hour, hours_in_day = tf.constant(3600, dtype=tf.int64), tf.constant(24, dtype=tf.int64)
# Floor division keeps the results in int64, so no float round trip or cast is needed.
hours_since_1970 = seconds_since_1970 // seconds_in_hour
hour_of_day = hours_since_1970 % hours_in_day
days_since_1970 = seconds_since_1970 // (seconds_in_hour * hours_in_day)
day_of_week = (days_since_1970 + 4) % 7  # Jan 1st 1970 was a Thursday (4); Sunday is 0

print(f"On {string_date_tensor.numpy().decode('utf-8')}, {seconds_since_1970} seconds had elapsed since 1970.")
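The integer arithmetic above can be sanity-checked outside TensorFlow. The following is a minimal sketch in plain Python 3 that repeats the same divisions and compares them with the standard `datetime` module; the helper name `epoch_fields` is made up for this illustration and is not part of any library.

```python
import datetime

SECONDS_IN_HOUR = 3600
HOURS_IN_DAY = 24


def epoch_fields(seconds_since_1970):
    """Mirror the tensor arithmetic with plain Python integers."""
    hours_since_1970 = seconds_since_1970 // SECONDS_IN_HOUR
    hour_of_day = hours_since_1970 % HOURS_IN_DAY
    days_since_1970 = seconds_since_1970 // (SECONDS_IN_HOUR * HOURS_IN_DAY)
    # Jan 1st 1970 was a Thursday (4) when Sunday is counted as 0.
    day_of_week = (days_since_1970 + 4) % 7
    return hour_of_day, days_since_1970, day_of_week


# Same moment as the string used in the TensorFlow snippet.
moment = datetime.datetime(2022, 4, 1, 11, 12, 13, tzinfo=datetime.timezone.utc)
seconds = int(moment.timestamp())
hour_of_day, days_since_1970, day_of_week = epoch_fields(seconds)

# isoweekday() counts Monday as 1 and Sunday as 7, so "% 7" maps Sunday to 0.
print(hour_of_day == moment.hour, day_of_week == moment.isoweekday() % 7)
```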

My two cents on the broader underlying issue: the question here is computing time differences, and we want to do these computations on tensors. The question then becomes "What are the units of these tensors?" This is a question of granularity. The next question is "What are the data types involved?" Most likely you start with a string and end with a numeric type. Then the next question becomes: is there a "native" TensorFlow function that can do this? Enter TensorFlow Addons!

Just as we try to optimize training by doing everything as tensor operations within the graph, we also need to optimize "getting to the graph". I have seen how datetime would work with Python functions here, and I would do everything I could to avoid going into Python-function land, as the code becomes complex and performance suffers as well. It's a lose-lose in my opinion.

PS: Reportedly this op is not yet implemented on Windows, maybe because it only returns Unix timestamps :)
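Where `tensorflow_addons` is not an option, the dd-mm-yyyy strings from the question can also be handled with core string and integer ops only. The sketch below assumes TensorFlow 2.x and well-formed, zero-padded dates; `parse_dmy` and `days_since_epoch` are hypothetical helper names, and the day count uses the widely known "days from civil" integer formula rather than anything provided by TensorFlow itself.

```python
import tensorflow as tf


def parse_dmy(date_str):
    """Extract int64 day, month and year tensors from 'dd-mm-yyyy' strings."""
    day = tf.strings.to_number(tf.strings.substr(date_str, 0, 2), out_type=tf.int64)
    month = tf.strings.to_number(tf.strings.substr(date_str, 3, 2), out_type=tf.int64)
    year = tf.strings.to_number(tf.strings.substr(date_str, 6, 4), out_type=tf.int64)
    return day, month, year


def days_since_epoch(day, month, year):
    """Days since 1970-01-01 in the proleptic Gregorian calendar."""
    # Treat March as the first month so the leap day falls at the end of the year.
    y = tf.where(month <= 2, year - 1, year)
    era = y // 400                      # 400-year cycles
    year_of_era = y - era * 400         # in [0, 399]
    shifted_month = tf.where(month > 2, month - 3, month + 9)  # March = 0
    day_of_year = (153 * shifted_month + 2) // 5 + day - 1
    day_of_era = year_of_era * 365 + year_of_era // 4 - year_of_era // 100 + day_of_year
    return era * 146097 + day_of_era - 719468


start_dates = tf.constant(["01-04-2019", "16-04-2019"])
end_dates = tf.constant(["16-04-2019", "16-04-2020"])

start_days = days_since_epoch(*parse_dmy(start_dates))
end_days = days_since_epoch(*parse_dmy(end_dates))
diff_in_days = end_days - start_days
print(diff_in_days.numpy())
```

Because the functions use only tensor ops, they can be traced into a graph; whether they fit into a particular TFX preprocessing function unchanged is an assumption that should be verified against the versions in use.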

I had a similar problem. The issue is caused by an if-check within TFX that doesn't take date types into account. As far as I've been able to figure out, there are two options:

  1. Preprocess the date column and cast it to an int (e.g. by calling toordinal() on each element) before reading it into TFX.

  2. Edit the TFX function that checks types so that it accounts for date-like types and casts them to ordinals on the fly.
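The first option might look roughly like the following. This is a sketch in plain Python 3 that assumes the rows are available as dictionaries before they reach TFX; the function name `dates_to_ordinals` and the column names are invented for this example.

```python
import datetime


def dates_to_ordinals(rows, date_keys, date_format="%d-%m-%Y"):
    """Return copies of `rows` with the given date columns replaced by int ordinals."""
    converted = []
    for row in rows:
        new_row = dict(row)  # leave the caller's data untouched
        for key in date_keys:
            value = new_row[key]
            if isinstance(value, str):
                # Strings are parsed first; date/datetime objects are used as-is.
                value = datetime.datetime.strptime(value, date_format)
            new_row[key] = value.toordinal()
        converted.append(new_row)
    return converted


rows = [{"start": "01-04-2019", "end": datetime.date(2019, 4, 16), "amount": 3.5}]
converted_rows = dates_to_ordinals(rows, ["start", "end"])

# Ordinals are day counts, so subtracting them gives a difference in days.
days_between = converted_rows[0]["end"] - converted_rows[0]["start"]
print(days_between)
```

Note that toordinal() drops the time of day, so this only suits date-level granularity.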

You can navigate to venv/lib/python3.7/site-packages/tfx/components/example_gen/utils.py and look for the function dict_to_example. You can add a datetime check there like so:

def dict_to_example(instance: Dict[Text, Any]) -> tf.train.Example:
  """Converts dict to tf example."""
  feature = {}
  for key, value in instance.items():
    # TODO(jyzhao): support more types.
    # Added check; assumes `import datetime` at the top of the module.
    if isinstance(value, datetime.datetime):  # <---- Check here
      value = value.toordinal()
    if value is None:
      feature[key] = tf.train.Feature()
   ...

value becomes an int, which is then handled and cast to a TensorFlow type later in the function.
