
How to Translate CSV Data into TFRecord Files

I am currently working on a system that takes data from a CSV file and writes it into a TFRecord file, but I have a few questions.

For starters, I need to know what types a TFRecord file can take, since the type information is lost when the data comes from a CSV.

Secondly, how can I convert data of dtype object into a type that a TFRecord can take?

I have two columns of dtype object that hold strings (example below). How can I convert that data to the correct type for TFRecords?

When importing, I am hoping to append the data from one row at a time to the TFRecord file. Any advice or documentation would be great. I have been looking at this problem for some time, and it seems that only ints and floats can be put into a TFRecord, but what about a list/array of integers?

Thank you for reading!

Quick note: I am using pandas to create a DataFrame from the CSV file.

Some example code I am using:

import pandas as pd
from ast import literal_eval
import numpy as np
import tensorflow as tf


tf.compat.v1.enable_eager_execution()


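def Start():
    # Raw string so the backslashes in the Windows path are not read as escapes.
    db = pd.read_csv(r"I:\Github\ClubKeno\Keno Project\Database\..\LotteryDatabase.csv")

    print(db['Winning_Numbers'])
    print(db.dtypes)

    # 'Winning_Numbers' and 'Extra_Numbers' are object (string) columns, so
    # casting them straight to tf.int64 is expected to fail.
    training_dataset = (
        tf.data.Dataset.from_tensor_slices(
            (
                tf.cast(db['Draw_Number'].values, tf.int64),
                tf.cast(db['Winning_Numbers'].values, tf.int64),
                tf.cast(db['Extra_Numbers'].values, tf.int64),
                tf.cast(db['Kicker'].values, tf.int64)
            )
        )
    )

    # Each element is a 4-tuple, one entry per column passed above.
    for draw_number, winning_numbers, extra_numbers, kicker in training_dataset:
        print(f'draw:{draw_number} winning:{winning_numbers} extra:{extra_numbers} kicker:{kicker}')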

Error Message: (screenshot not preserved)

CSV Data: (screenshot not preserved)

Update: I got two columns of data working using the following function:

dataset = tf.data.experimental.make_csv_dataset(
    file_pattern=databasefile,
    column_names=['Draw_Number', 'Kicker'],
    column_defaults=[tf.int64, tf.int64],
)

However, when I try to include my two other columns, which have dtype object, I get an error. The data in both of those columns looks like this: "3,9,11,16,25,26,28,29,36,40,41,46,63,66,67,69,72,73,78,80"

Here is the function I tried:

    dataset = tf.data.experimental.make_csv_dataset(
        file_pattern=databasefile,
        column_names=['Draw_Number', 'Winning_Numbers', 'Extra_Numbers', 'Kicker'],
        column_defaults=[tf.int64, tf.compat.as_bytes, tf.compat.as_bytes, tf.int64],
        header=True,
        batch_size=100,
        field_delim=',',
        na_value='NA'
    )

This error appears:

TypeError: Failed to convert object of type <class 'function'> to Tensor. Contents: <function as_bytes at 0x000000EA530908C8>. Consider casting elements to a supported type.

Should I cast those two columns outside the function and then combine them with the tf.data output of make_csv_dataset when writing the TFRecord file?

For starters, I need to know what types a TFRecord file can take, since the type information is lost when the data comes from a CSV.

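A TFRecord file stores serialized tf.train.Example messages, and each feature in an Example is one of three list types: tf.train.BytesList (string, byte), tf.train.FloatList (float32, float64), and tf.train.Int64List (bool, enum, int32, uint32, int64, uint64). Because every feature is a list, a list/array of integers fits directly into a tf.train.Int64List.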
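The three feature types can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper names (bytes_feature, float_feature, int64_list_feature), the "Ratio" and "Label" keys, and all values are made up for the example; only the column names Draw_Number and Winning_Numbers come from the question.

```python
import tensorflow as tf


def bytes_feature(value: bytes) -> tf.train.Feature:
    """Wrap a single byte string in a BytesList (hypothetical helper)."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def float_feature(value: float) -> tf.train.Feature:
    """Wrap a single float in a FloatList (hypothetical helper)."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def int64_list_feature(values) -> tf.train.Feature:
    """Wrap a whole list of integers in an Int64List (hypothetical helper)."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))


# Placeholder values; a real row would come from the CSV.
example = tf.train.Example(features=tf.train.Features(feature={
    "Draw_Number": int64_list_feature([1]),
    "Winning_Numbers": int64_list_feature([3, 9, 11]),
    "Ratio": float_feature(0.5),
    "Label": bytes_feature("3,9,11".encode("utf-8")),
}))

# This byte string is what gets written to a TFRecord file.
serialized = example.SerializeToString()
```

A single integer is simply a list with one element, which is why Draw_Number and Winning_Numbers can share the same helper.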

Secondly, how can I convert data of dtype object into a type that a TFRecord can take?

Here is an example from the TensorFlow documentation. It is a bit complicated to digest at once, but it is easy to follow if you read it carefully.

I have two columns of dtype object that hold strings (example below). How can I convert that data to the correct type for TFRecords?

For string data, use tf.train.BytesList, which stores a list of byte strings. Encode each Python string to bytes (for example with str.encode) before passing it in.
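As a rough sketch of the row-by-row approach with pandas, the snippet below writes each DataFrame row as one record. It assumes the column names from the question; the two-row DataFrame, the row_to_example helper, the Winning_Numbers_Raw key, and the output file name are all hypothetical. It shows both options for a comma-separated column: keeping the raw string in a BytesList, or parsing it into integers for an Int64List.

```python
import os
import tempfile

import pandas as pd
import tensorflow as tf

# Hypothetical two-row stand-in for the CSV; the values are made up.
df = pd.DataFrame({
    "Draw_Number": [1, 2],
    "Winning_Numbers": ["3,9,11", "4,8,12"],
    "Kicker": [42, 17],
})


def row_to_example(row) -> tf.train.Example:
    """Build one tf.train.Example from one DataFrame row."""
    text = row["Winning_Numbers"]
    numbers = [int(part) for part in text.split(",")]
    feature = {
        "Draw_Number": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(row["Draw_Number"])])),
        # Option 1: keep the raw string, encoded to bytes.
        "Winning_Numbers_Raw": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[text.encode("utf-8")])),
        # Option 2: store the parsed integers directly.
        "Winning_Numbers": tf.train.Feature(
            int64_list=tf.train.Int64List(value=numbers)),
        "Kicker": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(row["Kicker"])])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


record_path = os.path.join(tempfile.mkdtemp(), "lottery.tfrecord")

# Append one serialized Example per row.
with tf.io.TFRecordWriter(record_path) as writer:
    for _, row in df.iterrows():
        writer.write(row_to_example(row).SerializeToString())

# Read the records back to check the round trip.
restored = [
    tf.train.Example.FromString(raw.numpy())
    for raw in tf.data.TFRecordDataset(record_path)
]
```

Storing the parsed integers is usually more convenient for training, since the values do not need to be split again when the file is read.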

When importing, I am hoping to append the data from one row at a time to the TFRecord file. Any advice or documentation would be great. I have been looking at this problem for some time, and it seems that only ints and floats can be put into a TFRecord, but what about a list/array of integers?

Quick note: I am using pandas to create a DataFrame from the CSV file.

Instead of reading the CSV file with pandas, I would recommend using tf.data.experimental.make_csv_dataset. This is typically faster than going through pandas and gives you fewer compatibility issues when working with TF classes. With this function you do not need to loop over the CSV file row by row yourself; you can transform every record by passing a function to map(). The TensorFlow tutorial on loading CSV data is a good place to get started.
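A possible way to load the string columns with make_csv_dataset is sketched below. The TypeError in the question is consistent with column_defaults receiving tf.compat.as_bytes, which is a function; column_defaults expects dtypes (or default values), so tf.string is used here instead. The sample CSV contents, the file name, and the split_numbers helper are assumptions made for the example.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical sample file; the list columns are quoted so that their commas
# are not treated as field delimiters.
csv_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(csv_path, "w") as f:
    f.write("Draw_Number,Winning_Numbers,Extra_Numbers,Kicker\n")
    f.write('1,"3,9,11","5,7",42\n')
    f.write('2,"4,8,12","6,10",17\n')

dataset = tf.data.experimental.make_csv_dataset(
    file_pattern=csv_path,
    batch_size=2,
    column_names=["Draw_Number", "Winning_Numbers", "Extra_Numbers", "Kicker"],
    # Dtypes, not conversion functions: the list columns are read as strings.
    column_defaults=[tf.int64, tf.string, tf.string, tf.int64],
    header=True,
    num_epochs=1,
    shuffle=False,
)


def split_numbers(row):
    """Turn the comma-separated strings into int64 vectors (hypothetical helper)."""
    row = dict(row)
    for name in ("Winning_Numbers", "Extra_Numbers"):
        parts = tf.strings.split(row[name], sep=",")
        row[name] = tf.strings.to_number(parts, out_type=tf.int64)
    return row


# unbatch() yields one row per element, so each string is a scalar when split.
parsed = dataset.unbatch().map(split_numbers)

for row in parsed:
    print(row["Draw_Number"].numpy(), row["Winning_Numbers"].numpy())
```

Each parsed row can then be turned into a tf.train.Example and written with tf.io.TFRecordWriter, in the same way as rows coming from pandas.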

