
Read in a CSV file with multiple inputs and outputs

I have the following table and would like to find out the relationship between inputs and outputs in order to make predictions.

Later I want to enter values for heater_power, voltage, heater_efficiency and heater_mass and generate a prediction for the outputs.

As the table shows, I have 4 input parameters and 3 output parameters.

[Image: table]

I have written some code. The input and output values are entered manually as arrays.

Import

import tensorflow as tf
import numpy as np

Set up training data

inputMatrix = np.array([(100,230,0.95,100),
                        (200,245,0.99,121),
                        ( 40,250,0.91,123)],dtype=float)
outputMatrix = np.array([(120, 5,120),
                         (123,24,100),
                         (154, 3,121)],dtype=float)
# Print each input row next to its matching output row
for i, c in enumerate(inputMatrix):
    print("{} Input Matrix = {} Output Matrix".format(c, outputMatrix[i]))

Create the Model

l0 = tf.keras.layers.Dense(units = 4, input_shape = [4])
l1 = tf.keras.layers.Dense(units = 64)
l2 = tf.keras.layers.Dense(units = 128)
l3 = tf.keras.layers.Dense(units = 3)

model = tf.keras.Sequential([l0,l1,l2,l3])

Compile the Model

model.compile(loss='mean_squared_error', optimizer=tf.keras.optimizers.Adam(0.1))

Train the model

history = model.fit(inputMatrix,outputMatrix,epochs=500,verbose=False)
print("Finished training the model!")

Display training statistics

import matplotlib.pyplot as plt
plt.xlabel('Epoch Number')
plt.ylabel('Loss Magnitude')
plt.plot(history.history['loss'])

Use the model to predict values

print(model.predict(np.array([120,260,0.98,110]).reshape(1,4)))

I now want to read the table automatically from a CSV file. The data should be read in and split into inputs and outputs.

How do I do this? Does it make sense to work with arrays here, or are there better options?
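One possible way to do this, sketched under assumptions: if the CSV has a single header row, pandas can read it and the columns can be split by name into two float arrays of the same form as inputMatrix and outputMatrix. The file layout and the three output column names (out_1, out_2, out_3) are hypothetical here, because the original table is only available as an image; the in-memory string stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for the real file. The single header row and the output column
# names are assumptions; the values are the ones from the arrays above.
csv_text = """heater_power,voltage,heater_efficiency,heater_mass,out_1,out_2,out_3
100,230,0.95,100,120,5,120
200,245,0.99,121,123,24,100
40,250,0.91,123,154,3,121
"""

input_cols = ["heater_power", "voltage", "heater_efficiency", "heater_mass"]
output_cols = ["out_1", "out_2", "out_3"]  # hypothetical names

# For a file on disk, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

# Split by column name and convert to float arrays.
inputMatrix = df[input_cols].to_numpy(dtype=float)
outputMatrix = df[output_cols].to_numpy(dtype=float)

print(inputMatrix.shape, outputMatrix.shape)
```

Arrays built this way can be passed to model.fit in place of the hand-written ones, so working with arrays remains a reasonable choice.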

I have some doubts about whether my approach is basically correct. My code seems very short. Or do I have to choose a different approach for my problem?
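One point that may bear on this doubt: tf.keras.layers.Dense applies no activation unless one is passed, so the four stacked layers above amount to a single linear (affine) map from 4 inputs to 3 outputs. The sketch below illustrates this with plain NumPy, using random stand-in weights with the same layer sizes rather than the trained model. Whether a linear model is enough depends on the data, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights and biases with the same layer sizes as the model above
# (4 -> 4 -> 64 -> 128 -> 3). The values are arbitrary stand-ins, not trained.
sizes = [4, 4, 64, 128, 3]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]


def stacked(x):
    """Apply the layers one after another, with no activation in between."""
    for W, b in zip(weights, biases):
        x = x @ W + b
    return x


# Collapse the whole stack into one weight matrix and one bias vector.
W_total = np.eye(sizes[0])
b_total = np.zeros(sizes[0])
for W, b in zip(weights, biases):
    W_total = W_total @ W
    b_total = b_total @ W + b

x = np.array([[120, 260, 0.98, 110]], dtype=float)

# Both routes give the same prediction, up to floating-point rounding.
print(np.allclose(stacked(x), x @ W_total + b_total))
```

If a nonlinear relationship is expected, passing an activation such as activation='relu' to the hidden Dense layers is the usual way to let the network represent it.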

You need to load the DataFrame with MultiIndex columns, since Input, Output, and the individual features sit on different header levels.

The code below does this. I have used placeholder column names such as A, B, C; change them to match your data.

import pandas as pd
import numpy as np 

index = pd.MultiIndex.from_tuples([("Input","B"),("Input","C"),("Input","D"),("Input","E"),("Output","F"),("Output","G"),("Output","H")])
df = pd.read_csv("sample_csv.csv",header=[0,1],index_col=0)
df.columns = index 

df:

[Image: the resulting DataFrame]

df.columns

MultiIndex([( 'Input', 'B'),
            ( 'Input', 'C'),
            ( 'Input', 'D'),
            ( 'Input', 'E'),
            ('Output', 'F'),
            ('Output', 'G'),
            ('Output', 'H')],
           )
input_data = df[[("Input", "B"),
                 ("Input", "C"),
                 ("Input", "D"),
                 ("Input", "E")]].values

# Convert each row to a tuple
input_data = list(map(tuple, input_data))

#Input_data

[(100.0, 230.0, 0.95, 100.0),
 (200.0, 245.0, 0.99, 121.0),
 (40.0, 250.0, 0.91, 123.0)]



output_data = df[[("Output", "F"),
                  ("Output", "G"),
                  ("Output", "H")]].values

# Convert each row to a tuple
output_data = list(map(tuple, output_data))

#Output_data

[(120, 5, 120), (123, 24, 100), (154, 3, 121)]
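For training, the lists of tuples are not strictly needed, since model.fit in the question takes NumPy arrays. A minimal sketch of a shorter route, assuming the CSV has two header rows (Input/Output on top, feature names below) and a leading index column: selecting a top-level label on MultiIndex columns returns every column beneath it. The in-memory string stands in for sample_csv.csv, whose real layout is not shown, and B to H are the placeholder names used above.

```python
import io

import pandas as pd

# Stand-in for sample_csv.csv. The two header rows and the leading index
# column are assumptions about the file layout.
csv_text = """,Input,Input,Input,Input,Output,Output,Output
,B,C,D,E,F,G,H
0,100,230,0.95,100,120,5,120
1,200,245,0.99,121,123,24,100
2,40,250,0.91,123,154,3,121
"""

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

# Selecting a top-level label returns all columns under it.
X = df["Input"].to_numpy(dtype=float)
y = df["Output"].to_numpy(dtype=float)

print(X.shape, y.shape)
```

X and y can then be passed to model.fit in place of inputMatrix and outputMatrix.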
