So, I have a csv file containing data like this:
id type sum_cost date_time
--------------------------------------------------
a1 pound 500 2019-04-21T10:50:06
b1 euro 100 2019-04-21T10:40:00
c1 pound 650 2019-04-21T11:00:00
d1 usd 410 2019-04-21T00:30:00
What I want to do is insert this data into a database table whose schema differs from the CSV; the table's columns are:
_id , start_time, end_time, pound_cost, euro_cost, count
When inserting from the CSV into this table: _id = id, start_time is date_time - 1 hour, and end_time is date_time - 30 minutes. For pound_cost and euro_cost: if type is pound, insert its sum_cost into pound_cost and 0 into euro_cost, and the same way for euro. Also add 1 to the count column.
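The rules above can be sketched per row in plain Python. This is a minimal illustration under stated assumptions, not a loader: the helper name `transform_row` is hypothetical, and rows whose type is neither pound nor euro (such as d1's usd) get 0 in both cost columns, since the rules don't say what should happen to them.

```python
from datetime import datetime, timedelta


def transform_row(row: dict) -> dict:
    """Map one CSV row to the target table's columns (hypothetical helper)."""
    ts = datetime.fromisoformat(row["date_time"])  # e.g. "2019-04-21T10:50:06"
    cost = int(row["sum_cost"])
    return {
        "_id": row["id"],
        "start_time": ts - timedelta(hours=1),
        "end_time": ts - timedelta(minutes=30),
        # Only the column matching the currency gets the cost.
        # Assumption: any other currency (e.g. usd) gets 0 in both columns.
        "pound_cost": cost if row["type"] == "pound" else 0,
        "euro_cost": cost if row["type"] == "euro" else 0,
        "count": 1,
    }


row = {"id": "a1", "type": "pound", "sum_cost": "500", "date_time": "2019-04-21T10:50:06"}
print(transform_row(row))
```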
So, the result of the table will be like this:
_id start_time end_time pound_cost euro_cost count
-----------------------------------------------------------------------------
a1 2019-04-21T09:50:06 2019-04-21T10:20:06 500 0 1
b1 2019-04-21T09:40:00 2019-04-21T10:10:00 0 100 1
c1 2019-04-21T10:00:00 2019-04-21T10:30:00 650 0 1
d1 2019-04-20T23:30:00 2019-04-21T00:00:00 0 410 1
So, how should I insert the data into the table while transforming the values from the CSV as described? This is my first time using PostgreSQL and I haven't used SQL much, so I wonder if there is a function that can do this. If not, how can I use Python to transform the data and insert it into the table?
Thank you.
As discussed in the comments, you can easily accomplish this by using the COPY command and a temporary table to hold the data from the file.
Create a temporary table with the structure of your CSV. Note that all columns are of type TEXT: this makes copying faster, as validation is minimised, and the type casts happen later in the INSERT ... SELECT.
CREATE TEMP TABLE temptable
( id TEXT ,
TYPE TEXT,
sum_cost TEXT ,
date_time TEXT );
Use COPY to load the file into this table. If the file is on the database server, use COPY; if it is on the client machine, use psql's \COPY. Change the delimiter if your file uses a different one.
\COPY temptable from '/somepath/mydata.csv' with delimiter ',' CSV HEADER;
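If you'd rather run the client-side load from a Python script than from psql, psycopg2's `cursor.copy_expert()` can stream a local file through `COPY ... FROM STDIN`, which is the same mechanism psql's \COPY uses. A minimal sketch, assuming psycopg2 and the `temptable` created above; the function name and the connection string in the usage comment are hypothetical.

```python
def copy_csv_to_staging(cur, csv_path: str, table: str = "temptable") -> None:
    """Stream a client-side CSV into a staging table via COPY ... FROM STDIN.

    `cur` is assumed to be a psycopg2 cursor; `table` must be a trusted name,
    since it is interpolated directly into the SQL text.
    """
    # Same options as the psql command: comma delimiter, first line is a header
    sql = f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER ',')"
    with open(csv_path, newline="") as f:
        cur.copy_expert(sql, f)


# Usage (hypothetical connection parameters):
# import psycopg2
# conn = psycopg2.connect("dbname=mydb user=me")
# with conn, conn.cursor() as cur:
#     cur.execute("CREATE TEMP TABLE temptable (id TEXT, type TEXT, sum_cost TEXT, date_time TEXT)")
#     copy_csv_to_staging(cur, "/somepath/mydata.csv")
```

Keeping the CREATE TEMP TABLE and the copy on the same connection matters, because a temporary table is only visible to the session that created it.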
Now, simply run an INSERT INTO ... SELECT using expressions for the various transformations.
INSERT INTO maintable (_id, start_time, end_time, pound_cost, euro_cost, count)
SELECT id,
       date_time::timestamp - INTERVAL '1 HOUR',
       date_time::timestamp - INTERVAL '30 MINUTES',
       CASE type
           WHEN 'pound' THEN sum_cost::numeric
           ELSE 0
       END,
       CASE type
           WHEN 'euro' THEN sum_cost::numeric -- you have not specified what happens
           ELSE 0                              -- to USD; adjust as required
       END,
       1 AS count -- hardcoded based on your info; not sure what it actually means
FROM temptable t;
Now, the data is in your main table:
select * from maintable;
_id | start_time | end_time | pound_cost | euro_cost | count
-----+---------------------+---------------------+------------+-----------+-------
a1 | 2019-04-21 09:50:06 | 2019-04-21 10:20:06 | 500 | 0 | 1
b1 | 2019-04-21 09:40:00 | 2019-04-21 10:10:00 | 0 | 100 | 1
c1 | 2019-04-21 10:00:00 | 2019-04-21 10:30:00 | 650 | 0 | 1
d1 | 2019-04-20 23:30:00 | 2019-04-21 00:00:00 | 0 | 0 | 1
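To try the staging-table pattern without a PostgreSQL server, the same idea (load everything as TEXT, then cast and transform in a single INSERT ... SELECT) can be reproduced with Python's built-in sqlite3. This swaps in a different engine for local experimentation only: SQLite has no `::timestamp` or `INTERVAL`, so the sketch uses SQLite's `datetime(value, modifier)` function instead, and the `maintable` schema here is an assumption.

```python
import sqlite3

rows = [
    ("a1", "pound", "500", "2019-04-21T10:50:06"),
    ("b1", "euro", "100", "2019-04-21T10:40:00"),
    ("c1", "pound", "650", "2019-04-21T11:00:00"),
    ("d1", "usd", "410", "2019-04-21T00:30:00"),
]

con = sqlite3.connect(":memory:")
# Staging table: every column is TEXT, mirroring the temp table in the answer
con.execute("CREATE TEMP TABLE temptable (id TEXT, type TEXT, sum_cost TEXT, date_time TEXT)")
# Assumed target schema
con.execute(
    "CREATE TABLE maintable (_id TEXT, start_time TEXT, end_time TEXT,"
    " pound_cost NUMERIC, euro_cost NUMERIC, count INTEGER)"
)
con.executemany("INSERT INTO temptable VALUES (?, ?, ?, ?)", rows)

# One INSERT ... SELECT does the casting, the time shifts and the currency split
con.execute("""
    INSERT INTO maintable (_id, start_time, end_time, pound_cost, euro_cost, count)
    SELECT id,
           datetime(date_time, '-1 hours'),
           datetime(date_time, '-30 minutes'),
           CASE type WHEN 'pound' THEN CAST(sum_cost AS NUMERIC) ELSE 0 END,
           CASE type WHEN 'euro' THEN CAST(sum_cost AS NUMERIC) ELSE 0 END,
           1
    FROM temptable
""")

for r in con.execute("SELECT * FROM maintable ORDER BY _id"):
    print(r)
```

As in the PostgreSQL version, the usd row ends up with 0 in both cost columns because neither CASE matches it.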
Here's how you might reshape the data to your specification with pandas:
import os
import datetime as dt

import pandas as pd

dir_path = r'C:\..\..'
csv_name = 'my_raw_data.csv'
full_path = os.path.join(dir_path, csv_name)
data = pd.read_csv(full_path)

def process_df(dataframe=data):
    df = dataframe.copy(deep=True)
    df['date_time'] = pd.to_datetime(df['date_time'])
    df['count'] = 1
    ### Process time-series shifts: start = date_time - 1 h, end = date_time - 30 min
    df['start_time'] = df['date_time'] - dt.timedelta(hours=1)
    df['end_time'] = df['date_time'] - dt.timedelta(minutes=30)
    ### Create conditional masks for the dataframe
    pound_type = df['type'] == 'pound'
    euro_type = df['type'] == 'euro'
    ### Put sum_cost in the matching currency column and 0 elsewhere
    ### (rows of any other type, e.g. usd, get 0 in both columns)
    df['pound_cost'] = df['sum_cost'].where(pound_type, 0)
    df['euro_cost'] = df['sum_cost'].where(euro_type, 0)
    ### Rename id to match the table's _id column
    df = df.rename(columns={'id': '_id'})
    ### Manually input desired field arrangement
    fin_cols = [
        '_id',
        'start_time',
        'end_time',
        'pound_cost',
        'euro_cost',
        'count',
    ]
    ### Return formatted dataframe
    return df.reindex(columns=fin_cols).copy(deep=True)

data1 = process_df()
Output:
  _id          start_time            end_time  pound_cost  euro_cost  count
0  a1 2019-04-21 09:50:06 2019-04-21 10:20:06         500          0      1
1  b1 2019-04-21 09:40:00 2019-04-21 10:10:00           0        100      1
2  c1 2019-04-21 10:00:00 2019-04-21 10:30:00         650          0      1
3  d1 2019-04-20 23:30:00 2019-04-21 00:00:00           0          0      1
To load the result into the main SQL table, create a connection with SQLAlchemy (for PostgreSQL, typically through the psycopg2 driver). Then, assuming all data types match, you should be able to use pandas.DataFrame.to_sql() with if_exists='append' to add the rows.
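Writing a DataFrame into an existing SQL table can be sketched with pandas' `DataFrame.to_sql()` and `if_exists='append'`. For PostgreSQL you would pass a SQLAlchemy engine; the connection URL in the comment is a hypothetical placeholder. So that the snippet runs without a server, it uses an in-memory sqlite3 connection, which `to_sql()` also accepts, and a small stand-in for the frame returned by `process_df()`.

```python
import sqlite3

import pandas as pd

# Stand-in for the reshaped frame returned by process_df()
data1 = pd.DataFrame({
    "_id": ["a1", "b1"],
    "start_time": pd.to_datetime(["2019-04-21 09:50:06", "2019-04-21 09:40:00"]),
    "end_time": pd.to_datetime(["2019-04-21 10:20:06", "2019-04-21 10:10:00"]),
    "pound_cost": [500, 0],
    "euro_cost": [0, 100],
    "count": [1, 1],
})

# For PostgreSQL (hypothetical credentials):
# from sqlalchemy import create_engine
# con = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")
con = sqlite3.connect(":memory:")

# Append rows to maintable (pandas creates it if missing, as happens with this
# fresh SQLite database); index=False keeps the DataFrame index out of the table
data1.to_sql("maintable", con, if_exists="append", index=False)

print(pd.read_sql("SELECT _id, pound_cost, euro_cost FROM maintable", con))
```

Against a real PostgreSQL table that already exists, `if_exists="append"` inserts into it as-is, so the DataFrame's column names must match the table's.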