I am trying to fill a dataframe that looks like this:
Name Origin Date Open High Low Close Date+1 Open+1 High+1 Low+1 Close+1
0 Bananas Bali 20200108 NaN NaN NaN NaN 20200109 NaN NaN NaN NaN
1 Coconut Bahamas 20200110 NaN NaN NaN NaN 20200111 NaN NaN NaN NaN
with data found in a dataframe that looks like this:
Name Origin Date Time Open High Low Close
0 Bananas Bali 20200108 15:30:00 1.58 1.85 1.4 1.50
1 Bananas Bali 20200108 22:00:00 1.68 1.78 1.5 1.60
2 Bananas Bali 20200109 15:30:00 1.88 1.95 1.7 1.86
3 Bananas Bali 20200109 22:00:00 1.78 1.88 1.6 1.65
4 Coconut Bahamas 20200110 15:30:00 2.58 2.85 2.4 2.50
5 Coconut Bahamas 20200110 22:00:00 2.68 2.78 2.5 2.60
6 Coconut Bahamas 20200111 15:30:00 2.88 2.95 2.7 2.86
7 Coconut Bahamas 20200111 22:00:00 2.78 2.88 2.6 2.65
Since the columns in the first dataframe have different names (e.g. "Open" and "Open+1"), I can't think of an easy way to index match without having to copy the code and rename the columns in the second dataframe. Therefore I think it's easier to index match by column number, but I'm having trouble figuring out how to do this. The conditions for matching are 'Name', 'Origin' and 'Date' ('Date+1' for 'Open+1', etc.).
I tried to use the following code:
ColOpen = df2.iloc[:, [0,1,2,4,5,6,7]].groupby([0,1,2]).agg(Open=(4,'first'),High=(5,'max'),Low=(6,'min'), Close=(7,'last'))
to get the right values for the columns, but I am getting a 'KeyError: 0', which refers to the column numbers.
I have created example code below that can be used to get the same dataframes.
import pandas as pd
#Creating first sample dataframe
lst1 = [['Bananas', 'Bali', '20200108', 'NaN', 'NaN', 'NaN', 'NaN', '20200109', 'NaN', 'NaN', 'NaN', 'NaN'],
['Coconut', 'Bahamas', '20200110', 'NaN', 'NaN', 'NaN', 'NaN', '20200111', 'NaN', 'NaN', 'NaN', 'NaN']]
df1 = pd.DataFrame(lst1, columns =['Name', 'Origin', 'Date', 'Open', 'High', 'Low', 'Close', 'Date+1', 'Open+1', 'High+1', 'Low+1', 'Close+1'])
print('First Dataframe')
print(df1)
#Creating second sample dataframe
lst2 = [['Bananas', 'Bali', '20200108', '15:30:00', 1.58, 1.85, 1.50, 1.50],
['Bananas', 'Bali', '20200108', '22:00:00', 1.68, 1.78, 1.40, 1.60],
['Bananas', 'Bali', '20200109', '15:30:00', 1.88, 1.95, 1.70, 1.86],
['Bananas', 'Bali', '20200109', '22:00:00', 1.78, 1.88, 1.60, 1.65],
['Coconut', 'Bahamas', '20200110', '15:30:00', 2.58, 2.85, 2.50, 2.50],
['Coconut', 'Bahamas', '20200110', '22:00:00', 2.68, 2.78, 2.40, 2.60],
['Coconut', 'Bahamas', '20200111', '15:30:00', 2.88, 2.95, 2.70, 2.86],
['Coconut', 'Bahamas', '20200111', '22:00:00', 2.78, 2.88, 2.60, 2.65]]
df2 = pd.DataFrame(lst2, columns =['Name', 'Origin', 'Date', 'Time', 'Open', 'High', 'Low', 'Close'])
print('Second Dataframe')
print(df2)
#Index Match
ColOpen = df2.iloc[:, [0,1,2,4,5,6,7]].groupby([0,1,2]).agg(Open=(4,'first'),High=(5,'max'),Low=(6,'min'), Close=(7,'last'))
print("Printing first index")
print(ColOpen)
#Desired Output
lst3 = [['Bananas', 'Bali', '20200108', 1.58, 1.85, 1.4, 1.6, '20200109', 1.88, 1.95, 1.6, 1.65],
['Coconut', 'Bahamas', '20200110', 2.58, 2.85, 2.4, 2.6, '20200111', 2.88, 2.95, 2.6, 2.65]]
df3 = pd.DataFrame(lst3, columns =['Name', 'Origin', 'Date', 'Open', 'High', 'Low', 'Close', 'Date+1', 'Open+1', 'High+1', 'Low+1', 'Close+1'])
print('Desired Output')
print(df3)
Can someone help me to figure out how to do this?
EDIT: Desired output. Also updated code a bit.
Name Origin Date Open ... Open+1 High+1 Low+1 Close+1
0 Bananas Bali 20200108 1.58 ... 1.88 1.95 1.6 1.65
1 Coconut Bahamas 20200110 2.58 ... 2.88 2.95 2.6 2.65
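On the 'KeyError: 0' above: as far as I can tell, `iloc` selects by position but the result keeps the original column labels, so `groupby([0,1,2])` and `agg(Open=(4,'first'), ...)` look for columns literally named 0, 1, 2, 4 and do not find them. A minimal sketch of one way around it, assuming the positions should simply be translated to labels through `df2.columns` (the names `cols`, `keys` and `daily` are only illustrative, and the sample is cut down to the Bananas rows):

```python
import pandas as pd

# Cut-down stand-in for the second dataframe from the question.
df2 = pd.DataFrame(
    [
        ["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.50, 1.50],
        ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.40, 1.60],
        ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.70, 1.86],
        ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.60, 1.65],
    ],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

# Translate column positions into the labels that groupby/agg expect.
cols = df2.columns
keys = cols[[0, 1, 2]].tolist()  # ['Name', 'Origin', 'Date']

daily = (
    df2.sort_values(keys + [cols[3]])  # order by Time so first/last are meaningful
    .groupby(keys, as_index=False)
    .agg(
        Open=(cols[4], "first"),
        High=(cols[5], "max"),
        Low=(cols[6], "min"),
        Close=(cols[7], "last"),
    )
)
print(daily)
```

This gives one row per Name, Origin and Date with the daily Open, High, Low and Close, which is the lookup table the rest of the thread builds on.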
Edit: Found an easier solution using groupby.
Basically, you pd.concat the aggregated data with a copy of itself that is shifted one row backwards, then do some editing. There you have it! df4 is what you are looking for.
import pandas as pd
df = pd.read_clipboard()
# all your new data is here
# Aggregate the intraday rows into one OHLC row per (Date, Name, Origin).
# "first"/"last" assume the rows are already ordered by Time within each day.
df2 = df.groupby(["Date", "Name", "Origin"]).agg(
    {"Open": ["first"], "High": ["max"], "Low": ["min"], "Close": ["last"]}
)
df2 = df2.droplevel(1, axis=1).reset_index()
column_names = ["Name", "Origin", "Date", "Open", "High", "Low", "Close", "Date+1", "Open+1", "High+1", "Low+1", "Close+1"]
desired_df = pd.DataFrame(columns=column_names)
# Put each following row next to the current one, with a "+1" suffix.
df3 = pd.concat([df2, df2.add_suffix('+1').shift(-1)], axis=1)
# Keep every other row; this assumes exactly two consecutive dates per Name.
df4 = df3.iloc[::2]
df4 = df4.drop(columns=['Date+1', 'Name+1', 'Origin+1']).reset_index(drop=True)
Date Name Origin Open High Low Close Open+1 High+1 Low+1 Close+1
0 20200108 Bananas Bali 1.58 1.85 1.4 1.6 1.88 1.95 1.6 1.65
1 20200110 Coconut Bahamas 2.58 2.85 2.4 2.6 2.88 2.95 2.6 2.65
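The `iloc[::2]` step above relies on every product having exactly two consecutive dates in the data. If that is not guaranteed, a key-based lookup may be safer. Below is a rough sketch, assuming the key columns of the first dataframe ('Name', 'Origin', 'Date', 'Date+1') are kept and the NaN placeholder columns are dropped before merging. The names `intraday`, `targets`, `daily`, `daily_next` and `filled` are made up for this example.

```python
import pandas as pd

# Intraday rows, as in the question's second dataframe.
intraday = pd.DataFrame(
    [
        ["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.50, 1.50],
        ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.40, 1.60],
        ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.70, 1.86],
        ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.60, 1.65],
        ["Coconut", "Bahamas", "20200110", "15:30:00", 2.58, 2.85, 2.50, 2.50],
        ["Coconut", "Bahamas", "20200110", "22:00:00", 2.68, 2.78, 2.40, 2.60],
        ["Coconut", "Bahamas", "20200111", "15:30:00", 2.88, 2.95, 2.70, 2.86],
        ["Coconut", "Bahamas", "20200111", "22:00:00", 2.78, 2.88, 2.60, 2.65],
    ],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

# Key columns of the first dataframe (placeholder NaN columns left out).
targets = pd.DataFrame(
    [
        ["Bananas", "Bali", "20200108", "20200109"],
        ["Coconut", "Bahamas", "20200110", "20200111"],
    ],
    columns=["Name", "Origin", "Date", "Date+1"],
)

# One OHLC row per (Name, Origin, Date).
keys = ["Name", "Origin", "Date"]
daily = (
    intraday.sort_values(keys + ["Time"])
    .groupby(keys, as_index=False)
    .agg(Open=("Open", "first"), High=("High", "max"),
         Low=("Low", "min"), Close=("Close", "last"))
)

# Same table with every column except Name/Origin suffixed with "+1",
# so it can be matched on 'Date+1' without clashing with the first merge.
daily_next = daily.rename(
    columns=lambda c: c if c in ("Name", "Origin") else c + "+1"
)

filled = (
    targets.merge(daily, on=["Name", "Origin", "Date"], how="left")
    .merge(daily_next, on=["Name", "Origin", "Date+1"], how="left")
)

# Restore the column order of the desired output.
filled = filled[["Name", "Origin", "Date", "Open", "High", "Low", "Close",
                 "Date+1", "Open+1", "High+1", "Low+1", "Close+1"]]
print(filled)
```

With `how="left"`, a (Name, Origin, Date) combination that has no intraday rows simply stays NaN instead of shifting the wrong day into place.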
Not the most efficient answer, but the desired outcome is so unusual. Here is the code; I mainly used Python functions with pandas dataframes. Get your data by copying your table with Ctrl+C, or add it manually.
import pandas as pd
import numpy as np
df = pd.read_clipboard()
column_names = ["Name", "Origin", "Date", "Open", "High", "Low", "Close", "Date+1", "Open+1", "High+1", "Low+1", "Close+1"]
def data_getter(data):
    # data holds the two intraday rows of a single day
    intro = data.iloc[0, 0:3]  # Name, Origin, Date
    open_ = data.iloc[0].Open  # Open of the first row
    close = data.iloc[1].Close  # Close of the second row
    high = data.loc[:, 'High'].max()
    low = data.loc[:, 'Low'].min()
    frame = np.append(intro, [open_, high, low, close])
    return frame
def df_formatter(num: int):
    # Each product occupies 4 consecutive rows: 2 days x 2 intraday rows
    d = []
    for i in range(2):
        data = df.iloc[num*4+(i)*2:num*4+(i+1)*2]
        d.append(data_getter(data))
    # Day 1 in full, then Date and OHLC of day 2 (Name and Origin skipped)
    d = np.append(d[0], [d[1][2:]])
    d = pd.Series(d)
    d.index = column_names
    return d
# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = [df_formatter(i) for i in range(df.shape[0] // 4)]
desired_df = pd.DataFrame(rows, columns=column_names)
print(desired_df)