Pandas index match with column numbers and multiple criteria

I am trying to fill a dataframe that looks like this

      Name   Origin      Date Open High  Low Close    Date+1  Open+1 High+1 Low+1 Close+1
0  Bananas     Bali  20200108  NaN  NaN  NaN   NaN  20200109     NaN    NaN   NaN     NaN
1  Coconut  Bahamas  20200110  NaN  NaN  NaN   NaN  20200111     NaN    NaN   NaN     NaN

With data found in a dataframe that looks like this

      Name   Origin      Date      Time  Open  High  Low  Close
0  Bananas     Bali  20200108  15:30:00  1.58  1.85  1.4   1.50
1  Bananas     Bali  20200108  22:00:00  1.68  1.78  1.5   1.60
2  Bananas     Bali  20200109  15:30:00  1.88  1.95  1.7   1.86
3  Bananas     Bali  20200109  22:00:00  1.78  1.88  1.6   1.65
4  Coconut  Bahamas  20200110  15:30:00  2.58  2.85  2.4   2.50
5  Coconut  Bahamas  20200110  22:00:00  2.68  2.78  2.5   2.60
6  Coconut  Bahamas  20200111  15:30:00  2.88  2.95  2.7   2.86
7  Coconut  Bahamas  20200111  22:00:00  2.78  2.88  2.6   2.65

Since the columns in the first dataframe have different names (e.g. "Open" and "Open+1"), I can't think of an easy way to index match without copying the code and renaming the columns in the second dataframe. I therefore think it's easier to index match by column number, but I'm having trouble figuring out how to do this. The match conditions are the columns 'Name', 'Origin' and 'Date' ('Date+1' for 'Open+1', etc.).

I tried to use the following code:

ColOpen = df2.iloc[:, [0,1,2,4,5,6,7]].groupby([0,1,2]).agg(Open=(4,'first'),High=(5,'max'),Low=(6,'min'), Close=(7,'last'))

to get the right values for the columns, but I am getting a 'KeyError: 0', which refers to the column numbers.

I have created an example code below that can be used to get the same dataframes.

import pandas as pd

#Creating first sample dataframe
lst1 = [['Bananas', 'Bali', '20200108', 'NaN', 'NaN', 'NaN', 'NaN', '20200109', 'NaN', 'NaN', 'NaN', 'NaN'],
   ['Coconut', 'Bahamas', '20200110', 'NaN', 'NaN', 'NaN', 'NaN', '20200111', 'NaN', 'NaN', 'NaN', 'NaN']]

df1 = pd.DataFrame(lst1, columns =['Name', 'Origin', 'Date', 'Open', 'High', 'Low', 'Close', 'Date+1', 'Open+1', 'High+1', 'Low+1', 'Close+1'])
print('First Dataframe')
print(df1)

#Creating second sample dataframe
lst2 = [['Bananas', 'Bali', '20200108', '15:30:00', 1.58, 1.85, 1.50, 1.50],
    ['Bananas', 'Bali', '20200108', '22:00:00', 1.68, 1.78, 1.40, 1.60],
    ['Bananas', 'Bali', '20200109', '15:30:00', 1.88, 1.95, 1.70, 1.86],
    ['Bananas', 'Bali', '20200109', '22:00:00', 1.78, 1.88, 1.60, 1.65],
    ['Coconut', 'Bahamas', '20200110', '15:30:00', 2.58, 2.85, 2.50, 2.50],
    ['Coconut', 'Bahamas', '20200110', '22:00:00', 2.68, 2.78, 2.40, 2.60],
    ['Coconut', 'Bahamas', '20200111', '15:30:00', 2.88, 2.95, 2.70, 2.86],
    ['Coconut', 'Bahamas', '20200111', '22:00:00', 2.78, 2.88, 2.60, 2.65]]

df2 = pd.DataFrame(lst2, columns =['Name', 'Origin', 'Date', 'Time', 'Open', 'High', 'Low', 'Close'])
print('Second Dataframe')
print(df2)

#Index Match

ColOpen = df2.iloc[:, [0,1,2,4,5,6,7]].groupby([0,1,2]).agg(Open=(4,'first'),High=(5,'max'),Low=(6,'min'), Close=(7,'last'))


print("Printing first index")
print(ColOpen)

#Desired Output
lst3 = [['Bananas', 'Bali', '20200108', 1.58, 1.85, 1.4, 1.6, '20200109', 1.88, 1.95, 1.6, 1.65],
   ['Coconut', 'Bahamas', '20200110', 2.58, 2.85, 2.4, 2.6, '20200111', 2.88, 2.95, 2.6, 2.65]]

df3 = pd.DataFrame(lst3, columns =['Name', 'Origin', 'Date', 'Open', 'High', 'Low', 'Close', 'Date+1', 'Open+1', 'High+1', 'Low+1', 'Close+1'])
print('Desired Output')
print(df3)

Can someone help me to figure out how to do this?

EDIT: Desired output. Also updated code a bit.

      Name   Origin      Date  Open  ...  Open+1  High+1  Low+1 Close+1
0  Bananas     Bali  20200108  1.58  ...    1.88    1.95    1.6    1.65
1  Coconut  Bahamas  20200110  2.58  ...    2.88    2.95    2.6    2.65

Edit: Found an easier solution using groupby.

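Basically, you aggregate the data per day with groupby, then pd.concat the result with a copy of itself that has been shifted one row backwards (shift(-1)) and given a "+1" suffix. After that, keep every second row and drop the redundant columns. There you have it! df4 is what you are looking for.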

import pandas as pd

df = pd.read_clipboard()

# one row per (Date, Name, Origin): first open, highest high, lowest low, last close
df2 = df.groupby(["Date", "Name", "Origin"]).agg(
    {"Open": ["first"], "High": ["max"], "Low": ["min"], "Close": ["last"]}
)

# drop the aggregation-name level from the columns and turn the group keys back into columns
df2 = df2.droplevel(1, axis=1).reset_index()

# put each row next to the following row, whose columns get a "+1" suffix
df3 = pd.concat([df2, df2.add_suffix('+1').shift(-1)], axis=1)

# keep every second row; this assumes each Name has exactly two consecutive dates
df4 = df3.iloc[::2]

df4 = df4.drop(columns=['Date+1', 'Name+1', 'Origin+1']).reset_index(drop=True)

       Date     Name   Origin  Open  High  Low  Close  Open+1  High+1  Low+1  Close+1
0  20200108  Bananas     Bali  1.58  1.85  1.4    1.6    1.88    1.95    1.6     1.65
1  20200110  Coconut  Bahamas  2.58  2.85  2.4    2.6    2.88    2.95    2.6     2.65
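
For comparison, here is a minimal sketch of an actual "index match" on 'Name', 'Origin' and 'Date' (and 'Date+1' for the "+1" columns), done with two merges instead of shifting rows. It is one possible approach, not the only one. The names keys, daily, daily_next and column_order are made up for this example, and the sample values are taken from the tables in the question.

```python
import pandas as pd

# Intraday rows, as in the question's second dataframe.
df2 = pd.DataFrame(
    [
        ["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.4, 1.50],
        ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.5, 1.60],
        ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.7, 1.86],
        ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.6, 1.65],
        ["Coconut", "Bahamas", "20200110", "15:30:00", 2.58, 2.85, 2.4, 2.50],
        ["Coconut", "Bahamas", "20200110", "22:00:00", 2.68, 2.78, 2.5, 2.60],
        ["Coconut", "Bahamas", "20200111", "15:30:00", 2.88, 2.95, 2.7, 2.86],
        ["Coconut", "Bahamas", "20200111", "22:00:00", 2.78, 2.88, 2.6, 2.65],
    ],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

# Only the key columns of the first dataframe are needed (hypothetical name: keys).
keys = pd.DataFrame(
    {
        "Name": ["Bananas", "Coconut"],
        "Origin": ["Bali", "Bahamas"],
        "Date": ["20200108", "20200110"],
        "Date+1": ["20200109", "20200111"],
    }
)

# One row per (Name, Origin, Date): first open, highest high, lowest low, last close.
daily = df2.groupby(["Name", "Origin", "Date"], as_index=False).agg(
    Open=("Open", "first"),
    High=("High", "max"),
    Low=("Low", "min"),
    Close=("Close", "last"),
)

# The same daily table, with every column except Name/Origin renamed to "<col>+1".
daily_next = daily.rename(
    columns={c: c + "+1" for c in ["Date", "Open", "High", "Low", "Close"]}
)

# First merge matches on Date, second merge matches on Date+1.
result = keys.merge(daily, on=["Name", "Origin", "Date"], how="left").merge(
    daily_next, on=["Name", "Origin", "Date+1"], how="left"
)

column_order = ["Name", "Origin", "Date", "Open", "High", "Low", "Close",
                "Date+1", "Open+1", "High+1", "Low+1", "Close+1"]
result = result[column_order]
print(result)
```

Unlike the shift approach, this does not depend on the row order or on every Name having exactly two consecutive dates; a key with no matching day simply ends up with NaN because of how="left".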

Not the most efficient answer, but the desired output is quite unusual. Here is the code; I mainly used plain Python functions on pandas dataframes. Load your data by copying the table with Ctrl+C (for pd.read_clipboard) or build it manually.

import pandas as pd
import numpy as np

df = pd.read_clipboard()
column_names = ["Name", "Origin", "Date", "Open", "High", "Low", "Close", "Date+1", "Open+1", "High+1", "Low+1", "Close+1"]

def data_getter(data):
    # data holds the two intraday rows of one (Name, Origin, Date) group
    intro = data.iloc[0, 0:3]            # Name, Origin, Date
    open_ = data.iloc[0].Open            # open of the first row
    close = data.iloc[1].Close           # close of the second (last) row
    high = data.loc[:, 'High'].max()
    low = data.loc[:, 'Low'].min()
    frame = np.append(intro, [open_, high, low, close])
    return frame

def df_formatter(num: int):
    # assumes every Name occupies 4 consecutive rows: 2 dates x 2 times

    d = []

    for i in range(2):
        data = df.iloc[num*4+(i)*2:num*4+(i+1)*2]
        d.append(data_getter(data))

    # day 1 in full, then Date/Open/High/Low/Close of day 2
    d = np.append(d[0], [d[1][2:]])
    d = pd.Series(d)
    d.index = column_names
    return d

# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = [df_formatter(i) for i in range(df.shape[0] // 4)]
desired_df = pd.DataFrame(rows, columns=column_names)

print(desired_df)
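
On the 'KeyError: 0' from the question: groupby and named aggregation look columns up by label, and after iloc the labels are still the strings 'Name', 'Origin', and so on, so the integer 0 is not found. If you really want to work with column numbers, one option is to translate the positions into labels through df2.columns first. A minimal sketch, assuming the column order from the question (cols, group_keys and daily are names made up for this example, and only the Bananas rows are included to keep it short):

```python
import pandas as pd

df2 = pd.DataFrame(
    [
        ["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.4, 1.50],
        ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.5, 1.60],
        ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.7, 1.86],
        ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.6, 1.65],
    ],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

cols = df2.columns

# Positions 0, 1, 2 -> labels 'Name', 'Origin', 'Date'.
group_keys = list(cols[[0, 1, 2]])

# Named aggregation takes (column label, function) pairs,
# so each position is converted to its label with cols[i].
daily = df2.groupby(group_keys).agg(
    Open=(cols[4], "first"),
    High=(cols[5], "max"),
    Low=(cols[6], "min"),
    Close=(cols[7], "last"),
)
print(daily)
```

The result is indexed by (Name, Origin, Date); call reset_index() on it if you need those back as ordinary columns.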
