I have an Excel file that I import as a dataframe. I want to use Python to find values in the same row of the dataframe that differ by no more than 0.0002. The rules are:
I am trying to find the right code to get the value in row 1, column 1 and then compare it to the other values. Will this work?
df.iat[RowNum, ColNum]
When I find a match, I store it in one of four holding columns that I created for each type of match (2, 3, or 4 matches, so 12 columns in total). Each row will have a varying number of matches (or none), but for future analysis I need them in defined column locations that I can reference. That is why I plan to have four columns for each type of match.
For this piece of code, since I know the column name, I want to use the column name together with the integer row number to find the right location to enter the value. Is this correct? (I concatenate the column name with a number because the four holding columns for each match type end in 1, 2, 3, 4, so that if more than one match is found on a row, there are multiple columns to hold the matches.)
df[ColumnName + str(3)].iloc[RowNum]
I tried to figure out how to get a single 'cell' by using integers (like Cells() in Excel), but I am not sure this is the right way to do it. The documentation on .loc and .iloc talks about selecting rows of data, not a single 'cell'.
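For reference, here is a minimal sketch of reading one cell as a scalar. The frame, its values, and the names `row_num` and `col_num` are made up for illustration and are not the real spreadsheet. `DataFrame.iat` takes integer positions and `DataFrame.at` takes labels; both return a single value rather than a row or column.

```python
import pandas as pd

# Illustrative data only; the real frame comes from the Excel import.
df = pd.DataFrame({
    "High": [1.11165, 1.11167],
    "Low": [1.11128, 1.11138],
    "TwoMatch3": [float("nan"), 1.1114],
})

row_num, col_num = 1, 0

# Integer row position and integer column position.
by_position = df.iat[row_num, col_num]

# Row label and column label (the default index labels are 0, 1, ...).
by_label = df.at[row_num, "High"]

# Integer row position combined with a column looked up by name.
mixed = df.iloc[row_num, df.columns.get_loc("TwoMatch3")]

# Selecting the column first and then the row also works for reading.
chained = df["TwoMatch" + str(3)].iloc[row_num]

print(by_position, by_label, mixed, chained)
```

All four expressions return a plain scalar, so they can be compared directly with `abs(a - b) <= tolerance`.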
Here is a sample of the dataframe. (Due to limited width, I only show the first column of each match type, except for TwoMatch2, because that row matched two different numbers; there are four columns in total for each type.)
High Low Open Close TwoMatch1 TwoMatch2...ThrMatch1...ForMatch1
0 1.11165 1.11128 1.11137 1.11165 1.1117
1 1.11165 1.11139 1.11148 1.11165
2 1.11167 1.11138 1.11166 1.11138 1.1117 1.1114
3 1.11165 1.11144 1.11165 1.11163 1.1117
4 1.11165 1.11149 1.1115 1.11165
5 1.11165 1.1115 1.11163 1.11163 1.1116 1.1116
6 1.11165 1.11159 1.11159 1.11159 1.1116 1.1116 1.1116
When the code finishes, it writes the dataframe back to Excel, CSV, or a database (I am working on replacing Excel with a database). The output will have the original data plus the new columns containing the matches for each row.
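The write-back step could look roughly like the sketch below. It assumes a tiny made-up frame, writes the CSV to an in-memory buffer instead of a real path, and uses an in-memory SQLite database with a hypothetical table name `matches`; the real target file or database would differ.

```python
import io
import sqlite3

import pandas as pd

# Illustrative frame: original data plus one holding column.
df = pd.DataFrame({
    "High": [1.11165, 1.11165],
    "Low": [1.11128, 1.11139],
    "TwoRBs1": [1.1117, float("nan")],
})

# CSV: an in-memory buffer stands in for a file path here.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Database: pandas can write to a plain sqlite3 connection.
conn = sqlite3.connect(":memory:")
df.to_sql("matches", conn, index=False)  # "matches" is a hypothetical table name
round_trip = pd.read_sql("SELECT * FROM matches", conn)
conn.close()

print(csv_text)
print(round_trip)
```

Empty holding cells are written as blank fields in the CSV and as NULL in the database.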
Here is the code I have developed, to which I need the above formulas to finalize (in case it helps to know my intentions):
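import pandas as pd
from statistics import median  # assumed source of median(), which is used below

# reindex returns a new DataFrame, so the result is assigned back to df
df = df.reindex(columns = df.columns.tolist() + ['TwoRBs1','TwoRBs2','TwoRBs3','TwoRBs4','ThrRBs1','ThrRBs2','ThrRBs3','ThrRBs4','ForRBs1','ForRBs2','ForRBs3','ForRBs4'])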
RowNum = 0
ttlcount = 5
OneMinGroupFlag = 0
FiveMinGroupFlag = 0
FifteenMinGroupFlag = 0
SixtyMinGroupFlag = 0
TwoFortyMinGroupFlag = 0
ColValues = 0
#------------------------------------------------------------------------------------------------------------------------------------------------------------#
#----------------------------------------------------------------Functions-----------------------------------------------------------------------------------#
#------------------------------------------------------------------------------------------------------------------------------------------------------------#
def AssignMinGroup(ColmnNum):
    """If the column or a match was found in the group, set the flag so that group is not checked again."""
    # The flags live at module level, so they are declared global.
    global OneMinGroupFlag, FiveMinGroupFlag, FifteenMinGroupFlag
    global SixtyMinGroupFlag, TwoFortyMinGroupFlag
    if (ColmnNum >= 14 and ColmnNum <= 19) or (ColmnNum >= 44 and ColmnNum <= 59):
        OneMinGroupFlag = 1
    elif (ColmnNum >= 20 and ColmnNum <= 25) or (ColmnNum >= 60 and ColmnNum <= 75):
        FiveMinGroupFlag = 1
    elif (ColmnNum >= 26 and ColmnNum <= 31) or (ColmnNum >= 76 and ColmnNum <= 91):
        FifteenMinGroupFlag = 1
    elif (ColmnNum >= 32 and ColmnNum <= 37) or (ColmnNum >= 92 and ColmnNum <= 107):
        SixtyMinGroupFlag = 1
    elif (ColmnNum >= 38 and ColmnNum <= 43) or (ColmnNum >= 108 and ColmnNum <= 123):
        TwoFortyMinGroupFlag = 1
def FilterGroups(ColmnNum):
    """Determines if it is about to test a group that is to be filtered; returns True to skip this column/step number."""
    # The flags live at module level and are only read here.
    global OneMinGroupFlag, FiveMinGroupFlag, FifteenMinGroupFlag
    global SixtyMinGroupFlag, TwoFortyMinGroupFlag
    # These columns are always skipped.
    if ColmnNum in (44, 45, 60, 61, 76, 77, 92, 93, 108, 109):
        return True
    if OneMinGroupFlag == 1 and ((ColmnNum >= 14 and ColmnNum <= 19) or (ColmnNum >= 44 and ColmnNum <= 59)):
        return True
    elif FiveMinGroupFlag == 1 and ((ColmnNum >= 20 and ColmnNum <= 25) or (ColmnNum >= 60 and ColmnNum <= 75)):
        return True
    elif FifteenMinGroupFlag == 1 and ((ColmnNum >= 26 and ColmnNum <= 31) or (ColmnNum >= 76 and ColmnNum <= 91)):
        return True
    elif SixtyMinGroupFlag == 1 and ((ColmnNum >= 32 and ColmnNum <= 37) or (ColmnNum >= 92 and ColmnNum <= 107)):
        return True
    elif TwoFortyMinGroupFlag == 1 and ((ColmnNum >= 38 and ColmnNum <= 43) or (ColmnNum >= 108 and ColmnNum <= 123)):
        return True
    else:
        return False
def CheckLogMatch(ColumnName, MatchValue):
    """Will check if the match has already been found; if not, it will log it into the next available column for the match type."""
    global RowNum
    # A single column label (not a list) makes df.loc return one scalar,
    # which can be used directly in an if statement.
    if abs(df.loc[RowNum, ColumnName + str(1)] - MatchValue) <= 0.00029:
        if abs(df.loc[RowNum, ColumnName + str(2)] - MatchValue) <= 0.00029:
            if abs(df.loc[RowNum, ColumnName + str(3)] - MatchValue) <= 0.00029:
                if abs(df.loc[RowNum, ColumnName + str(4)] - MatchValue) <= 0.00029:
                    pass
                else:
                    df.loc[RowNum, ColumnName + str(4)] = MatchValue
            else:
                df.loc[RowNum, ColumnName + str(3)] = MatchValue
        else:
            df.loc[RowNum, ColumnName + str(2)] = MatchValue
    else:
        df.loc[RowNum, ColumnName + str(1)] = MatchValue
def Find234Matches():
    """Checks subsequent columns and compares to ColNum to find if there are 2, 3, or 4 matches to ColNum. Then it enters the matches in the table."""
    global ColNum, RowNum, ColValues
    TwoStep = ColNum + 1
    while TwoStep <= 123:
        if FilterGroups(TwoStep):
            TwoStep += 1
            continue
        else:
            Step2Val = df.iat[RowNum, TwoStep]
            if abs(ColValues - Step2Val) <= 0.00029:
                # Second value found: look for a third one further right.
                occur2 = round(median([ColValues, Step2Val]), 4)
                AssignMinGroup(TwoStep)
                ThreeStep = TwoStep + 1
                while ThreeStep <= 123:
                    if FilterGroups(ThreeStep):
                        if ThreeStep == 123:
                            CheckLogMatch('TwoRBs', occur2)
                            return
                        else:
                            ThreeStep += 1
                            continue
                    else:
                        Step3Val = df.iat[RowNum, ThreeStep]
                        if abs(ColValues - Step3Val) <= 0.00029:
                            # Third value found: look for a fourth one.
                            occur3 = round(median([ColValues, Step2Val, Step3Val]), 4)
                            AssignMinGroup(ThreeStep)
                            FourStep = ThreeStep + 1
                            while FourStep <= 123:
                                if FilterGroups(FourStep):
                                    if FourStep == 123:
                                        CheckLogMatch('ThrRBs', occur3)
                                        CheckLogMatch('TwoRBs', occur2)
                                        return
                                    else:
                                        FourStep += 1
                                        continue
                                else:
                                    Step4Val = df.iat[RowNum, FourStep]
                                    if abs(ColValues - Step4Val) <= 0.00029:
                                        occur4 = round(median([ColValues, Step2Val, Step3Val, Step4Val]), 4)
                                        CheckLogMatch('ForRBs', occur4)
                                        CheckLogMatch('ThrRBs', occur3)
                                        CheckLogMatch('TwoRBs', occur2)
                                        return
                                    else:
                                        if FourStep == 123:
                                            CheckLogMatch('ThrRBs', occur3)
                                            CheckLogMatch('TwoRBs', occur2)
                                            return
                                        else:
                                            FourStep += 1
                        else:
                            if ThreeStep == 123:
                                CheckLogMatch('TwoRBs', occur2)
                                return
                            else:
                                ThreeStep += 1
            else:
                TwoStep += 1
#------------------------------------------------------------------------------------------------------------------------------------------------------------#
#------------------------------------------------------------------------------------------------------------------------------------------------------------#
#------------------------------------------------------------------------------------------------------------------------------------------------------------#
while RowNum <= ttlcount:
    ColNum = 14
    while ColNum <= 107:
        ColValues = df.iat[RowNum, ColNum]
        # Skip empty cells and values outside the range given by columns 9 and 10.
        if pd.isnull(ColValues) or ColValues > df.iat[RowNum, 9] or ColValues < df.iat[RowNum, 10]:
            ColNum += 1
            continue
        else:
            if ColNum == 44 or ColNum == 45 or ColNum == 60 or ColNum == 61 or ColNum == 76 or ColNum == 77 or ColNum == 92 or ColNum == 93 or ColNum == 108 or ColNum == 109:
                ColNum += 1
                continue
            else:
                AssignMinGroup(ColNum)
                Find234Matches()
                ColNum += 1
    RowNum += 1
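The docstring of CheckLogMatch describes logging a match into the next available holding column unless it has already been found. One possible way to express that with a loop over the four slots is sketched below. The helper name `log_match`, its parameters, and the sample frame are hypothetical, and treating a NaN cell as "available" is an assumption about the intended behavior.

```python
import pandas as pd

TOLERANCE = 0.00029  # same threshold as in the code above


def log_match(frame, row_num, prefix, match_value, slots=4, tol=TOLERANCE):
    """Store match_value in the first empty prefix1..prefixN column of one row.

    Returns the column name written to, or None when the value is already
    logged (within tol) or every slot is occupied.
    """
    for slot in range(1, slots + 1):
        col = f"{prefix}{slot}"
        current = frame.at[row_num, col]
        if pd.isnull(current):
            # First free slot: log the value here.
            frame.at[row_num, col] = match_value
            return col
        if abs(current - match_value) <= tol:
            # Already logged in an earlier slot.
            return None
    return None


# Illustrative frame with the twelve holding columns added as NaN.
df = pd.DataFrame({"High": [1.11165, 1.11167]})
new_cols = [f"{p}RBs{i}" for p in ("Two", "Thr", "For") for i in range(1, 5)]
df = df.reindex(columns=df.columns.tolist() + new_cols)

first = log_match(df, 0, "TwoRBs", 1.1117)   # goes into TwoRBs1
again = log_match(df, 0, "TwoRBs", 1.1117)   # duplicate, nothing written
second = log_match(df, 0, "TwoRBs", 1.1114)  # different value, goes into TwoRBs2
print(first, again, second)
```

Passing the frame and row number as arguments avoids relying on module-level state, which makes the helper easier to test on a small frame.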
[ answer in progress - working with OP to understand expected output ]
Have a look at the output below and see whether it meets your requirements.
import numpy as np
import pandas as pd
cols = [('High', i) for i in df.columns[1:]]
for a, b in cols:
    # Keep the absolute difference where it is within 0.0002, otherwise None.
    df[a+b] = np.where((df[a] - df[b]).abs() <= 0.0002, (df[a] - df[b]).abs(), None)
High Low Open Close HighLow HighOpen HighClose
0 1.11165 1.11128 1.11137 1.11165 None None 0
1 1.11165 1.11139 1.11148 1.11165 None 0.00017 0
2 1.11167 1.11138 1.11166 1.11138 None 1e-05 None
3 1.11165 1.11144 1.11165 1.11163 None 0 2e-05
4 1.11165 1.11149 1.11150 1.11165 0.00016 0.00015 0
5 1.11165 1.11150 1.11163 1.11163 0.00015 2e-05 2e-05
6 1.11165 1.11159 1.11159 1.11159 6e-05 6e-05 6e-05
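The same comparison can be extended from pairs involving High to every pair of price columns. The sketch below assumes the four price columns and the 0.0002 tolerance from the question, uses two rows copied from the sample data, and follows the pair-name convention of the output above; the `matches` and `match_counts` names are illustrative.

```python
from itertools import combinations

import pandas as pd

# Two rows taken from the sample data (rows 0 and 6).
df = pd.DataFrame({
    "High": [1.11165, 1.11165],
    "Low": [1.11128, 1.11159],
    "Open": [1.11137, 1.11159],
    "Close": [1.11165, 1.11159],
})

TOLERANCE = 0.0002
price_cols = ["High", "Low", "Open", "Close"]

matches = pd.DataFrame(index=df.index)
for a, b in combinations(price_cols, 2):
    diff = (df[a] - df[b]).abs()
    # Keep the difference where it is within tolerance, NaN otherwise.
    matches[a + b] = diff.where(diff <= TOLERANCE)

# Number of matching pairs found in each row.
match_counts = matches.notna().sum(axis=1)
print(matches)
print(match_counts)
```

Using `Series.where` keeps the result columns numeric (NaN for non-matches) instead of the object dtype that mixing floats with `None` produces.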
I figured it out. The first formula is: dataframe.iat[RowNumber, ColNumber]
The second one is: dataframe['ColumnName'].values[RowNum]
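For writing a value into a single cell, `at` and `iat` also accept assignment. A minimal sketch, assuming a made-up frame with one holding column and illustrative names `row_num` and `column_name`:

```python
import pandas as pd

# Illustrative frame with one empty holding column.
df = pd.DataFrame({
    "High": [1.11165, 1.11167],
    "TwoRBs3": [float("nan"), float("nan")],
})

row_num = 1
column_name = "TwoRBs"

# Write by row label and column label.
df.at[row_num, column_name + str(3)] = 1.1114

# Write by integer positions, looking up the column position from its name.
col_pos = df.columns.get_loc(column_name + str(3))
df.iat[0, col_pos] = 1.1117

print(df)
```

Assigning through the frame itself, rather than through an intermediate object such as `df['ColumnName'].values`, avoids depending on whether that intermediate is a view or a copy of the data.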