I am having the following problem in a Python script: I have a simple for loop that iterates through a list of lists and passes two parameters to another function that fetches some data.
Running it in the debugger, I see the loop work through all 6 items without any issues, but then, for some strange reason, it tries to repeat the first pair of parameters once again.
At that point I get a pandas error: "Can only compare identically-labeled Series objects". (The for loop passes parameters to a function that slices a bigger df, though I don't think it's relevant to this issue.) It is important to say that the first time the loop runs through that combination, it works fine.
Has anyone come across anything like this before?
Trying a graphical explanation (pseudocode):
    Params = [[a,b],[c,d],[e,f],[g,h],[i,j],[k,l]]
    for item in Params:
        df_slice = df.loc[df['A'] == item]
What I am saying is that the pair [a,b] goes through twice, throwing the pandas error on its "second" pass.
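For reference, here is a minimal sketch of when pandas raises this particular error. It is not taken from the script above; the two Series are made up for illustration. The error appears whenever two Series with different index labels are compared with ==, regardless of their values:

```python
import pandas as pd

# Two hypothetical Series with the same values but different index labels.
left = pd.Series([1, 2, 3], index=[0, 1, 2])
right = pd.Series([1, 2, 3], index=[1, 2, 3])

try:
    left == right  # the labels do not match, so pandas refuses to compare
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# Comparing the underlying values only (labels discarded) works.
values_equal = (left.to_numpy() == right.to_numpy()).all()
print(values_equal)
```

So when the error shows up, it may be worth checking which two Series are being compared at that moment and whether their indexes are the same.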
Adding more complete code, as requested:
data is a pd.DataFrame with a datetime index of dates, a column called 'name' with values such as 'A', 'S', 'F' and 100 others, and a column called 'value' for each date and name. What the code tries to accomplish is to slice it down to a leaner df containing a subset of 'name' and 'value' within a certain date range (start, end), so I can use and manipulate it more easily elsewhere in my code.
pairs = [['A', 'S'], ['A', 'F'], ['S', 'A'], ['S', 'F'], ['F', 'A'], ['F', 'S']]. pairs contains all 2-element permutations of a subset of the names in "data"; in this case, 3 names are selected to go through 'call_pair_data', hence 6 permutations.
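As a side note, a list like this does not have to be typed by hand. A small sketch using itertools.permutations from the standard library (the variable name names is just an assumption for the selected subset):

```python
from itertools import permutations

# Hypothetical subset of names selected from the 'name' column.
names = ['A', 'S', 'F']

# All ordered pairs of two distinct names: 3 * 2 = 6 permutations.
pairs = [list(p) for p in permutations(names, 2)]
print(pairs)
```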
Code itself:
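    # The function must be defined before the loop that calls it.
    def call_pair_data(x, y, start, end):
        # Rows for name x within the date range
        df_x = data.loc[start:end]
        df_x = df_x.loc[df_x['name'] == x]
        # Rows for name y within the date range
        df_y = data.loc[start:end]
        df_y = df_y.loc[df_y['name'] == y]
        # Put x and y side by side, matched on date
        pair_df = pd.merge(df_x, df_y, on=['Date'], suffixes=['_x', '_y'])
        return pair_df

    for index, item in enumerate(pairs):
        x = item[0]
        y = item[1]
        df = call_pair_data(x, y, start, end)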
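To make this reproducible, here is a self-contained sketch of the same idea on a small made-up DataFrame. The toy data, the dates, and passing data as an explicit argument are assumptions for illustration and are not part of the original script. The index is named 'Date' so that the merge on 'Date' can find it:

```python
import pandas as pd

# Toy stand-in for the real data: 4 dates, 3 names per date.
dates = pd.date_range("2021-01-01", periods=4, freq="D")
names = ["A", "S", "F"]
data = pd.DataFrame(
    {
        "name": [n for _ in dates for n in names],
        "value": list(range(len(dates) * len(names))),
    },
    index=pd.DatetimeIndex([d for d in dates for _ in names], name="Date"),
)

def call_pair_data(data, x, y, start, end):
    # Slice the date range once, then filter each name from that slice.
    window = data.loc[start:end]
    df_x = window.loc[window["name"] == x]
    df_y = window.loc[window["name"] == y]
    # Match the two names on the 'Date' index level.
    return pd.merge(df_x, df_y, on="Date", suffixes=("_x", "_y"))

pairs = [["A", "S"], ["A", "F"], ["S", "A"], ["S", "F"], ["F", "A"], ["F", "S"]]
results = {}
for x, y in pairs:
    results[(x, y)] = call_pair_data(data, x, y, "2021-01-02", "2021-01-03")

print(results[("A", "S")])
```

On this toy data the loop runs exactly six times, once per pair. If the real script calls the function a seventh time, the extra call is probably coming from somewhere other than this loop.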
Using the .isin() method would be much simpler and more efficient:
    for pair in pairs:
        res_df = df.loc[(df.index >= start) & (df.index <= end) & (df['name'].isin(pair))]
Or:
    for pair in pairs:
        res_df = df.loc[start:end].loc[lambda d: d['name'].isin(pair)]
.isin() takes a list or tuple (any list-like) as its argument, so this would be valid too:
    for pair in pairs:
        res_df = df.loc[start:end].loc[lambda d: d['name'].isin([pair[0], pair[1]])]
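A runnable sketch of the .isin() approach on a small made-up DataFrame (the toy data and dates are assumptions for illustration). The boolean mask is built from the already sliced frame, so its labels line up with the rows being filtered:

```python
import pandas as pd

# Toy stand-in for the real data: 4 dates, 3 names per date.
dates = pd.date_range("2021-01-01", periods=4, freq="D")
names = ["A", "S", "F"]
data = pd.DataFrame(
    {
        "name": [n for _ in dates for n in names],
        "value": list(range(len(dates) * len(names))),
    },
    index=pd.DatetimeIndex([d for d in dates for _ in names], name="Date"),
)

start, end = "2021-01-02", "2021-01-03"
pairs = [["A", "S"], ["A", "F"], ["S", "A"], ["S", "F"], ["F", "A"], ["F", "S"]]

results = {}
for pair in pairs:
    window = data.loc[start:end]                      # date range first
    results[tuple(pair)] = window.loc[window["name"].isin(pair)]

print(results[("A", "S")])
```

Note that this returns the two names stacked in one 'name' column (long format), whereas the merge in the question puts them side by side with _x and _y suffixes. Also, .isin() ignores order, so ['A', 'S'] and ['S', 'A'] select the same rows.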