I am working with groupby on a pandas DataFrame in Python. I want to deduplicate the data so that, no matter how many times I query it and write it to MySQL, my raw data stays intact.
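import pandas as pd

df1 = pd.DataFrame(df)  # new rows; one 'Symbol' can have several different 'Open' values
df2 = pd.read_sql('select * from 6openposition', con=conn)  # rows already stored in MySQL
df2 = pd.concat([df2, df1])  # replaces df2.append(df1); DataFrame.append was removed in pandas 2.0
df2 = df2.groupby(['Symbol']).agg({'Open': 'first'})  # keeps only the first 'Open' per 'Symbol'
df2.to_sql(name='6openposition', con=conn, if_exists='replace', index=False)  # the old flavor='mysql' argument no longer exists in current pandas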
#Example Raw Data:
Symbol Open
0 A 10
1 AA 20
2 AA 30
3 AAA 40
4 AAA 50
5 AAA 50
#After I query the data multiple times (with the new rows appended):
Symbol Open
0 A 10
1 AA 20
2 AA 30
3 AAA 40
4 AAA 50
5 AAA 50
0 AA 30
1 AAA 40
2 AAA 50
3 AAA 50
4 AAA 60
#What my code ends up with:
Symbol Open
0 A 10
1 AA 20
2 AAA 40
#What I want:
Symbol Open
0 A 10
1 AA 20
2 AA 30
3 AAA 40
4 AAA 50
5 AAA 50
6 AAA 60
My raw data can have multiple values in the 'Open' column for the same 'Symbol'. When I remove the duplicates caused by writing to MySQL multiple times, these legitimate raw rows are removed as well.
My idea is to group by the original index and 'Symbol' together, because after the append the original index can serve as another grouping key. The original indices are [0,1,2,...]. If both 'Symbol' and the original index are the same, I can take the first value of 'Open' in that group. To group by the original index alone I can use:
df2=df2.groupby(level=0).agg({'Open':'first'})
#this combines the rows that share an index and takes the first value of the 'Open' column
But I have no idea how to combine level=0 with 'Symbol'. Could you show me how to group by both the original index and another column? Or suggest another way to keep repeated writes from corrupting my raw data?
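For illustration, one way the grouping asked about here might be sketched is to turn the index into an ordinary column with reset_index() and then group by that column together with 'Symbol'. The names raw, combined and deduped are hypothetical, and the sketch assumes that repeated rows keep the same index label as the originals, which may not hold for data read back from MySQL.

```python
import pandas as pd

# Hypothetical stand-in for the raw data in the question.
raw = pd.DataFrame({
    'Symbol': ['A', 'AA', 'AA', 'AAA', 'AAA', 'AAA'],
    'Open': [10, 20, 30, 40, 50, 50],
})

# Simulate writing the same batch twice: every row appears twice,
# and each copy keeps its original index label (0..5).
combined = pd.concat([raw, raw])

# Move the index into a regular column named 'index', then group by it
# together with 'Symbol' and keep the first 'Open' of each group.
deduped = (
    combined.reset_index()
    .groupby(['index', 'Symbol'], as_index=False)['Open']
    .first()
    .drop(columns='index')
)
print(deduped)
```

Under that assumption the six original rows come back, including both 'AAA 50' rows, because they sit at different index labels. If a repeated batch arrives with a fresh index, the groups no longer line up and this approach would not remove the duplicates.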
Starting with df, including your index, which seems to indicate whether data are repeated:
Symbol Open
0 A 10
1 AA 20
2 AA 30
3 AAA 40
4 AAA 50
5 AAA 50
2 AA 30
3 AAA 40
4 AAA 50
5 AAA 50
Use
df.reset_index().drop_duplicates().drop('index', axis=1)
(drop_duplicates keeps the first occurrence by default) to get:
Symbol Open
0 A 10
1 AA 20
2 AA 30
3 AAA 40
4 AAA 50
5 AAA 50
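A self-contained sketch of the steps above, rebuilding the example frame by hand. The variable name result is hypothetical, and the frame is typed in from the table shown rather than read from MySQL.

```python
import pandas as pd

# Rebuild the example: rows with index 2..5 appear a second time.
df = pd.DataFrame(
    {
        'Symbol': ['A', 'AA', 'AA', 'AAA', 'AAA', 'AAA', 'AA', 'AAA', 'AAA', 'AAA'],
        'Open': [10, 20, 30, 40, 50, 50, 30, 40, 50, 50],
    },
    index=[0, 1, 2, 3, 4, 5, 2, 3, 4, 5],
)

# reset_index() moves the old index into a column named 'index', so
# drop_duplicates() compares the triple (index, Symbol, Open).
# Rows count as duplicates only if all three values match.
result = df.reset_index().drop_duplicates().drop('index', axis=1)
print(result)
```

The two 'AAA 50' rows both survive because they differ in the 'index' column. A row that reuses an index label but has a different 'Open' value would also be kept, since it is not an exact duplicate.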