
How to update a Pandas Panel without duplicates

I'm currently working on live-timing software for a motorsport application. For this, I have to crawl a live-timing web page and copy the data into one big DataFrame. This DataFrame is the source for several diagrams I want to make. To keep it up to date, I have to crawl the web page very often.

I can download the data and save it as a pandas DataFrame. My problem is the step from each downloaded DataFrame to the big DataFrame that holds all the data.

import pandas as pd
import numpy as np

# df1: crawl after lap 1 is completed
df1 = pd.DataFrame({'Pos':[1,2,3,4,5,6],'CLS':['V5','V5','V5','V4','V4','V4'],
                 'Nr.':['13','700','30','55','24','985'],
                 'Zeit':['1:30,000','1:45,000','1:50,000','1:25,333','1:13,366','1:17,000'],
                 'Laps':['1','1','1','1','1','1']})

# df2: crawl during lap 2, the lap time (Zeit) is not known yet
df2 = pd.DataFrame({'Pos':[1,2,3,4,5,6],'CLS':['V5','V5','V5','V4','V4','V4'],
                 'Nr.':['13','700','30','55','24','985'],
                 'Zeit':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                 'Laps':['2','2','2','2','2','2']})
# df3: crawl after lap 2 is completed
df3 = pd.DataFrame({'Pos':[1,2,3,4,5,6],'CLS':['V5','V5','V5','V4','V4','V4'],
                 'Nr.':['13','700','30','55','24','985'],
                 'Zeit':['1:31,000','1:41,000','1:51,000','1:21,333','1:11,366','1:11,000'],
                 'Laps':['2','2','2','2','2','2']})
# Key every row by class, car number and lap
df1.set_index(['CLS','Nr.','Laps'],inplace=True)
df2.set_index(['CLS','Nr.','Laps'],inplace=True)
df3.set_index(['CLS','Nr.','Laps'],inplace=True)

df1 shows the DataFrame from the previous lap. df2 shows the DataFrame during the second lap: the lap is not completed yet, so Zeit is NaN. df3 shows the DataFrame after the second lap is completed.

My goal is to have exactly one row per lap, per car, per class. Right now I either end up with duplicate rows for incomplete laps, or all the data gets overwritten.
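To make the duplicate problem concrete, here is a minimal, self-contained reproduction. It assumes the crawls are simply appended with `pd.concat`; `make_crawl` is a hypothetical helper that only rebuilds the example frames above. Lap 2 then appears twice for every car: once with the in-progress NaN time and once with the final time.

```python
import numpy as np
import pandas as pd


def make_crawl(zeit, lap):
    """Hypothetical helper: rebuild one crawl of the example data."""
    df = pd.DataFrame({'Pos': [1, 2, 3, 4, 5, 6],
                       'CLS': ['V5', 'V5', 'V5', 'V4', 'V4', 'V4'],
                       'Nr.': ['13', '700', '30', '55', '24', '985'],
                       'Zeit': zeit,
                       'Laps': [lap] * 6})
    return df.set_index(['CLS', 'Nr.', 'Laps'])


df1 = make_crawl(['1:30,000', '1:45,000', '1:50,000',
                  '1:25,333', '1:13,366', '1:17,000'], '1')
df2 = make_crawl([np.nan] * 6, '2')  # lap 2 still in progress
df3 = make_crawl(['1:31,000', '1:41,000', '1:51,000',
                  '1:21,333', '1:11,366', '1:11,000'], '2')

# Naively appending every crawl keeps both versions of lap 2.
stacked = pd.concat([df1, df2, df3])
print(len(stacked))                      # 18 rows instead of 12
print(stacked.index.duplicated().sum())  # 6 repeated (CLS, Nr., Laps) keys
```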

I hope that someone can help me with this problem.

Thanks in advance.

MrCrunsh

If I understand your problem correctly, your issue is that you have overlapping data for the second lap: information recorded while the lap is still in progress and information recorded after it's over. If you want to put all the information for a given lap in one row, I'd suggest using multi-index columns, or changing the column names to reflect the difference between measurements taken during and after a lap.

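# Stack the completed laps (lap 1 from df1, final lap 2 from df3) as rows
df = pd.concat([df1, df3])
# Put the in-progress crawl (df2) next to them under a second column level
df = pd.concat([df, df2], axis=1, keys=['after', 'during'])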

The result will look like this:

             after           during
               Pos      Zeit    Pos Zeit
CLS Nr. Laps
V4  24  1        5  1:13,366    NaN  NaN
        2        5  1:11,366    5.0  NaN
    55  1        4  1:25,333    NaN  NaN
        2        4  1:21,333    4.0  NaN
    985 1        6  1:17,000    NaN  NaN
        2        6  1:11,000    6.0  NaN
V5  13  1        1  1:30,000    NaN  NaN
        2        1  1:31,000    1.0  NaN
    30  1        3  1:50,000    NaN  NaN
        2        3  1:51,000    3.0  NaN
    700 1        2  1:45,000    NaN  NaN
        2        2  1:41,000    2.0  NaN
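If the goal is strictly one row per lap per car per class, a different technique from the one in this answer may fit better: `DataFrame.combine_first`. A minimal sketch, assuming every crawl arrives indexed by `['CLS', 'Nr.', 'Laps']` as in the question: values from the newest crawl take priority, NaN cells (a lap still in progress) fall back to what the big DataFrame already holds, and unseen keys become new rows. `make_crawl` and `update_big` are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def make_crawl(zeit, lap):
    """Hypothetical helper: rebuild one crawl of the question's example data."""
    df = pd.DataFrame({'Pos': [1, 2, 3, 4, 5, 6],
                       'CLS': ['V5', 'V5', 'V5', 'V4', 'V4', 'V4'],
                       'Nr.': ['13', '700', '30', '55', '24', '985'],
                       'Zeit': zeit,
                       'Laps': [lap] * 6})
    return df.set_index(['CLS', 'Nr.', 'Laps'])


def update_big(big, crawl):
    """Hypothetical helper: merge one crawl into the accumulated DataFrame.

    Non-NaN values from the new crawl win, NaN cells keep what `big`
    already holds, and new (CLS, Nr., Laps) keys are added as rows.
    """
    if big is None:
        return crawl.copy()
    return crawl.combine_first(big).sort_index()


df1 = make_crawl(['1:30,000', '1:45,000', '1:50,000',
                  '1:25,333', '1:13,366', '1:17,000'], '1')
df2 = make_crawl([np.nan] * 6, '2')  # lap 2 still in progress
df3 = make_crawl(['1:31,000', '1:41,000', '1:51,000',
                  '1:21,333', '1:11,366', '1:11,000'], '2')

big = None
for crawl in (df1, df2, df3):  # simulate three successive crawls
    big = update_big(big, crawl)

print(len(big))                            # 12 rows, one per lap per car
print(big.loc[('V5', '13', '2'), 'Zeit'])  # 1:31,000 (lap 2 completed)
```

The design choice here is that an in-progress NaN can never erase a time that was already recorded, which avoids the "all the data gets overwritten" case; the trade-off is that the index is re-aligned and re-sorted on every crawl.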
