
Refresh indices in Pandas Dataframe

I deleted some rows from a pandas DataFrame, but the index of the resulting DataFrame was not renumbered. Starting from this:

    id  marks
1  123     45
2  124     67
3  127     89
4  257     10
5  345     34

I got:

    id  marks
2  124     67
4  257     10
5  345     34

whereas I want:

    id  marks
1  124     67
2  257     10
3  345     34

To get the default index back, use reset_index. The new index runs from 0 to len(df) - 1:

df = df.reset_index(drop=True)
print (df)
    id  marks
0  124     67
1  257     10
2  345     34

# if the index needs to start from 1
df.index = df.index + 1
print (df)
    id  marks
1  124     67
2  257     10
3  345     34
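
For anyone who wants to reproduce this from scratch, here is a minimal sketch. The way the rows get deleted (drop on labels 1 and 3) and the names filtered and renumbered are my assumptions, since the question does not show that step:

```python
import pandas as pd

# Rebuild the question's frame with its original 1-based index.
df = pd.DataFrame(
    {"id": [123, 124, 127, 257, 345], "marks": [45, 67, 89, 10, 34]},
    index=range(1, 6),
)

# Assumed deletion step: the surviving rows keep their old labels 2, 4, 5.
filtered = df.drop([1, 3])

# Discard the old labels, then shift the fresh 0-based index to start at 1.
renumbered = filtered.reset_index(drop=True)
renumbered.index = renumbered.index + 1

print(renumbered)
```

Without drop=True, reset_index would keep the old labels as an extra column named index, which is usually not wanted here.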

Another solution is to assign the new values to the index directly:

df.index = range(1, len(df.index) + 1)
print (df)
    id  marks
1  124     67
2  257     10
3  345     34
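
One thing to watch with direct assignment is that the new labels must match the number of rows exactly. A small sketch of what happens otherwise (the sample frame and the mismatch_raised flag are only for illustration):

```python
import pandas as pd

# The filtered frame from the question, still carrying labels 2, 4, 5.
df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[2, 4, 5])

# Correct: exactly one label per row.
df.index = range(1, len(df) + 1)

# Incorrect: one label too many, so pandas rejects the assignment.
try:
    df.index = range(1, len(df) + 2)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(list(df.index), mismatch_raised)
```

This is why the upper bound is written as len(df.index) + 1: range excludes its stop value, so the labels end at len(df).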

The fastest is to use RangeIndex:

df.index = pd.RangeIndex(1, len(df.index) + 1)
print (df)
    id  marks
1  124     67
2  257     10
3  345     34
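
A RangeIndex is described by a start, a stop and a step, which is a plausible reason it is cheap to build. A short sketch to inspect that (assuming a reasonably recent pandas, where start, stop and step are public attributes):

```python
import pandas as pd

# The filtered frame from the question, still carrying labels 2, 4, 5.
df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[2, 4, 5])

# Replace the leftover labels with a 1-based RangeIndex.
df.index = pd.RangeIndex(start=1, stop=len(df) + 1)

# The index is defined by three numbers, whatever the number of rows.
print(type(df.index).__name__)
print(df.index.start, df.index.stop, df.index.step)

# Label-based lookup uses the new labels: label 2 is now the second row.
print(df.loc[2, "id"])
```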

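The timings are interesting. On the 3-row sample, assigning a RangeIndex is the fastest and set_index with np.arange is the slowest: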

In [19]: %timeit df.reset_index(drop=True)
The slowest run took 7.41 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 83.3 µs per loop

In [20]: %timeit df.set_index(np.arange(1, len(df)+1))
The slowest run took 7.06 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 114 µs per loop

In [21]: %timeit df.index = range(1, len(df.index) + 1)
The slowest run took 13.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.5 µs per loop

In [22]: %timeit df.index = np.arange(1, len(df.index) + 1)
The slowest run took 11.54 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 26.9 µs per loop

In [23]: %timeit df.index = pd.RangeIndex(1, len(df.index) + 1)
The slowest run took 14.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.07 µs per loop

The same on a 30,000-row DataFrame:

df = pd.concat([df]*10000)

In [26]: %timeit df.reset_index(drop=True)
The slowest run took 4.71 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 109 µs per loop

In [27]: %timeit df.set_index(np.arange(1, len(df)+1))
The slowest run took 4.71 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 238 µs per loop

In [28]: %timeit df.index = range(1, len(df.index) + 1)
The slowest run took 13.19 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.8 µs per loop

In [29]: %timeit df.index = np.arange(1, len(df.index) + 1)
The slowest run took 11.29 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 62.8 µs per loop

In [30]: %timeit df.index = pd.RangeIndex(1, len(df.index) + 1)
The slowest run took 14.33 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.24 µs per loop

Another option, the slowest in the timings above, is set_index with np.arange:

df = df.set_index(np.arange(1, len(df)+1))
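
The %timeit lines above need IPython. A rough way to repeat the comparison in a plain script is the standard timeit module. This is only a sketch: the helper names, the work copy and the repeat counts are my choices, and the absolute numbers will differ with the pandas version and the machine:

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[2, 4, 5])
df = pd.concat([base] * 10000)   # 30,000 rows, as in the second set of timings
work = df.copy()                 # separate frame for the in-place assignments


def assign_range():
    work.index = range(1, len(work.index) + 1)


def assign_arange():
    work.index = np.arange(1, len(work.index) + 1)


def assign_rangeindex():
    work.index = pd.RangeIndex(1, len(work.index) + 1)


candidates = {
    "reset_index": lambda: df.reset_index(drop=True),
    "set_index(np.arange)": lambda: df.set_index(np.arange(1, len(df) + 1)),
    "index = range": assign_range,
    "index = np.arange": assign_arange,
    "index = RangeIndex": assign_rangeindex,
}

timings = {}
for name, func in candidates.items():
    # Best of 3 runs of 100 calls each, reported as seconds per call.
    timings[name] = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f"{name:22s} {timings[name] * 1e6:8.1f} µs per call")
```

Note that reset_index and set_index return a new DataFrame, while the three assignments change the index of an existing one, so the two groups are not doing quite the same work.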
