dask.dataframe.tail() returns empty dataframe

Question

I'm trying to get the value of 'n' in the last row of a dask dataframe.

If I understand correctly, positional indexing isn't an option, and I don't know the index of the last row. I thought tail() would be the solution, but it returns an empty dataframe.

print( df.compute() ) # df has 47 rows

returns

       file            str          n 
11027  /Users/...      XXX...       901  
11028  /Users/...      XXX...       902  
...                                   
11099  /Users/...      XXX...       946
11100  /Users/...      XXX...       947

Then I do

tail = df.tail( n=10, compute=True )
print(tail)

This takes a minute and fifteen seconds, which is unacceptably slow since I need to do several thousand of these, and it returns

Empty DataFrame
Columns: [file, str, n]
Index: []

What am I missing here?

Note: I found a solution for head() returning an empty dataframe (see "dask dataframe head() returns empty df"), but that solution doesn't apply to tail().

Answer 1

Print the result directly with print(df.tail(10)).

Answer 2

Visit https://tutorial.dask.org/04_dataframe.html and find the section titled "What just happened?". It contains a description of what can go wrong and why.

It also recommends that when you read a DataFrame with read_csv, you pass the dtype parameter to specify the column types.

Try this approach.
