
dask.dataframe.tail() returns empty dataframe

I'm trying to get the value of 'n' in the last row of a dask dataframe.

If I understand correctly, positional indexing isn't an option, and I don't know the index of the last row. I thought tail() would be the solution, but it returns an empty dataframe.

print( df.compute() ) # df has 47 rows

returns

       file            str          n 
11027  /Users/...      XXX...       901  
11028  /Users/...      XXX...       902  
...                                   
11099  /Users/...      XXX...       946
11100  /Users/...      XXX...       947

Then I do

tail = df.tail( n=10, compute=True )
print(tail)

which takes a minute and fifteen seconds (unacceptably slow, since I need to do several thousand of these) and returns

Empty DataFrame
Columns: [file, str, n]
Index: []

What am I missing here?

Note: I found a solution for head() returning empty (see the question "dask dataframe head() returns empty df"), but that solution doesn't apply to tail().

Print the result directly with print(df.tail(10)).

Visit https://tutorial.dask.org/04_dataframe.html and find the chapter titled "What just happened?". It contains a description of what can go wrong and why.

It also contains a recipe: when reading a DataFrame with read_csv, you should also pass the dtype parameter to specify the column types.

Try this approach.
