I'm trying to get the value of 'n' in the last row of a dask dataframe.
If I understand correctly, positional indexing isn't an option, and I don't know the index label of the last row. I thought tail() would be the solution, but it returns an empty dataframe.
print( df.compute() ) # df has 47 rows
returns
file str n
11027 /Users/... XXX... 901
11028 /Users/... XXX... 902
...
11099 /Users/... XXX... 946
11100 /Users/... XXX... 947
Then I do
tail = df.tail( n=10, compute=True )
print(tail)
This takes a minute and fifteen seconds, which is unacceptably slow since I need to do several thousand of these, and it returns
Empty DataFrame
Columns: [file, str, n]
Index: []
What am I missing here?
Note: I found a solution for head() returning an empty dataframe (see the question "dask dataframe head() returns empty df"), but that solution doesn't apply to tail().
Print it with print(df.tail(10)).
Visit https://tutorial.dask.org/04_dataframe.html and find the section titled "What just happened?". It describes what can go wrong and why.
It also contains a recipe: when reading a DataFrame with read_csv, you should also pass the dtype parameter to specify the column types.
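A minimal sketch of that recipe, assuming the dataframe is read from CSV (the question does not say how df was built). The function name load_frame, the path argument, and the column types are hypothetical; the types are only guessed from the printed output in the question.

```python
# Hypothetical column types, guessed from the output shown in the question.
COLUMN_DTYPES = {"file": "object", "str": "object", "n": "int64"}


def load_frame(path):
    """Read the CSV with explicit dtypes so dask does not have to infer them."""
    # Imported here so the dtype mapping above can be reused without dask.
    import dask.dataframe as dd

    return dd.read_csv(path, dtype=COLUMN_DTYPES)
```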
Try this approach.
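Another possible cause, not confirmed by the question: dask's tail() only looks at the last partition, so if df is the result of filtering a larger dataframe and the last partition ended up empty, tail() returns an empty frame even though df has rows. The sketch below works around that by finding the last partition that still has rows. The helper names last_nonempty_partition and tail_nonempty are hypothetical, not part of dask.

```python
def last_nonempty_partition(lengths):
    """Return the index of the last partition that has rows, or None."""
    for i in range(len(lengths) - 1, -1, -1):
        if lengths[i] > 0:
            return i
    return None


def tail_nonempty(ddf, n=1):
    """Hypothetical helper: tail() taken from the last non-empty partition."""
    # One pass over the data to count the rows in each partition.
    lengths = list(ddf.map_partitions(len).compute())
    i = last_nonempty_partition(lengths)
    if i is None:
        # Every partition is empty, so an empty result is correct here.
        return ddf.tail(n)
    # get_partition(i) is a one-partition dask dataframe; tail() computes it.
    return ddf.get_partition(i).tail(n)
```

With this, tail_nonempty(df, 1)["n"].iloc[0] would give the value of n in the last row. If the last non-empty partition holds fewer than n rows, fewer rows come back. Counting the rows still evaluates every partition, so this does not fix the slowness by itself; if the same df is queried many times, calling df.persist() once beforehand may help.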