This weekend, I visited PyCon Italy in the pittoreque town of Firenze. It was a great conference with great talks and encounters (great thanks to all the volunteers who made it happen) and amazing coffee.

I held a talk with the title ``Python Data Science Going Functional'' Science Track", where I mostly presented on Dask, one of those libraries-to-watch in the Python data science eco system. Slides are available on speaker deck.


The talk introduces Dask as a functional abstraction in the Python data science stack. While creating the slides I had stumbled over an exciting tweet about Dask by Travis Oliphant

Cool to see dask.array achieving similar performance to Cython + OpenMP: Much simpler code with #dask. @PyData
— Travis Oliphant (@teoliphant) April 4

And after verifying the results on my machine (with some modifications, as I do not trust timeit), I included a very similar benchmark in my slides. While reproducing and adapting the benchmarks, I stumbled over some weirdly long execution times for the dask from_array classmethod. So I included this finding in my talk’s slides without really being able to attribute this delay to a specific reason.

After delivering my talk I felt a bit unsatisfied about this. Why did from_array perform so badly? So I decided to ask. The answer: Dask hashes down the whole array in from_array to generate a key for it, which is the reason for it to be so slow. The solution is surprisingly simple. By passing a name='identifier' to the from_array, one can provide a custom key and from_array is a suddenly a cheap operation. So the current state of my benchmark shows that Dask improves upon pure numpy or numexpr performance, however does not quite reach the performance of a Cython implementation:

A corrected benchmark showing execution times for numexpr (NX)\{: width=``100%''}

The expression evaluated in that benchmark was

x = da.from_array(x_np, chunks=arr.shape[0] / CPU_COUNT, name='x')
mx = x.max()
x = (x / mx).sum() * mx

I plan to upload a revised edition of the slides on Speakerdeck (once I have a decenet internet connection again), to include the improved benchmark, so that they are not misleading for people who stumble on them without context.


What can we conclude from this?

  • The conversion overhead of converting a dask array to a numpy array is not as bad as I feared.

  • There are two aspects in a benchnark: performance and usability.

  • Dask should be watched not only for out-of-core computations, but also for parallelizing simple, blocking numpy expressions.