Comment made by: hiredman
I suspect this is just a lack of clarity around what parallelism means
in the context of pipeline-async.
A parallelism of 'n' for pipeline-async means there are 'n' logical
threads spinning off async tasks, and because those tasks are async,
each logical thread spins off a task and immediately proceeds to the
next. The other thing that 'n' affects is the size of the buffer for
the results channel internally in the pipeline. Because the tasks are
asynchronous, the only throttle is effectively the size of the buffer
on that results channel.
If the only throttle is the buffer, which is of size 'n', why do we
seem to have n+2 logical threads running at once? Because there are a
few extra channels involved which, combined with the logic for
copying from channel to channel, effectively add another buffer of
size 2.
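A rough way to observe this is to count how many tasks are in flight
at once. The following is a minimal sketch, not code from the ticket:
slow-task, in-flight, max-in-flight and run-async-demo are
hypothetical names, and the 100 ms timeout stands in for real async
work.

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical counters used only for this illustration.
(def in-flight (atom 0))
(def max-in-flight (atom 0))

(defn slow-task
  "Hypothetical async function in the shape pipeline-async expects:
  it takes an input value and a result channel, returns immediately,
  and later puts its result on the channel and closes it."
  [v result]
  ;; Record the task as started and remember the highest count seen.
  (swap! max-in-flight max (swap! in-flight inc))
  (a/go
    (a/<! (a/timeout 100))   ; stand-in for real asynchronous work
    (swap! in-flight dec)
    (a/>! result v)
    (a/close! result)))

(defn run-async-demo
  "Runs 20 inputs through pipeline-async with parallelism n and
  returns the ordered results plus the peak number of started but
  unfinished tasks."
  [n]
  (reset! in-flight 0)
  (reset! max-in-flight 0)
  (let [from (a/chan 20)
        to   (a/chan)]
    (doseq [i (range 20)] (a/>!! from i))
    (a/close! from)
    (a/pipeline-async n to slow-task from)
    {:results       (a/<!! (a/into [] to))
     :max-in-flight @max-in-flight}))
```

Calling (run-async-demo 2) and looking at :max-in-flight should show
more tasks started than the parallelism alone suggests. By the
reasoning above it would be expected to land around n+2, though the
exact number depends on scheduling.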
This behavior does seem arcane. But I would argue that the
behavior of pipeline-async is so different from pipeline and
pipeline-blocking that it really should be something distinct from
them. The other pipelines take transducers, and using the parallelism
'n' to spin up a number of logical threads for the other pipelines
actually does throttle the number of tasks run in parallel.
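For contrast, here is the same kind of measurement against a
transducer-based pipeline. This is again only a sketch under
assumptions: slow-inc, running, max-running and run-blocking-demo are
hypothetical names, and pipeline-blocking is used because the
stand-in work is a Thread/sleep.

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical counters used only for this illustration.
(def running (atom 0))
(def max-running (atom 0))

(defn slow-inc
  "Hypothetical blocking task: sleeps briefly, then increments v.
  Tracks how many calls are executing at the same time."
  [v]
  (swap! max-running max (swap! running inc))
  (Thread/sleep 50)          ; stand-in for real blocking work
  (swap! running dec)
  (inc v))

(defn run-blocking-demo
  "Runs 20 inputs through pipeline-blocking with parallelism n, using
  a plain (map slow-inc) transducer, and returns the ordered results
  plus the peak number of concurrently running calls."
  [n]
  (reset! running 0)
  (reset! max-running 0)
  (let [from (a/chan 20)
        to   (a/chan)]
    (doseq [i (range 20)] (a/>!! from i))
    (a/close! from)
    (a/pipeline-blocking n to (map slow-inc) from)
    {:results     (a/<!! (a/into [] to))
     :max-running @max-running}))
```

Here each of the 'n' threads runs one call to completion before
taking the next input, so :max-running should not exceed n.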
If you have not spent time evaluating which pipeline variant you
need, and just grabbed pipeline-async because it seemed intuitively
the right choice (because it has -async in the name and this is
core.async), you almost certainly should be using the vanilla
pipeline variant. Its behavior is much more intuitive, and it is much
more likely to be what you want.