

Looking at the source of pmap:

(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  {:added "1.0"
   :static true}
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ([f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (pmap #(apply f %) (step (cons coll colls))))))

The level of parallelism seems to be hard-coded:

(+ 2 (.. Runtime getRuntime availableProcessors))

Why is that?

Also, is there no way to override it, for example through a dynamic var like *out*?
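
To make the *out* comparison concrete, here is a minimal sketch of what such a knob could look like. The names `*parallelism*` and `pmap*` are hypothetical and not part of clojure.core; the body is just the single-collection arity of pmap quoted above, with n read from a dynamic var instead of being computed inline.

```clojure
;; Hypothetical dynamic var; clojure.core has no such setting.
;; The default mirrors the value hard-coded in clojure.core/pmap.
(def ^:dynamic *parallelism*
  (+ 2 (.availableProcessors (Runtime/getRuntime))))

(defn pmap*
  "Hypothetical single-collection pmap variant whose look-ahead is
  taken from *parallelism* at call time."
  [f coll]
  (let [n    *parallelism*   ; read eagerly, outside the lazy-seq
        rets (map #(future (f %)) coll)
        step (fn step [[x & xs :as vs] fs]
               (lazy-seq
                (if-let [s (seq fs)]
                  (cons (deref x) (step xs (rest s)))
                  (map deref vs))))]
    (step rets (drop n rets))))

;; Usage: rebind the look-ahead for one call.
(binding [*parallelism* 4]
  (doall (pmap* inc (range 10))))
;; => (1 2 3 4 5 6 7 8 9 10)
```

The value is captured when `pmap*` is called, so the result can be consumed lazily after the `binding` form has exited without the setting changing underneath it.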

2 Answers

+2 votes
by
edited by

From Java Concurrency in Practice [Goetz, 2006]:

For compute-intensive tasks, an Ncpu-processor system usually achieves
optimum utilization with a thread pool of Ncpu+1 threads. (Even
compute-intensive threads occasionally take a page fault or pause for
some other reason, so an "extra" runnable thread prevents CPU cycles
from going unused when this happens.) For tasks that also include I/O
or other blocking operations, you want a larger pool, since not all of
the threads will be schedulable at all times.

You should use pmap for compute-intensive tasks. For blocking I/O, I'd use core.async with the thread macro or pipeline-blocking (go or pipeline can be used instead of pmap), an agent with send-off, or a java.util.concurrent thread pool directly.
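
As a rough illustration of the last option, using a java.util.concurrent thread pool directly, here is a minimal sketch in which the caller chooses the pool size. The function name `pool-map` is made up for this example; unlike pmap it is eager and waits for every task to finish before returning.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn pool-map
  "Hypothetical helper: applies f to every item of coll on a fixed pool
  of n-threads threads and returns a vector of results in input order."
  [n-threads f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      ;; Clojure fns implement java.util.concurrent.Callable, so a
      ;; vector of zero-argument fns can be passed to invokeAll.
      (let [tasks   (mapv (fn [x] (fn [] (f x))) coll)
            futures (.invokeAll pool tasks)]   ; blocks until all tasks complete
        (mapv (fn [^Future fut] (.get fut)) futures))
      (finally
        (.shutdown pool)))))

;; Usage: at most 2 tasks run at once, regardless of core count.
(pool-map 2 inc [0 1 2])
;; => [1 2 3]
```

If f throws, `.get` rethrows the failure wrapped in a java.util.concurrent.ExecutionException, so callers that care about the original exception need to unwrap it.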

by
OK, so what you're saying is that you cannot control the level of parallelism, but that's fine because there are other options, for example for dealing with blocking I/O.

I see your point, but my question is basically this: what does one gain by hard-coding this? Or, to put it differently, what is the downside of having something like *out*?
by
pmap is a very blunt instrument and, although it can occasionally be a useful quick-and-dirty solution, you generally want to avoid it. The JVM has plenty of options for much more controlled concurrent processing, and you can (and should) use them when you need any sort of control over the thread pool and so on.
+1 vote
by

As Sean said, pmap is a quick-and-dirty mechanism which you should generally avoid.
People often recommend using Java's Executors framework instead (via interop).
There's also an alternative implementation of pmap (and other concurrency constructs) in https://github.com/TheClimateCorporation/claypoole, where you can specify your own thread pool.

...