Question

Odd performance penalty in numeric benchmark, mitigated by coercing to double...

asked Jan 10, 2023 in Java Interop by Tom

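I ran across this performance benchmark here, and I was curious why Clojure fared worse than Java.

So I threw it into the profiler (after modifying their version to use unchecked math, which did not help) and nothing showed up. Hmm. I decompiled it and found: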

// Decompiling class: leibniz$calc_pi_leibniz
import clojure.lang.*;

public final class leibniz$calc_pi_leibniz extends AFunction implements LD
{
    public static double invokeStatic(final long rounds) {
        final long end = 2L + rounds;
        long i = 2L;
        double x = 1.0;
        double pi = 1.0;
        while (i != end) {
            final double x2 = -x;
            final long n = i + 1L;
            final double n2 = x2;
            pi += Numbers.divide(x2, 2L * i - 1L);
            x = n2;
            i = n;
        }
        return Numbers.unchecked_multiply(4L, pi);
    }

    @Override
    public Object invoke(final Object o) {
        return invokeStatic(RT.uncheckedLongCast(o));
    }

    @Override
    public final double invokePrim(final long rounds) {
        return invokeStatic(rounds);
    }
}

It looks like the double/long boundary is costing us at least a method lookup, probably in Numbers.divide?
So I just coerced everything to double (even our index variable):

(def rounds 100000000)

(defn calc-pi-leibniz2
  "Eliminate mixing of long/double to avoid clojure.numbers invocations."
  ^double
  [^long rounds]
  (let [end (+ 2.0 rounds)]
    (loop [i 2.0 x 1.0 pi 1.0]
      (if (= i end)
        (* 4.0 pi)
        (let [x (- x)]
          (recur (inc i) x (+ pi (/ x (dec (* 2 i))))))))))

leibniz=> (c/quick-bench (calc-pi-leibniz rounds))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 575.352216 ms
    Execution time std-deviation : 10.070268 ms
   Execution time lower quantile : 566.210399 ms ( 2.5%)
   Execution time upper quantile : 588.772187 ms (97.5%)
                   Overhead used : 1.884700 ns
nil
leibniz=> (c/quick-bench (calc-pi-leibniz2 rounds))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 158.509049 ms
    Execution time std-deviation : 759.113165 µs
   Execution time lower quantile : 157.234899 ms ( 2.5%)
   Execution time upper quantile : 159.205374 ms (97.5%)
                   Overhead used : 1.884700 ns
nil

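Any ideas why the Java implementation does not pay the same penalty on the division? [Both versions are implemented with unchecked-math set to :warn-on-boxed.]

I also tried a variant using fastmath's primitive math operators, and it actually got slower. So far nothing beats coercing the loop index i to a double (which I normally would not do).

commented Jan 10, 2023 by Ben Sless

In my benchmarks this gives the same performance as your double solution, without having to coerce the index to double.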

(defn calc-pi-leibniz3
  "Eliminate mixing of long/double to avoid clojure.numbers invocations."
  ^double
  [^long rounds]
  (let [end (+ 2 rounds)]
    (loop [i 2 x 1.0 pi 1.0]
      (if (= i end)
        (* 4.0 pi)
        (let [x (- x)]
          (recur (inc i) x (+ pi (/ x (double (unchecked-dec-int (unchecked-multiply-int (unchecked-int 2) (unchecked-int i))))))))))))

It comes down to knowing where each compiler emits its type-conversion instructions.

Here is the disassembled bytecode for the Java solution:

  public static double go(int);
    descriptor: (I)D
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=6, locals=6, args_size=1
         0: dconst_1
         1: dstore_1
         2: dconst_1
         3: dstore_3
         4: iconst_2
         5: istore        5
         7: iload         5
         9: iload_0
        10: iconst_2
        11: iadd
        12: if_icmpge     39
        15: dload_3
        16: ldc2_w        #24                 // double -1.0d
        19: dmul
        20: dstore_3
        21: dload_1
        22: dload_3
        23: iconst_2
        24: iload         5
        26: imul
        27: iconst_1
        28: isub
        29: i2d
        30: ddiv
        31: dadd
        32: dstore_1
        33: iinc          5, 1
        36: goto          7
        39: dload_1
        40: ldc2_w        #26                 // double 4.0d
        43: dmul
        44: dup2
        45: dstore_1
        46: dreturn
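
For anyone who would rather read source than bytecode, the listing above reads back to roughly the following Java. This is a reconstruction from the instructions shown, not the benchmark's actual source: the class name `LeibnizJava` and the local variable names are assumptions, since the bytecode only shows slot numbers.

```java
// Source-level reading of the bytecode listing above. The class name and
// the local variable names are assumptions; the bytecode only shows slots.
final class LeibnizJava {
    static double go(int rounds) {
        double pi = 1.0; // local slot 1
        double x = 1.0;  // local slot 3
        // Offsets 4-12: int counter in slot 5, compared against rounds + 2.
        for (int i = 2; i < rounds + 2; i++) {
            x *= -1.0;             // offsets 15-20: dmul by the constant -1.0
            pi += x / (2 * i - 1); // offsets 21-32: imul, isub, i2d, ddiv, dadd
        }
        pi *= 4.0;                 // offsets 39-45: dmul, dup2, dstore_1
        return pi;                 // offset 46: dreturn
    }

    public static void main(String[] args) {
        System.out.println(go(1_000_000));
    }
}
```

The point to notice is that the denominator stays in int arithmetic until a single i2d right before ddiv, and nothing in the loop is a method call.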

Here is your solution:

    public static double invokeStatic(long rounds);
        Flags: PUBLIC, STATIC
        Code:
               0: ldc2_w          2.0
               3: lload_0         /* rounds */
               4: invokestatic    clojure/lang/Numbers.add:(DJ)D
               7: dstore_2        /* end */
               8: ldc2_w          2.0
              11: dstore          i
              13: dconst_1
              14: dstore          x
              16: dconst_1
              17: dstore          pi
              19: dload           i
              21: dload_2         /* end */
              22: dcmpl
              23: ifne            36
              26: ldc2_w          4.0
              29: dload           pi
              31: dmul
              32: goto            72
              35: athrow
              36: dload            x
              38: dneg
              39: dstore           x
              41: dload           i
              43: dconst_1
              44: dadd
              45: dload           x
              47: dload           pi
              49: dload           x
              51: ldc2_w           2
              54: dload           i
              56: invokestatic    clojure/lang/Numbers.multiply:(JD)D
              59: dconst_1
              60: dsub
              61: ddiv
              62: dadd
              63: dstore          pi
              65: dstore           x
              67: dstore           i
              69: goto             19
              72: dreturn

My solution:

    public static double invokeStatic(long rounds);
        Flags: PUBLIC, STATIC
        Code:
              0: ldc2_w           2
               3: lload_0         /* rounds */
              4: ladd
              5: lstore_2         /* end */
              6: ldc2_w           2
              9: lstore           i
              11: dconst_1
              12: dstore           x
              14: dconst_1
              15: dstore           pi
              17: lload           i
              19: lload_2         /* end */
              20: lcmp
              21: ifne           34
              24: ldc2_w          4.0
              27: dload           pi
              29: dmul
              30: goto           71
              33: athrow
              34: dload           x
              36: dneg
              37: dstore           x
              39: lload          i
              41: lconst_1
              42: ladd
              43: dload           x
              45: dload           pi
              47: dload           x
              49: ldc2_w          2
              52: l2i
              53: lload           i
              55: l2i
              56: imul
              57: iconst_1
              58: isub
              59: i2d
              60: ddiv
              61: dadd
              62: dstore          pi
              64: dstore          x
              66: lstore          i
              68: goto            17
              71: dreturn

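It also avoids all method calls and works directly on primitive types.

That said, all this song and dance does not perform any better than your solution.

According to my tests, both solutions reach the same performance as Java.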
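
To put the three loop shapes from this thread side by side in one language, here is a sketch in plain Java. The class and method names are hypothetical, and none of this goes through clojure.lang.Numbers, so it does not reproduce the static calls seen in the Clojure bytecode. It only shows where the long/int/double conversions sit in each shape, and that all three compute the same value. Timing them properly would need a benchmark harness such as JMH, which is not shown.

```java
// Hypothetical Java renderings of the three loop shapes discussed above.
final class LeibnizLoopShapes {
    // Shape of the original function: long counter, long arithmetic for
    // the denominator, widened to double only at the division.
    static double mixedLongDouble(long rounds) {
        long end = 2L + rounds;
        double x = 1.0, pi = 1.0;
        for (long i = 2L; i != end; i++) {
            x = -x;
            pi += x / (2L * i - 1L); // lmul, lsub, l2d, ddiv
        }
        return 4.0 * pi;
    }

    // Shape of calc-pi-leibniz2: everything is a double, counter included.
    static double allDouble(long rounds) {
        double end = 2.0 + rounds;
        double x = 1.0, pi = 1.0;
        for (double i = 2.0; i != end; i += 1.0) {
            x = -x;
            pi += x / (2.0 * i - 1.0); // dmul, dsub, ddiv
        }
        return 4.0 * pi;
    }

    // Shape of calc-pi-leibniz3: long counter, denominator narrowed to int
    // arithmetic and converted to double once. Like the unchecked int ops
    // in the Clojure version, 2 * (int) i wraps silently on overflow.
    static double intDenominator(long rounds) {
        long end = 2L + rounds;
        double x = 1.0, pi = 1.0;
        for (long i = 2L; i != end; i++) {
            x = -x;
            pi += x / (double) (2 * (int) i - 1); // l2i, imul, isub, i2d, ddiv
        }
        return 4.0 * pi;
    }

    public static void main(String[] args) {
        long rounds = 1_000_000L;
        System.out.println(mixedLongDouble(rounds));
        System.out.println(allDouble(rounds));
        System.out.println(intDenominator(rounds));
    }
}
```

Because every denominator is a small integer that a double represents exactly, the three methods add the same terms in the same order and should agree bit for bit. Any timing difference between them is therefore about instruction selection, not about the arithmetic itself.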

commented Jan 10, 2023 by Tom

commented Jan 12, 2023 by Chris Nuernberger

1 Answer

alexmiller · Answer 1 · 2023-01-10T18:46:08+0000

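I won't have time to look at this for a while, but it is very easy to fall into boxed math at loop/recur boundaries, which makes things significantly slower. The easiest way to tell is to look at the bytecode.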
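
As a rough Java analogy for what boxed math in a loop looks like, the sketch below runs the same series once with primitive locals and once with `java.lang.Long`/`java.lang.Double` locals. This is only a stand-in: Clojure's boxed path goes through clojure.lang.Numbers on Object arguments, which this does not reproduce, and the class and method names are hypothetical.

```java
// Analogy only: Java autoboxing standing in for boxed math in a loop.
final class BoxedVsPrimitive {
    // All loop state lives in primitive locals.
    static double primitiveLoop(long rounds) {
        long end = 2L + rounds;
        long i = 2L;
        double x = 1.0, pi = 1.0;
        while (i != end) {
            x = -x;
            pi += x / (2L * i - 1L);
            i++;
        }
        return 4.0 * pi;
    }

    // Same arithmetic, but every loop variable is a wrapper object, so each
    // iteration unboxes its inputs and boxes its results again.
    static double boxedLoop(long rounds) {
        Long end = 2L + rounds;
        Long i = 2L;
        Double x = 1.0;
        Double pi = 1.0;
        while (!i.equals(end)) {
            x = -x;                      // unbox, negate, box
            pi = pi + x / (2L * i - 1L); // unbox three values, box the sum
            i = i + 1L;                  // unbox, add, box
        }
        return 4.0 * pi;
    }

    public static void main(String[] args) {
        System.out.println(primitiveLoop(1_000_000L));
        System.out.println(boxedLoop(1_000_000L));
    }
}
```

In `javap -c` output for a class like this, the boxed version shows up as `invokestatic` calls to `Double.valueOf` and `Long.valueOf` plus `invokevirtual` calls to `doubleValue` and `longValue` inside the loop body, while the primitive version is straight-line arithmetic. Method calls inside the hot loop are the thing to look for in the Clojure bytecode too.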
