Question

Odd performance penalty in a numeric benchmark, mitigated by coercing to double...

asked Jan 10, 2023 in Java Interop by Tom

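I stumbled upon a performance benchmark here, and I was curious why Clojure performs worse than Java.

So I ran it through a profiler (after modifying their version to use unchecked math, which did not help), but nothing showed up. Hmm. I decompiled it and found: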

// Decompiling class: leibniz$calc_pi_leibniz
import clojure.lang.*;

public final class leibniz$calc_pi_leibniz extends AFunction implements LD
{
    public static double invokeStatic(final long rounds) {
        final long end = 2L + rounds;
        long i = 2L;
        double x = 1.0;
        double pi = 1.0;
        while (i != end) {
            final double x2 = -x;
            final long n = i + 1L;
            final double n2 = x2;
            pi += Numbers.divide(x2, 2L * i - 1L);
            x = n2;
            i = n;
        }
        return Numbers.unchecked_multiply(4L, pi);
    }

    @Override
    public Object invoke(final Object o) {
        return invokeStatic(RT.uncheckedLongCast(o));
    }

    @Override
    public final double invokePrim(final long rounds) {
        return invokeStatic(rounds);
    }
}

So it looks like the long/double boundary is costing us at least a method lookup, possibly inside Numbers.divide?
So I coerced everything to double (even our index variable):

(def rounds 100000000)

(defn calc-pi-leibniz2
  "Eliminate mixing of long/double to avoid clojure.numbers invocations."
  ^double
  [^long rounds]
  (let [end (+ 2.0 rounds)]
    (loop [i 2.0 x 1.0 pi 1.0]
      (if (= i end)
        (* 4.0 pi)
        (let [x (- x)]
          (recur (inc i) x (+ pi (/ x (dec (* 2 i))))))))))

leibniz=> (c/quick-bench (calc-pi-leibniz rounds))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 575.352216 ms
    Execution time std-deviation : 10.070268 ms
   Execution time lower quantile : 566.210399 ms ( 2.5%)
   Execution time upper quantile : 588.772187 ms (97.5%)
                   Overhead used : 1.884700 ns
nil
leibniz=> (c/quick-bench (calc-pi-leibniz2 rounds))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 158.509049 ms
    Execution time std-deviation : 759.113165 µs
   Execution time lower quantile : 157.234899 ms ( 2.5%)
   Execution time upper quantile : 159.205374 ms (97.5%)
                   Overhead used : 1.884700 ns
nil

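Any ideas why the Java implementation doesn't pay the same penalty on the division? [Both versions were compiled with *unchecked-math* set to :warn-on-boxed.]

I also tried a variant using fastmath's primitive math operators, but it actually got slower. So far nothing beats coercing the loop index i to double (which is normally something you would never do).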

commented Jan 10, 2023 by Ben Sless

In my benchmarks this performs the same as your double solution, without having to convert the index to double.

(defn calc-pi-leibniz3
  "Eliminate mixing of long/double to avoid clojure.numbers invocations."
  ^double
  [^long rounds]
  (let [end (+ 2 rounds)]
    (loop [i 2 x 1.0 pi 1.0]
      (if (= i end)
        (* 4.0 pi)
        (let [x (- x)]
          (recur (inc i) x (+ pi (/ x (double (unchecked-dec-int (unchecked-multiply-int (unchecked-int 2) (unchecked-int i))))))))))))

It comes down to being aware of where each compiler inserts its cast instructions.

Here is the bytecode for the Java solution:

  public static double go(int);
    descriptor: (I)D
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=6, locals=6, args_size=1
         0: dconst_1
         1: dstore_1
         2: dconst_1
         3: dstore_3
         4: iconst_2
         5: istore        5
         7: iload         5
         9: iload_0
       10: iconst_2
       11: iadd
       12: if_icmpge    39
       15: dload_3
       16: ldc2_w       #24          double -1.0d
       19: dmul
       20: dstore_3
       21: dload_1
       22: dload_3
       23: iconst_2
       24: iload         5
       26: imul
       27: iconst_1
       28: isub
       29: i2d
       30: ddiv
       31: dadd
       32: dstore_1
       33: iinc        5, 1
       36: goto       7
       39: dload_1
       40: ldc2_w       #26         double 4.0d
       43: dmul
       44: dup2
       45: dstore_1
       46: dreturn
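
Read back into source, the listing above corresponds to roughly the loop below. This is a reconstruction from the bytecode, not the benchmark's actual file: the class name `LeibnizSketch` and the variable names are assumptions, since the listing does not carry local names. The thing to notice is that `2 * i - 1` is computed entirely in `int` and widened once by `i2d`, right before `ddiv`.

```java
// Hypothetical reconstruction of the Java benchmark loop from its bytecode.
final class LeibnizSketch {

    // Mirrors `public static double go(int)` from the listing.
    public static double go(int rounds) {
        double pi = 1.0;   // local slot 1
        double x = 1.0;    // local slot 3
        for (int i = 2; i < rounds + 2; i++) {   // if_icmpge exits the loop
            x *= -1.0;                 // dload_3, ldc2_w -1.0, dmul, dstore_3
            pi += x / (2 * i - 1);     // imul, isub, i2d, ddiv, dadd, dstore_1
        }
        return pi *= 4.0;              // dmul, dup2, dstore_1, dreturn
    }

    public static void main(String[] args) {
        // Prints an approximation of pi; more rounds give more digits.
        System.out.println(go(1_000_000));
    }
}
```

Compiling this and running `javap -c LeibnizSketch` is a quick way to check how closely it matches the listing.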

Here it is for your solution:

    public static double invokeStatic(long rounds);
        Flags: PUBLIC, STATIC
        Code:
                  0: ldc2_w          2.0
                  3: lload_0         /* rounds */
                  4: invokestatic    clojure/lang/Numbers.add:(DJ)D
                  7: dstore_2        /* end */
                  8: ldc2_w          2.0
                 11: dstore          i
                 13: dconst_1
                 14: dstore          x
                 16: dconst_1
                 17: dstore          pi
                 19: dload           i
                 21: dload_2         /* end */
                 22: dcmpl
                 23: ifne            36
                 26: ldc2_w          4.0
                 29: dload           pi
                 31: dmul
                 32: goto            72
                 35: athrow
                 36: dload           x
                 38: dneg
                 39: dstore          x
                 41: dload           i
                 43: dconst_1
                 44: dadd
                 45: dload           x
                 47: dload           pi
                 49: dload           x
                 51: ldc2_w          2
                 54: dload           i
                 56: invokestatic    clojure/lang/Numbers.multiply:(JD)D
                 59: dconst_1
                 60: dsub
                 61: ddiv
                 62: dadd
                 63: dstore          pi
                 65: dstore          x
                 67: dstore          i
                 69: goto            19
                 72: dreturn

And for my solution:

    public static double invokeStatic(long rounds);
        Flags: PUBLIC, STATIC
        Code:
                  0: ldc2_w          2
                  3: lload_0         /* rounds */
                  4: ladd
                  5: lstore_2        /* end */
                  6: ldc2_w          2
                  9: lstore          i
                 11: dconst_1
                 12: dstore          x
                 14: dconst_1
                 15: dstore          pi
                 17: lload           i
                 19: lload_2         /* end */
                 20: lcmp
                 21: ifne            34
                 24: ldc2_w          4.0
                 27: dload           pi
                 29: dmul
                 30: goto            71
                 33: athrow
                 34: dload           x
                 36: dneg
                 37: dstore          x
                 39: lload           i
                 41: lconst_1
                 42: ladd
                 43: dload           x
                 45: dload           pi
                 47: dload           x
                 49: ldc2_w          2
                 52: l2i
                 53: lload           i
                 55: l2i
                 56: imul
                 57: iconst_1
                 58: isub
                 59: i2d
                 60: ddiv
                 61: dadd
                 62: dstore          pi
                 64: dstore          x
                 66: lstore          i
                 68: goto            17
                 71: dreturn

It also avoids all method calls and works directly on primitives.
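
On the Java side the same question is settled at compile time: when an `int` or `long` expression meets a `double`, javac applies binary numeric promotion and emits a single inline conversion (`i2d` or `l2d`) rather than a method call. A minimal sketch of the three index types discussed in this thread follows. `PromotionSketch` and its method names are made up for illustration, and the instruction sequences in the comments are what javac normally produces, which `javap -c` can confirm.

```java
// Hypothetical illustration: where javac places the numeric conversion
// for one Leibniz term, x / (2i - 1), depending on the index type.
final class PromotionSketch {

    // int index: iconst_2, iload, imul, iconst_1, isub, i2d, ddiv
    static double termIntIndex(double x, int i) {
        return x / (2 * i - 1);
    }

    // long index: ldc2_w 2, lload, lmul, lconst_1, lsub, l2d, ddiv
    static double termLongIndex(double x, long i) {
        return x / (2L * i - 1L);
    }

    // double index: ldc2_w 2.0, dload, dmul, dconst_1, dsub, ddiv
    static double termDoubleIndex(double x, double i) {
        return x / (2.0 * i - 1.0);
    }

    public static void main(String[] args) {
        // All three compute the same term; only the conversion differs.
        System.out.println(termIntIndex(-1.0, 5));
        System.out.println(termLongIndex(-1.0, 5L));
        System.out.println(termDoubleIndex(-1.0, 5.0));
    }
}
```

None of the three variants involves a call, which is the property the Clojure versions above have to reach through explicit coercions.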

All this hand-waving aside, both solutions perform the same as Java in my benchmarks.

commented Jan 10, 2023 by Tom

commented Jan 12, 2023 by Chris Nuernberger

1 Answer

alexmiller · Answer 1 · 2023-01-10T18:46:08+0000

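I don't have time to look at this right now, but it is easy to fall into boxing at loop/recur boundaries, which causes a noticeable performance hit. The simplest way to confirm is to look at the bytecode.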
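
As a plain-Java analogy for what boxing across a loop boundary costs (this is not what the Clojure compiler emits, only the same effect in Java terms), compare an accumulator declared as `double` with one declared as `Double`. The class and method names are hypothetical. Running `javap -c BoxingSketch` on the compiled class should show `Double.valueOf` and `Double.doubleValue` calls inside the boxed loop and none in the primitive one.

```java
// Hypothetical illustration of a boxed versus a primitive loop accumulator.
final class BoxingSketch {

    // Primitive accumulator: the loop body is dload, ddiv, dadd, dstore.
    static double sumPrimitive(int n) {
        double acc = 0.0;
        for (int i = 1; i <= n; i++) {
            acc += 1.0 / i;
        }
        return acc;
    }

    // Boxed accumulator: every iteration unboxes with doubleValue()
    // and reboxes the result with Double.valueOf(double).
    static double sumBoxed(int n) {
        Double acc = 0.0;
        for (int i = 1; i <= n; i++) {
            acc = acc + 1.0 / i;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Same arithmetic in the same order, so the results are identical.
        System.out.println(sumPrimitive(1000));
        System.out.println(sumBoxed(1000));
    }
}
```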
