Question

Using doubles mitigates odd performance penalty in numerical benchmark...

asked Jan 10, 2023 in Java Interop by Tom

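I found a performance benchmark here, and I was curious why Clojure was getting beaten.

So I threw it into a profiler (after modifying the version to use unchecked math, which did not help), but nothing showed up. Hmm. After decompiling, I found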

// Decompiling class: leibniz$calc_pi_leibniz
import clojure.lang.*;

public final class leibniz$calc_pi_leibniz extends AFunction implements LD
{
    public static double invokeStatic(final long rounds) {
        final long end = 2L + rounds;
        long i = 2L;
        double x = 1.0;
        double pi = 1.0;
        while (i != end) {
            final double x2 = -x;
            final long n = i + 1L;
            final double n2 = x2;
            pi += Numbers.divide(x2, 2L * i - 1L);
            x = n2;
            i = n;
        }
        return Numbers.unchecked_multiply(4L, pi);
    }

    @Override
    public Object invoke(final Object o) {
        return invokeStatic(RT.uncheckedLongCast(o));
    }

    @Override
    public final double invokePrim(final long rounds) {
        return invokeStatic(rounds);
    }
}

It looks like the long/double boundary is costing us at least a method lookup, possibly Numbers.divide?
So I coerced everything to double (even our index variable):

(def rounds 100000000)

(defn calc-pi-leibniz2
  "Eliminate mixing of long/double to avoid clojure.numbers invocations."
  ^double
  [^long rounds]
  (let [end (+ 2.0 rounds)]
    (loop [i 2.0 x 1.0 pi 1.0]
      (if (= i end)
        (* 4.0 pi)
        (let [x (- x)]
          (recur (inc i) x (+ pi (/ x (dec (* 2 i))))))))))

leibniz=> (c/quick-bench (calc-pi-leibniz rounds))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 575.352216 ms
    Execution time std-deviation : 10.070268 ms
   Execution time lower quantile : 566.210399 ms ( 2.5%)
   Execution time upper quantile : 588.772187 ms (97.5%)
                   Overhead used : 1.884700 ns
nil
leibniz=> (c/quick-bench (calc-pi-leibniz2 rounds))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 158.509049 ms
    Execution time std-deviation : 759.113165 µs
   Execution time lower quantile : 157.234899 ms ( 2.5%)
   Execution time upper quantile : 159.205374 ms (97.5%)
                   Overhead used : 1.884700 ns
nil
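
For readers who find the decompiled Java easier to follow than the Clojure, the two loop shapes being benchmarked can be sketched side by side. This is only an illustration: `divide` is a hypothetical stand-in for the `Numbers.divide` call seen in the decompiled class, not Clojure's actual implementation, and the class and method names are made up. It shows that both shapes compute the same value; it says nothing about their relative speed.

```java
// Sketch only: names are assumptions, and divide() merely stands in for
// the Numbers.divide(double, long) call in the decompiled class.
final class LeibnizShapes {
    static double divide(double x, long y) {
        return x / y; // y is widened to double before the division
    }

    // Shape of calc-pi-leibniz: long loop index, double accumulator.
    static double longIndex(long rounds) {
        long end = 2L + rounds;
        double x = 1.0;
        double pi = 1.0;
        for (long i = 2L; i != end; i++) {
            x = -x;
            pi += divide(x, 2L * i - 1L);
        }
        return 4.0 * pi;
    }

    // Shape of calc-pi-leibniz2: every local, including the index, is a double.
    static double doubleIndex(long rounds) {
        double end = 2.0 + rounds;
        double x = 1.0;
        double pi = 1.0;
        for (double i = 2.0; i != end; i += 1.0) {
            x = -x;
            pi += x / (2.0 * i - 1.0);
        }
        return 4.0 * pi;
    }

    public static void main(String[] args) {
        System.out.println(longIndex(1_000_000L));
        System.out.println(doubleIndex(1_000_000L));
    }
}
```

The double index is exact here because every value involved is a small whole number, so the `i != end` test still terminates.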

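Why doesn't the Java implementation pay the same division penalty? (Both versions are implemented with unchecked-math set to :warn-on-boxed.)

I also tried a variant using fastmath's primitive math operators, but it was actually slower. So far, nothing beats coercing the loop index i to a double (which I would not normally do).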

commented Jan 10, 2023 by Ben Sless

In my benchmarks, this gets the same performance as your double solution without coercing the index to a double.

(defn calc-pi-leibniz3
  "Eliminate mixing of long/double to avoid clojure.numbers invocations."
  ^double
  [^long rounds]
  (let [end (+ 2 rounds)]
    (loop [i 2 x 1.0 pi 1.0]
      (if (= i end)
        (* 4.0 pi)
        (let [x (- x)]
          (recur (inc i) x (+ pi (/ x (double (unchecked-dec-int (unchecked-multiply-int (unchecked-int 2) (unchecked-int i))))))))))))

It comes down to the specifics of the implementation.

Here is the bytecode for the Java solution:

  public static double go(int);
    descriptor: (I)D
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=6, locals=6, args_size=1
         0: dconst_1
         1: dstore_1
         2: dconst_1
         3: dstore_3
         4: iconst_2
         5: istore        5
         7: iload         5
         9: iload_0
        10: iconst_2
        11: iadd
        12: if_icmpge     39
        15: dload_3
        16: ldc2_w        #24                 // double -1.0d
        19: dmul
        20: dstore_3
        21: dload_1
        22: dload_3
        23: iconst_2
        24: iload         5
        26: imul
        27: iconst_1
        28: isub
        29: i2d
        30: ddiv
        31: dadd
        32: dstore_1
        33: iinc          5, 1
        36: goto          7
        39: dload_1
        40: ldc2_w        #26                 // double 4.0d
        43: dmul
        44: dup2
        45: dstore_1
        46: dreturn
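
Reading the listing back into source form may make the comparison easier. The following is a reconstruction from the bytecode above, not the benchmark's actual source; the class name `LeibnizJava` and the local variable names are assumptions. The point to notice is that the loop index is an `int`, so the denominator is computed with `imul` and `isub` and widened once with `i2d`, with no method call anywhere in the loop.

```java
// Reconstructed from the javap listing above; class and variable names are assumed.
final class LeibnizJava {
    static double go(int rounds) {
        double pi = 1.0; // local slot 1
        double x = 1.0;  // local slot 3
        // if_icmpge leaves the loop once i >= rounds + 2
        for (int i = 2; i < rounds + 2; i++) {
            x *= -1.0;             // dload_3, ldc2_w -1.0, dmul, dstore_3
            pi += x / (2 * i - 1); // imul, isub, i2d, ddiv, dadd, dstore_1
        }
        return pi *= 4.0;          // dmul, dup2, dstore_1, dreturn
    }

    public static void main(String[] args) {
        System.out.println(go(1_000_000));
    }
}
```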

Here it is for your solution:

    public static double invokeStatic(long rounds);
        Flags: PUBLIC, STATIC
        Code:
               0: ldc2_w          2.0
               3: lload_0         /* rounds */
               4: invokestatic    clojure/lang/Numbers.add:(DJ)D
               7: dstore_2        /* end */
               8: ldc2_w          2.0
              11: dstore          i
              13: dconst_1
              14: dstore          x
              16: dconst_1
              17: dstore          pi
              19: dload           i
              21: dload_2         /* end */
              22: dcmpl
              23: ifne            36
              26: ldc2_w          4.0
              29: dload           pi
              31: dmul
              32: goto            72
              35: athrow
              36: dload           x
              38: dneg
              39: dstore          x
              41: dload           i
              43: dconst_1
              44: dadd
              45: dload           x
              47: dload           pi
              49: dload           x
              51: ldc2_w          2
              54: dload           i
              56: invokestatic    clojure/lang/Numbers.multiply:(JD)D
              59: dconst_1
              60: dsub
              61: ddiv
              62: dadd
              63: dstore          pi
              65: dstore          x
              67: dstore          i
              69: goto            19
              72: dreturn

My solution:

    public static double invokeStatic(long rounds);
        Flags: PUBLIC, STATIC
        Code:
               0: ldc2_w          2
               3: lload_0         /* rounds */
               4: ladd
               5: lstore_2        /* end */
               6: ldc2_w          2
               9: lstore          i
              11: dconst_1
              12: dstore          x
              14: dconst_1
              15: dstore          pi
              17: lload           i
              19: lload_2         /* end */
              20: lcmp
              21: ifne            34
              24: ldc2_w          4.0
              27: dload           pi
              29: dmul
              30: goto            71
              33: athrow
              34: dload           x
              36: dneg
              37: dstore          x
              39: lload           i
              41: lconst_1
              42: ladd
              43: dload           x
              45: dload           pi
              47: dload           x
              49: ldc2_w          2
              52: l2i
              53: lload           i
              55: l2i
              56: imul
              57: iconst_1
              58: isub
              59: i2d
              60: ddiv
              61: dadd
              62: dstore          pi
              64: dstore          x
              66: lstore          i
              68: goto            17
              71: dreturn

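It also avoids all method calls and operates directly on primitives.

Even so, performance did not improve compared to the other solution.

According to my benchmarks, both solutions perform the same as the Java solution.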
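
One caveat about the int-based denominator in calc-pi-leibniz3: `unchecked-int` and `unchecked-multiply-int` wrap on overflow rather than throw, so the result is only correct while `2 * i - 1` fits in 32 bits. A minimal Java sketch of the same arithmetic shows where the int and long versions diverge; the class and method names are hypothetical. With `rounds` at 100000000, as in the benchmark, the values stay well inside the int range.

```java
// Sketch only: class and method names are assumptions for illustration.
final class IntDenominator {
    // Mirrors l2i, imul, isub, i2d: narrow the index to int, then widen to double.
    static double viaInt(long i) {
        return (double) (2 * (int) i - 1);
    }

    // The same denominator computed entirely in long arithmetic.
    static double viaLong(long i) {
        return (double) (2L * i - 1L);
    }

    public static void main(String[] args) {
        long inRange = 100_000_001L;     // largest index reached in the benchmark
        long tooLarge = 1_073_741_825L;  // 2 * i no longer fits in an int
        System.out.println(viaInt(inRange) == viaLong(inRange));   // same value
        System.out.println(viaInt(tooLarge) == viaLong(tooLarge)); // int version has wrapped
    }
}
```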

commented Jan 10, 2023 by Tom

2023-01-12 Dec 12, 2023 by Chris Nuernberger

1 Answer

alexmiller · Answer 1 · 2023-01-10T18:46:08+0000

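I won't have time to look at this for a while, but it's easy to fall into boxing at loop/recur boundaries, which can cause a significant performance hit. The easiest thing is to look at the bytecode.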
