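It appears that the bytecode generated by the compiler is not guaranteed to be 100% deterministic, even when all inputs (tools, dependencies, compiler options, code) stay the same. This can be observed with the following script, which simply keeps compiling the same code until the output of two consecutive runs differs.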
#!/usr/bin/env bash
set -euo pipefail

# Compile foo into classes/curr and, if a previous build exists,
# diff the hashes of the generated class files against it.
compile() {
  mkdir -p classes/curr
  clojure -Sdeps '{:paths ["src" "classes/curr"]}' \
    -M -e "(binding [*compile-path* \"classes/curr\"] (compile 'foo) nil)"
  if [ -d "classes/prev" ]; then
    diff <(cd "classes/prev" && sha256sum * | sort -k2) \
         <(cd "classes/curr" && sha256sum * | sort -k2)
  fi
}

# Keep recompiling until two consecutive builds differ (diff exits non-zero).
run() {
  rm -rf classes/prev classes/curr
  compile
  local n=1
  while compile; do
    echo "$n"
    rm -rf classes/prev
    mv classes/curr classes/prev
    n=$((n + 1))
  done
}

run
Here, src/foo.clj contains the following code (adapted from the real-world code in Aleph where I first ran into this issue):
(ns foo)
(defn bar []
  (let [a 1
        b 2
        c (delay 3)
        {:keys [foo bar baz qux bla frob]} {:foo "ha"
                                            :bar 4}]
    #(clojure.lang.ArraySeq/create (into-array [a b @c bar]))))
I ran the script with OpenJDK 11.0.15+10 and Clojure CLI 1.11.1.1149 (i.e. Clojure 1.11.1) on Linux 5.15.59. After a few iterations of roughly 10 seconds each, the result looked something like this:
2,3c2,3
< 57496515c08ffd087a1f3e3e0d6e420c291b27a15d883c93cdad5de1c2cd8bf6 foo$bar$fn__145.class
< f6d5832ee0ee590056911b70da99b525d93d5b0280feb8e9d34e2f214de5dedd foo$bar.class
---
> eff4dac36986b909c7dacb63b87f2033dc5be62ee5e0b2a3a2a4207e79a77c41 foo$bar$fn__145.class
> 3a9b9f33e4eeed8b80810a02dbeb0e4d72fac83d496409e5cf7c6ff78fa36ff5 foo$bar.class
The disassembled class files can be compared like this:
$ diff -u <(javap -l -c -s -private classes/prev/foo\$bar\$fn__145.class) <(javap -l -c -s -private classes/curr/foo\$bar\$fn__145.class)
The result shows:
@@ -3,23 +3,23 @@
java.lang.Object c;
descriptor: Ljava/lang/Object;
- long a;
- descriptor: J
-
long b;
descriptor: J
java.lang.Object bar;
descriptor: Ljava/lang/Object;
+ long a;
+ descriptor: J
+
public static final clojure.lang.Var const__0;
descriptor: Lclojure/lang/Var;
public static final clojure.lang.Var const__1;
descriptor: Lclojure/lang/Var;
- public foo$bar$fn__145(java.lang.Object, long, long, java.lang.Object);
- descriptor: (Ljava/lang/Object;JJLjava/lang/Object;)V
+ public foo$bar$fn__145(java.lang.Object, long, java.lang.Object, long);
+ descriptor: (Ljava/lang/Object;JLjava/lang/Object;J)V
Code:
0: aload_0
1: invokespecial #16 // Method clojure/lang/AFunction."<init>":()V
@@ -28,13 +28,13 @@
6: putfield #18 // Field c:Ljava/lang/Object;
9: aload_0
10: lload_2
- 11: putfield #20 // Field a:J
+ 11: putfield #20 // Field b:J
14: aload_0
- 15: lload 4
- 17: putfield #22 // Field b:J
+ 15: aload 4
+ 17: putfield #22 // Field bar:Ljava/lang/Object;
20: aload_0
- 21: aload 6
- 23: putfield #24 // Field bar:Ljava/lang/Object;
+ 21: lload 5
+ 23: putfield #24 // Field a:J
26: return
LineNumberTable:
line 4: 0
@@ -46,10 +46,10 @@
3: invokevirtual #35 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
6: checkcast #37 // class clojure/lang/IFn
9: aload_0
- 10: getfield #20 // Field a:J
+ 10: getfield #24 // Field a:J
13: invokestatic #43 // Method clojure/lang/Numbers.num:(J)Ljava/lang/Number;
16: aload_0
- 17: getfield #22 // Field b:J
+ 17: getfield #20 // Field b:J
20: invokestatic #43 // Method clojure/lang/Numbers.num:(J)Ljava/lang/Number;
23: getstatic #46 // Field const__1:Lclojure/lang/Var;
26: invokevirtual #35 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
@@ -58,7 +58,7 @@
33: getfield #18 // Field c:Ljava/lang/Object;
36: invokeinterface #49, 2 // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
41: aload_0
- 42: getfield #24 // Field bar:Ljava/lang/Object;
+ 42: getfield #22 // Field bar:Ljava/lang/Object;
45: invokestatic #55 // Method clojure/lang/Tuple.create:(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Lclojure/lang/IPersistentVector;
48: invokeinterface #49, 2 // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
53: checkcast #57 // class "[Ljava/lang/Object;"
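As you can see, the order of the local variables appears to change at random. This becomes more likely when the system is under heavy load (e.g. while running Clojure's full test suite at the same time), which led me to believe that memory allocation might be involved. Indeed, when inspecting the relevant code in Compiler.java, I think I may have found the culprit: the CLEAR_SITES map uses LocalBinding instances as keys, but that class does not implement a deterministic hashCode method and therefore falls back to the default Object implementation, which, as far as I know, depends on the internal memory address. And indeed, with an implementation like the following in place, I have not been able to reproduce the issue so far: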
public int hashCode(){
    return Util.hashCombine(idx, sym.hashCode());
}
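To illustrate the suspected mechanism outside the compiler, here is a minimal standalone Java sketch. The classes IdentityKey and StableKey are hypothetical stand-ins for LocalBinding (they are not the actual Compiler.java code), and Objects.hash stands in for Util.hashCombine. The sketch only assumes that HashMap iteration order follows the keys' hash codes. Keys that rely on the default Object.hashCode are therefore not guaranteed to iterate in the same order on every JVM run, while keys with a content-based hashCode are.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class HashOrderDemo {
    // Names of the locals from the example above, used purely as sample data.
    static final String[] NAMES = {"a", "b", "c", "bar"};

    // Hypothetical stand-in for a key class that does not override hashCode:
    // it inherits Object's identity hash, which is not guaranteed to be
    // stable between JVM runs.
    static class IdentityKey {
        final int idx;
        final String sym;

        IdentityKey(int idx, String sym) {
            this.idx = idx;
            this.sym = sym;
        }
    }

    // Same fields, but the hash is derived from the contents only.
    // equals is deliberately left as identity, mirroring the proposed fix,
    // which adds hashCode without touching equals.
    static class StableKey extends IdentityKey {
        StableKey(int idx, String sym) {
            super(idx, sym);
        }

        @Override
        public int hashCode() {
            return Objects.hash(idx, sym);
        }
    }

    // Put the keys into a HashMap and report the order in which it iterates them.
    static List<String> iterationOrder(List<? extends IdentityKey> keys) {
        Map<IdentityKey, Boolean> map = new HashMap<>();
        for (IdentityKey k : keys) {
            map.put(k, Boolean.TRUE);
        }
        List<String> order = new ArrayList<>();
        for (IdentityKey k : map.keySet()) {
            order.add(k.sym);
        }
        return order;
    }

    static List<String> identityOrder() {
        List<IdentityKey> keys = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            keys.add(new IdentityKey(i, NAMES[i]));
        }
        return iterationOrder(keys);
    }

    static List<String> stableOrder() {
        List<StableKey> keys = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            keys.add(new StableKey(i, NAMES[i]));
        }
        return iterationOrder(keys);
    }

    public static void main(String[] args) {
        // The first line may or may not change between runs; the second should not.
        System.out.println("identity hash: " + identityOrder());
        System.out.println("content hash:  " + stableOrder());
    }
}
```

Running this a number of times, ideally under varying system load, and comparing the two printed lines is a rough way to see the difference. The identity-hash order may well stay the same over many runs, which would be consistent with the issue above only showing up occasionally.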
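I have not seen any indication that the compiler cares much about reproducible builds. But given that it seems quite stable in other respects, and that the fix would be a small change that makes it more stable still, I thought it might make sense to propose it.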