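It seems that the bytecode generated by the compiler is not 100% deterministic, even when all inputs (tooling, dependencies, compiler options, code) remain stable. This can be observed with the following script, which simply compiles the same code over and over until the results of two consecutive runs differ: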
#!/usr/bin/env bash
set -euo pipefail
compile() {
  mkdir -p classes/curr
  # Compile the foo namespace into classes/curr
  clojure -Sdeps '{:paths ["src" "classes/curr"]}' \
    -M -e "(binding [*compile-path* \"classes/curr\"] (compile 'foo) nil)"
  # Compare checksums with the previous run; diff exits non-zero on a mismatch
  if [ -d "classes/prev" ]; then
    diff <(cd "classes/prev" && sha256sum * | sort -k2) \
         <(cd "classes/curr" && sha256sum * | sort -k2)
  fi
}
run() {
  rm -rf classes/prev classes/curr
  compile
  local n=1
  while compile; do
    echo $n
    rm -rf classes/prev
    mv classes/curr classes/prev
    n=$(($n+1))
  done
}
run
where src/foo.clj contains the following code (adapted from some real-world code in Aleph, where I first ran into this issue):
(ns foo)
(defn bar []
  (let [a 1
        b 2
        c (delay 3)
        {:keys [foo bar baz qux bla frob]} {:foo "ha"
                                            :bar 4}]
    #(clojure.lang.ArraySeq/create (into-array [a b @c bar]))))
I ran the script with OpenJDK 11.0.15+10 and Clojure CLI 1.11.1.1149 (and thus Clojure 1.11.1) on Linux 5.15.59. After a few tens of rounds, it produces output similar to the following:
2,3c2,3
< 57496515c08ffd087a1f3e3e0d6e420c291b27a15d883c93cdad5de1c2cd8bf6 foo$bar$fn__145.class
< f6d5832ee0ee590056911b70da99b525d93d5b0280feb8e9d34e2f214de5dedd foo$bar.class
---
> eff4dac36986b909c7dacb63b87f2033dc5be62ee5e0b2a3a2a4207e79a77c41 foo$bar$fn__145.class
> 3a9b9f33e4eeed8b80810a02dbeb0e4d72fac83d496409e5cf7c6ff78fa36ff5 foo$bar.class
Comparing the disassembled class files with
$ diff -u <(javap -l -c -s -private classes/prev/foo\$bar\$fn__145.class) <(javap -l -c -s -private classes/curr/foo\$bar\$fn__145.class)
yields
@@ -3,23 +3,23 @@
java.lang.Object c;
descriptor: Ljava/lang/Object;
- long a;
- descriptor: J
-
long b;
descriptor: J
java.lang.Object bar;
descriptor: Ljava/lang/Object;
+ long a;
+ descriptor: J
+
public static final clojure.lang.Var const__0;
descriptor: Lclojure/lang/Var;
public static final clojure.lang.Var const__1;
descriptor: Lclojure/lang/Var;
- public foo$bar$fn__145(java.lang.Object, long, long, java.lang.Object);
- descriptor: (Ljava/lang/Object;JJLjava/lang/Object;)V
+ public foo$bar$fn__145(java.lang.Object, long, java.lang.Object, long);
+ descriptor: (Ljava/lang/Object;JLjava/lang/Object;J)V
Code:
0: aload_0
1: invokespecial #16 // Method clojure/lang/AFunction."<init>":()V
@@ -28,13 +28,13 @@
6: putfield #18 // Field c:Ljava/lang/Object;
9: aload_0
10: lload_2
- 11: putfield #20 // Field a:J
+ 11: putfield #20 // Field b:J
14: aload_0
- 15: lload 4
- 17: putfield #22 // Field b:J
+ 15: aload 4
+ 17: putfield #22 // Field bar:Ljava/lang/Object;
20: aload_0
- 21: aload 6
- 23: putfield #24 // Field bar:Ljava/lang/Object;
+ 21: lload 5
+ 23: putfield #24 // Field a:J
26: return
LineNumberTable:
line 4: 0
@@ -46,10 +46,10 @@
3: invokevirtual #35 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
6: checkcast #37 // class clojure/lang/IFn
9: aload_0
- 10: getfield #20 // Field a:J
+ 10: getfield #24 // Field a:J
13: invokestatic #43 // Method clojure/lang/Numbers.num:(J)Ljava/lang/Number;
16: aload_0
- 17: getfield #22 // Field b:J
+ 17: getfield #20 // Field b:J
20: invokestatic #43 // Method clojure/lang/Numbers.num:(J)Ljava/lang/Number;
23: getstatic #46 // Field const__1:Lclojure/lang/Var;
26: invokevirtual #35 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
@@ -58,7 +58,7 @@
33: getfield #18 // Field c:Ljava/lang/Object;
36: invokeinterface #49, 2 // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
41: aload_0
- 42: getfield #24 // Field bar:Ljava/lang/Object;
+ 42: getfield #22 // Field bar:Ljava/lang/Object;
45: invokestatic #55 // Method clojure/lang/Tuple.create:(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Lclojure/lang/IPersistentVector;
48: invokeinterface #49, 2 // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
53: checkcast #57 // class "[Ljava/lang/Object;"
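As you can see, the order of the closed-over locals appears to change at random. This is more likely to happen when the system is under heavy load (e.g. when running Clojure's full test suite at the same time), which led me to believe that memory allocation may be the cause. Indeed, on inspecting the relevant code in Compiler.java, I think I may have found the culprit: the CLEAR_SITES map uses LocalBinding instances as keys, but that class doesn't implement a deterministic hashCode method and therefore falls back to the default Object implementation. As far as I know, that depends on the internal memory address, so it is not stable from one run to the next. Indeed, with an implementation like the following in place, I have not been able to reproduce the issue: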
public int hashCode(){
    return Util.hashCombine(idx, sym.hashCode());
}
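To illustrate the suspected mechanism outside the compiler, here is a minimal, self-contained sketch. It is not the compiler's code: IdentityKey and StableKey are hypothetical stand-ins for LocalBinding before and after the change, and the local hashCombine helper only mimics what Util.hashCombine does, so that the example runs without Clojure on the classpath. Keys that inherit Object.hashCode() may iterate in a different order from one JVM run to the next, while keys hashed from their fields should iterate in the same order every time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashOrderDemo {
    // Hypothetical stand-in for a key class that inherits Object.hashCode().
    static class IdentityKey {
        final int idx;
        final String sym;
        IdentityKey(int idx, String sym) { this.idx = idx; this.sym = sym; }
        @Override public String toString() { return sym; }
    }

    // Same fields, but the hash is derived from them. equals() is left as
    // identity comparison, which is still consistent with this hashCode().
    static class StableKey extends IdentityKey {
        StableKey(int idx, String sym) { super(idx, sym); }
        @Override public int hashCode() { return hashCombine(idx, sym.hashCode()); }
    }

    // Local helper written for this sketch (assumption: same mixing idea as
    // clojure.lang.Util.hashCombine, reimplemented here to stay standalone).
    static int hashCombine(int seed, int hash) {
        seed ^= hash + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }

    // Builds a HashMap with four fresh keys and returns its iteration order.
    static List<String> iterationOrder(boolean stable) {
        String[] names = {"a", "b", "c", "bar"};
        Map<IdentityKey, Boolean> m = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            m.put(stable ? new StableKey(i, names[i]) : new IdentityKey(i, names[i]), true);
        }
        List<String> order = new ArrayList<>();
        for (IdentityKey k : m.keySet()) {
            order.add(k.toString());
        }
        return order;
    }

    public static void main(String[] args) {
        // May differ between JVM runs: depends on identity hash codes.
        System.out.println("identity hash: " + iterationOrder(false));
        // Expected to be the same on every run: depends only on idx and sym.
        System.out.println("field hash:    " + iterationOrder(true));
    }
}
```

Running this several times in fresh JVMs and comparing the two printed lines should show whether the first ordering drifts on a given setup; the second one depends only on the field values and the insertion order.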
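Now, I haven't seen any indication that the compiler cares much about reproducible builds. However, given that it otherwise seems to be robust in this respect, and that the fix is only a small change that makes it more robust still, I thought it made sense to raise the issue.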