JVM Quake Support#

When memory management in the Spark engine gets out of control, we typically set spark.driver/executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath={heapDumpPath} -XX:OnOutOfMemoryError="kill -9 %p" as a remedy, which kills the process and generates a heap dump for post-mortem analysis. However, even with this JVMKill protection, we may still hit problems caused by the JVM running low on memory, such as the JVM repeatedly running Full GC and performing no useful work during the pauses. Because the JVM has not exhausted 100% of its resources, JVMKill is not triggered.

JVMQuake addresses this by monitoring GC behavior at a finer granularity, which enables early detection of memory management issues and lets the process fail fast.

Usage#

JVM Quake is implemented as a Spark plugin. The plugin technically supports Spark 3.0 onwards, but has only been verified with Spark 3.3 to 4.0 in CI.

Build with Apache Maven#

The Spark JVM Quake plugin is built using Apache Maven. To build it, cd to the root directory of the Kyuubi project and run:

build/mvn clean package -DskipTests -pl :kyuubi-spark-jvm-quake_2.12 -am

After a while, if everything goes well, you will get the plugin under ./extensions/spark/kyuubi-spark-jvm-quake/target/kyuubi-spark-jvm-quake_${scala.binary.version}-${project.version}.jar

Installing#

Make the kyuubi-spark-jvm-quake_*.jar and its transitive dependencies available on the Spark runtime classpath, for example:

  • Copy them to $SPARK_HOME/jars, or

  • Specify them in the spark.jars configuration
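As a rough illustration of the second option, the entry below shows how the jar might be listed in spark.jars. The path is a hypothetical placeholder, not a location taken from this document; substitute the actual location and file name of the jar you built.

```properties
# Hypothetical path: replace with the real location of the built plugin jar.
spark.jars=/path/to/kyuubi-spark-jvm-quake_2.12-<version>.jar
```

If the plugin has transitive dependencies that are not already on the classpath, they would need to be added to the same comma-separated list.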

Settings for Spark Plugins#

Add org.apache.spark.kyuubi.jvm.quake.SparkJVMQuakePlugin to the Spark configuration spark.plugins:

spark.plugins=org.apache.spark.kyuubi.jvm.quake.SparkJVMQuakePlugin

Additional Configurations#

Name                                      Default Value              Description
spark.driver.jvmQuake.enabled             false                      When true, enables JVMQuake on the driver
spark.executor.jvmQuake.enabled           false                      When true, enables JVMQuake on executors
spark.driver.jvmQuake.heapDump.enabled    false                      When true, takes a JVM heap dump when JVMQuake reaches the threshold
spark.executor.jvmQuake.heapDump.enabled  false                      When true, takes a JVM heap dump when JVMQuake reaches the threshold
spark.jvmQuake.killThreshold              200                        The threshold, in seconds, at which the process is killed
spark.jvmQuake.exitCode                   502                        The exit code used when the process is killed
spark.jvmQuake.heapDumpPath               /tmp/spark_jvm_quake/apps  The path where heap dumps are written
spark.jvmQuake.checkInterval              3                          The interval, in seconds, between JVMQuake checks
spark.jvmQuake.runTimeWeight              1.0                        The weight of run time
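
To show how these settings fit together, here is a minimal sketch of a configuration that enables JVMQuake, with heap dumps, on both the driver and executors. It uses only the keys listed above. The threshold, interval, exit code, path and weight are simply the documented defaults written out explicitly, not tuning recommendations.

```properties
# Register the plugin (see "Settings for Spark Plugins").
spark.plugins=org.apache.spark.kyuubi.jvm.quake.SparkJVMQuakePlugin

# Both sides are disabled by default, so enable them explicitly.
spark.driver.jvmQuake.enabled=true
spark.executor.jvmQuake.enabled=true

# Optionally take a heap dump when the threshold is reached.
spark.driver.jvmQuake.heapDump.enabled=true
spark.executor.jvmQuake.heapDump.enabled=true
spark.jvmQuake.heapDumpPath=/tmp/spark_jvm_quake/apps

# Documented defaults, shown only for illustration.
spark.jvmQuake.killThreshold=200
spark.jvmQuake.checkInterval=3
spark.jvmQuake.exitCode=502
spark.jvmQuake.runTimeWeight=1.0
```

The same keys can also be passed individually, for instance as --conf options on spark-submit.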