Gluten#

Apache Gluten is a Spark plugin, developed by Intel, that accelerates Apache Spark with native libraries. Currently, only CentOS 7/8 and Ubuntu 20.04/22.04 are supported, along with certain Spark versions. The steps below describe how to use Gluten with the Velox native libraries.

Building (with Velox Backend)#

Build the Gluten Velox backend package#

Clone the Gluten project and run its build script, buildbundle-veloxbe.sh. The resulting package is written to /path/to/gluten/package/target/.

git clone https://github.com/apache/gluten.git
cd gluten

## The script builds the JARs for Spark.
./dev/buildbundle-veloxbe.sh

Usage#

To accelerate Spark with Gluten, follow the steps below.

Installing#

Add the Gluten JAR: copy /path/to/gluten/package/target/gluten-velox-bundle-spark3.x_2.12-*.jar to $SPARK_HOME/jars/, or specify its path in the spark.jars configuration.
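The copy step can be sketched as a small shell helper. This is only an illustration: the function name install_gluten_jar is hypothetical, and the glob assumes the `3.x` in the file name stands for the Spark minor version of your build.

```shell
# Hypothetical helper (not part of Gluten): copy the Velox bundle JAR
# from a build output directory into a Spark jars directory.
install_gluten_jar() {
  src_dir="$1"    # e.g. /path/to/gluten/package/target
  dest_dir="$2"   # e.g. $SPARK_HOME/jars
  found=0
  # Assumption: "spark3.x" in the file name is the Spark minor version.
  for jar in "$src_dir"/gluten-velox-bundle-spark3.*_2.12-*.jar; do
    # If the glob matched nothing, the pattern stays literal; skip it.
    [ -e "$jar" ] || continue
    cp "$jar" "$dest_dir"/ && found=1
  done
  # Return non-zero when no bundle JAR was copied.
  [ "$found" -eq 1 ]
}
```

It would be called as `install_gluten_jar /path/to/gluten/package/target "$SPARK_HOME/jars"`. If the target directory holds bundles for several Spark versions, all of them match the glob, so narrow the pattern to the version you run. Alternatively, skip the copy and pass the JAR path with the `--jars` option of spark-submit or spark-shell, which sets spark.jars for that application.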

Configuring#

Add the following minimal configuration to spark-defaults.conf:

spark.plugins=org.apache.gluten.GlutenPlugin
spark.memory.offHeap.size=20g
spark.memory.offHeap.enabled=true
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager

More configuration options can be found in the Gluten documentation.
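For a quick trial without editing spark-defaults.conf, the same minimal settings can be passed on the command line with `--conf`. The sketch below assumes a helper named gluten_conf_args, which is hypothetical and only prints the flags; the 20g default mirrors the value above and should be sized for your executors.

```shell
# Hypothetical helper (not part of Gluten): print the minimal Gluten
# settings as --conf arguments, one token per line.
gluten_conf_args() {
  offheap_size="${1:-20g}"   # off-heap size; defaults to the value above
  printf '%s\n' \
    --conf spark.plugins=org.apache.gluten.GlutenPlugin \
    --conf spark.memory.offHeap.enabled=true \
    --conf "spark.memory.offHeap.size=${offheap_size}" \
    --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
}
```

A session could then be started with `spark-shell $(gluten_conf_args 20g)`. The command substitution is left unquoted on purpose so that each printed token becomes a separate argument, which is safe here because none of the values contain spaces.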