Gluten#
Apache Gluten is a Spark plugin developed by Intel, designed to accelerate Apache Spark with native libraries. Currently, only CentOS 7/8 and Ubuntu 20.04/22.04 are supported, along with certain Spark versions. The steps below describe how to use Gluten with the Velox native libraries.
Building (with Velox Backend)#
Build the Gluten Velox backend package#
Clone the Gluten project and run the build script buildbundle-veloxbe.sh. The resulting package is written to /path/to/gluten/package/target/.
git clone https://github.com/apache/gluten.git
cd gluten
# The script builds the Gluten bundle jars for Spark.
./dev/buildbundle-veloxbe.sh
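After the build finishes, it can be useful to confirm that a bundle jar was actually produced before moving on. The following is a minimal sketch, not part of Gluten itself: `find_gluten_bundle` is a hypothetical helper name, and the file name pattern is assumed from the jar name used in the install step.

```shell
# Hypothetical helper (not shipped with Gluten): print the first Velox bundle
# jar found in the given directory, or fail if none exists.
find_gluten_bundle() {
  local target_dir="$1"
  local jar
  # Pattern assumed from the install step: gluten-velox-bundle-spark3.x_2.12-*.jar
  for jar in "$target_dir"/gluten-velox-bundle-spark3.*_2.12-*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no Gluten bundle jar found in $target_dir" >&2
  return 1
}

# Example usage (placeholder path from the text above):
# find_gluten_bundle /path/to/gluten/package/target
```

If the helper fails, the build most likely did not complete, and the script output is the place to look.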
Usage#
You can use Gluten to accelerate Spark by following the steps below.
Installing#
Add the Gluten jar: copy /path/to/gluten/package/target/gluten-velox-bundle-spark3.x_2.12-*.jar to $SPARK_HOME/jars/, or specify it in the spark.jars configuration.
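The two install options can be sketched as follows. `install_gluten_jar` is a hypothetical helper written for illustration, and `<bundle jar>` stands for whichever jar the build produced. The helper only copies the file after checking that both the jar and the Spark `jars/` directory exist.

```shell
# Hypothetical helper (not part of Gluten or Spark): copy a bundle jar into
# the jars/ directory of a Spark installation.
install_gluten_jar() {
  local jar="$1"
  local spark_home="$2"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$spark_home/jars" ]; then
    echo "no jars/ directory under: $spark_home" >&2
    return 1
  fi
  cp "$jar" "$spark_home/jars/"
}

# Option 1: install into the Spark distribution.
# install_gluten_jar "/path/to/gluten/package/target/<bundle jar>" "$SPARK_HOME"

# Option 2: leave the jar in place and reference it per application.
# spark-shell --conf "spark.jars=/path/to/gluten/package/target/<bundle jar>"
```

Option 1 makes the jar available to every application started from that Spark installation, while option 2 keeps the installation untouched and scopes the jar to a single application.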
Configure#
Add the following minimal configuration to spark-defaults.conf:
spark.plugins=org.apache.gluten.GlutenPlugin
spark.memory.offHeap.size=20g
spark.memory.offHeap.enabled=true
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
More configuration options can be found in the Gluten documentation.
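For a quick trial without editing spark-defaults.conf, the same minimal settings can be passed on the command line with `--conf`. The sketch below assumes this is acceptable for your deployment. `gluten_conf_args` is a hypothetical helper name, and the off-heap size defaults to the 20g value used above.

```shell
# Hypothetical helper: print the minimal Gluten settings as --conf arguments,
# one argument per line. The optional first parameter overrides the off-heap size.
gluten_conf_args() {
  local offheap_size="${1:-20g}"
  printf '%s\n' \
    "--conf" "spark.plugins=org.apache.gluten.GlutenPlugin" \
    "--conf" "spark.memory.offHeap.size=${offheap_size}" \
    "--conf" "spark.memory.offHeap.enabled=true" \
    "--conf" "spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager"
}

# Example usage (relies on word splitting; the values contain no spaces):
# spark-shell $(gluten_conf_args 8g)
```

Settings given with `--conf` apply only to that application, which makes it easy to compare runs with and without the plugin.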