1. Configurations Guide

Kyuubi provides several ways to configure the system.

1.1. Environments

You can configure the environment variables in $KYUUBI_HOME/conf/kyuubi-env.sh, e.g. JAVA_HOME; the Java runtime it points to will then be used both for the Kyuubi server instance and for the applications it launches. You can also set variables in a subprocess's env configuration file, e.g. $SPARK_HOME/conf/spark-env.sh, to use a more specific ENV for SQL engine applications.

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# - JAVA_HOME               Java runtime to use. By default use "java" from PATH.
#
#
# - KYUUBI_CONF_DIR         Directory containing the Kyuubi configurations to use.
#                           (Default: $KYUUBI_HOME/conf)
# - KYUUBI_LOG_DIR          Directory for Kyuubi server-side logs.
#                           (Default: $KYUUBI_HOME/logs)
# - KYUUBI_PID_DIR          Directory that stores the Kyuubi instance pid file.
#                           (Default: $KYUUBI_HOME/pid)
# - KYUUBI_MAX_LOG_FILES    Maximum number of rotated Kyuubi server log files to keep.
#                           (Default: 5)
# - KYUUBI_JAVA_OPTS        JVM options for the Kyuubi server itself in the form "-Dx=y".
#                           (Default: none).
# - KYUUBI_NICENESS         The scheduling priority for Kyuubi server.
#                           (Default: 0)
# - KYUUBI_WORK_DIR_ROOT    Root directory for launching SQL engine applications.
#                           (Default: $KYUUBI_HOME/work)
# - HADOOP_CONF_DIR         Directory containing the Hadoop / YARN configuration to use.
#
# - SPARK_HOME              Spark distribution which you would like to use in Kyuubi.
# - SPARK_CONF_DIR          Optional directory where the Spark configuration lives.
#                           (Default: $SPARK_HOME/conf)
#


## Examples ##

# export JAVA_HOME=/usr/jdk64/jdk1.8.0_152
# export HADOOP_CONF_DIR=/usr/ndp/current/mapreduce_client/conf
# export KYUUBI_JAVA_OPTS="-Xmx10g -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark -XX:MaxDirectMemorySize=1024m  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=5M -XX:NewRatio=3 -XX:MetaspaceSize=512m"

1.2. Kyuubi Configurations

You can configure the Kyuubi properties in $KYUUBI_HOME/conf/kyuubi-defaults.conf. For example:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

## Kyuubi Configurations
#
# kyuubi.authentication           NONE
# kyuubi.frontend.bind.port       10009
#

## Spark Configurations; they will override those in $SPARK_HOME/conf/spark-defaults.conf
## Dummy Ones
# spark.master                      local
# spark.submit.deployMode           client
# spark.ui.enabled                  false
# spark.ui.port                     0
# spark.driver.extraJavaOptions     -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
# spark.scheduler.mode              FAIR
# spark.serializer                  org.apache.spark.serializer.KryoSerializer
# spark.kryoserializer.buffer.max   128m
# spark.buffer.size                 131072
# spark.local.dir                   ./local
# spark.network.timeout             120s
# spark.cleaner.periodicGC.interval 10min

## Spark Driver / AM Sizing
# spark.driver.cores            4
# spark.driver.memory           8g
# spark.driver.memoryOverhead   2048
# spark.driver.extraJavaOptions -XX:MaxDirectMemorySize=2048m
# spark.driver.maxResultSize    3g
# spark.yarn.am.cores           4
# spark.yarn.am.memory          2g
# spark.yarn.am.memoryOverhead  1024

## Spark Executor Sizing
# spark.executor.instances        100
# spark.executor.cores            4
# spark.executor.memory           16g
# spark.executor.memoryOverhead   4096
# spark.executor.extraJavaOptions -XX:MaxDirectMemorySize=2048m

## Executor Heartbeat
# spark.storage.blockManagerHeartbeatTimeoutMs                       300s
# spark.executor.heartbeatInterval                                   15s
# spark.executor.heartbeat.maxFailures                               30


## Event Queue Capacity
# spark.scheduler.revive.interval                                    1s
# spark.scheduler.listenerbus.eventqueue.capacity                    100000
# spark.scheduler.listenerbus.eventqueue.executorManagement.capacity 100000
# spark.scheduler.listenerbus.eventqueue.appStatus.capacity          100000
# spark.scheduler.listenerbus.eventqueue.shared.capacity             100000
# spark.scheduler.listenerbus.eventqueue.eventLog.capacity           20000

## External Shuffle Service
# spark.shuffle.service.enabled                             true
# spark.shuffle.service.fetch.rdd.enabled                   true
# spark.shuffle.service.port                                7337

## Speculation
# spark.speculation                         true
# spark.speculation.interval                1s
# spark.speculation.multiplier              1.5
# spark.speculation.quantile                0.9
# spark.speculation.task.duration.threshold 10min

## Shuffle Behavior
# spark.shuffle.compress                                    true
# spark.shuffle.detectCorrupt                               true
# spark.shuffle.detectCorrupt.useExtraMemory                true
# spark.shuffle.file.buffer                                 64k
# spark.shuffle.unsafe.file.output.buffer                   64k
# spark.shuffle.spill.diskWriteBufferSize                   8k
# spark.shuffle.spill.compress                              true
# spark.shuffle.mapOutput.dispatcher.numThreads             12
# spark.shuffle.mapOutput.parallelAggregationThreshold      5000
# spark.shuffle.readHostLocalDisk                           true
# spark.shuffle.io.maxRetries                               10
# spark.shuffle.io.retryWait                                6s
# spark.shuffle.io.preferDirectBufs                         false
# spark.shuffle.io.serverThreads                            8
# spark.shuffle.io.clientThreads                            8
# spark.shuffle.io.connectionTimeout                        240s
# spark.shuffle.registration.timeout                        6000
# spark.shuffle.registration.maxAttempts                    10
# spark.shuffle.sync                                        false
# spark.shuffle.useOldFetchProtocol                         true
# spark.shuffle.unsafe.fastMergeEnabled                     true
# spark.shuffle.minNumPartitionsToHighlyCompress            100
# spark.network.maxRemoteBlockSizeFetchToMem                128m
# spark.reducer.maxSizeInFlight                             48m
# spark.reducer.maxReqsInFlight                             256
# spark.reducer.maxBlocksInFlightPerAddress                 256

## Data Locality for Task Schedule
# spark.locality.wait                                       0s
# spark.locality.wait.process                               0s
# spark.locality.wait.node                                  0s
# spark.locality.wait.rack                                  0s

## Event Logging for History Server
# spark.eventLog.enabled                            true
# spark.eventLog.dir                                hdfs://hadoop-dfs/history
# spark.eventLog.compress                           true
# spark.eventLog.longForm.enabled                   true
# spark.eventLog.rolling.enabled                    true
# spark.yarn.historyServer.address                  http://historyserver:18080

## SQL
## General SQL Settings
# spark.sql.shuffle.partitions                              8192
# spark.sql.optimizer.inSetConversionThreshold              2
# spark.sql.autoBroadcastJoinThreshold                      64m
# spark.sql.broadcastTimeout                                600s
# spark.sql.join.preferSortMergeJoin                        true
# spark.sql.hive.metastorePartitionPruning                  true
# spark.sql.parquet.filterPushdown                          true
# spark.sql.parquet.recordLevelFilter.enabled               true
# spark.sql.statistics.fallBackToHdfs                       true
## Dynamic Partition Pruning
# spark.sql.optimizer.dynamicPartitionPruning.enabled             true
# spark.sql.optimizer.dynamicPartitionPruning.useStats            true
# spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio 0.5
# spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly  true

1.2.1. Authentication

kyuubi.authentication
  Default: NONE
  Since: 1.0.0
  Client authentication types.
    • NOSASL: raw transport.
    • NONE: no authentication check.
    • KERBEROS: Kerberos/GSSAPI authentication.
    • LDAP: Lightweight Directory Access Protocol authentication.

kyuubi.authentication.ldap.base.dn
  Default: <undefined>
  Since: 1.0.0
  LDAP base DN.

kyuubi.authentication.ldap.domain
  Default: <undefined>
  Since: 1.0.0
  LDAP domain.

kyuubi.authentication.ldap.url
  Default: <undefined>
  Since: 1.0.0
  SPACE character separated LDAP connection URL(s).

kyuubi.authentication.sasl.qop
  Default: auth
  Since: 1.0.0
  SASL quality of protection (QOP) for Kyuubi communication with clients.
    • auth - authentication only (default)
    • auth-int - authentication plus integrity protection
    • auth-conf - authentication plus integrity and confidentiality protection. This is applicable only if Kyuubi is configured to use Kerberos authentication.
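
For example, enabling LDAP authentication combines several of these keys in kyuubi-defaults.conf. The server URL and base DN below are hypothetical placeholders, a minimal sketch rather than a production setup:

kyuubi.authentication               LDAP
kyuubi.authentication.ldap.url      ldap://ldap.example.com:389
kyuubi.authentication.ldap.base.dn  dc=example,dc=com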

1.2.2. Backend

kyuubi.backend.engine.exec.pool.keepalive.time
  Default: PT1M
  Since: 1.0.0
  Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in SQL engine applications.

kyuubi.backend.engine.exec.pool.shutdown.timeout
  Default: PT10S
  Since: 1.0.0
  Timeout(ms) for the operation execution thread pool to terminate in SQL engine applications.

kyuubi.backend.engine.exec.pool.size
  Default: 100
  Since: 1.0.0
  Number of threads in the operation execution thread pool of SQL engine applications.

kyuubi.backend.engine.exec.pool.wait.queue.size
  Default: 100
  Since: 1.0.0
  Size of the wait queue for the operation execution thread pool in SQL engine applications.

kyuubi.backend.server.exec.pool.keepalive.time
  Default: PT1M
  Since: 1.0.0
  Time(ms) that an idle async thread of the operation execution thread pool will wait for a new task to arrive before terminating in the Kyuubi server.

kyuubi.backend.server.exec.pool.shutdown.timeout
  Default: PT10S
  Since: 1.0.0
  Timeout(ms) for the operation execution thread pool to terminate in the Kyuubi server.

kyuubi.backend.server.exec.pool.size
  Default: 100
  Since: 1.0.0
  Number of threads in the operation execution thread pool of the Kyuubi server.

kyuubi.backend.server.exec.pool.wait.queue.size
  Default: 100
  Since: 1.0.0
  Size of the wait queue for the operation execution thread pool of the Kyuubi server.
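
If the default pools saturate under many concurrent operations, the server-side pool can be enlarged in kyuubi-defaults.conf. The numbers below are illustrative, not tuned recommendations:

kyuubi.backend.server.exec.pool.size             200
kyuubi.backend.server.exec.pool.wait.queue.size  200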

1.2.3. Delegation

kyuubi.delegation.key.update.interval
  Default: PT24H
  Since: 1.0.0
  Unused yet.

kyuubi.delegation.token.gc.interval
  Default: PT1H
  Since: 1.0.0
  Unused yet.

kyuubi.delegation.token.max.lifetime
  Default: PT168H
  Since: 1.0.0
  Unused yet.

kyuubi.delegation.token.renew.interval
  Default: PT168H
  Since: 1.0.0
  Unused yet.

1.2.4. Frontend

kyuubi.frontend.backoff.slot.length
  Default: PT0.1S
  Since: 1.0.0
  Time to back off during login to the frontend service.

kyuubi.frontend.bind.host
  Default: <undefined>
  Since: 1.0.0
  Hostname or IP of the machine on which to run the frontend service.

kyuubi.frontend.bind.port
  Default: 10009
  Since: 1.0.0
  Port of the machine on which to run the frontend service.

kyuubi.frontend.login.timeout
  Default: PT20S
  Since: 1.0.0
  Timeout for Thrift clients during login to the frontend service.

kyuubi.frontend.max.message.size
  Default: 104857600
  Since: 1.0.0
  Maximum message size in bytes a Kyuubi server will accept.

kyuubi.frontend.max.worker.threads
  Default: 999
  Since: 1.0.0
  Maximum number of threads in the worker thread pool for the frontend service.

kyuubi.frontend.min.worker.threads
  Default: 9
  Since: 1.0.0
  Minimum number of threads in the worker thread pool for the frontend service.

kyuubi.frontend.worker.keepalive.time
  Default: PT1M
  Since: 1.0.0
  Keep-alive time (in milliseconds) for an idle worker thread.
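
As a small example, binding the frontend service to a specific address and port in kyuubi-defaults.conf (the hostname is a placeholder):

kyuubi.frontend.bind.host  kyuubi.example.com
kyuubi.frontend.bind.port  10009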

1.2.5. HA

kyuubi.ha.zookeeper.acl.enabled
  Default: false
  Since: 1.0.0
  Set to true if the ZooKeeper ensemble is kerberized.

kyuubi.ha.zookeeper.connection.base.retry.wait
  Default: 1000
  Since: 1.0.0
  Initial amount of time to wait between retries to the ZooKeeper ensemble.

kyuubi.ha.zookeeper.connection.max.retries
  Default: 3
  Since: 1.0.0
  Max retry times for connecting to the ZooKeeper ensemble.

kyuubi.ha.zookeeper.connection.max.retry.wait
  Default: 30000
  Since: 1.0.0
  Max amount of time to wait between retries for the BONDED_EXPONENTIAL_BACKOFF policy, or max total elapsed time for the UNTIL_ELAPSED policy, when connecting to the ZooKeeper ensemble.

kyuubi.ha.zookeeper.connection.retry.policy
  Default: EXPONENTIAL_BACKOFF
  Since: 1.0.0
  The retry policy for connecting to the ZooKeeper ensemble. All candidates are:
    • ONE_TIME
    • N_TIME
    • EXPONENTIAL_BACKOFF
    • BONDED_EXPONENTIAL_BACKOFF
    • UNTIL_ELAPSED

kyuubi.ha.zookeeper.connection.timeout
  Default: 15000
  Since: 1.0.0
  The timeout(ms) of creating the connection to the ZooKeeper ensemble.

kyuubi.ha.zookeeper.namespace
  Default: kyuubi
  Since: 1.0.0
  The root directory for the service to deploy its instance URI. Additionally, it creates a -[username] suffixed root directory for each application.

kyuubi.ha.zookeeper.quorum
  Since: 1.0.0
  The connection string for the ZooKeeper ensemble.

kyuubi.ha.zookeeper.session.timeout
  Default: 60000
  Since: 1.0.0
  The timeout(ms) after which an idle connected session expires.
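
For service discovery against an external ensemble, point the quorum at your ZooKeeper servers in kyuubi-defaults.conf; the hostnames below are placeholders:

kyuubi.ha.zookeeper.quorum     zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
kyuubi.ha.zookeeper.namespace  kyuubi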

1.2.6. Kinit

kyuubi.kinit.interval
  Default: PT1H
  Since: 1.0.0
  How often the Kyuubi server runs kinit -kt [keytab] [principal] to renew the local Kerberos credentials cache.

kyuubi.kinit.keytab
  Default: <undefined>
  Since: 1.0.0
  Location of the Kyuubi server's keytab.

kyuubi.kinit.max.attempts
  Default: 10
  Since: 1.0.0
  How many times the kinit process will retry.

kyuubi.kinit.principal
  Default: <undefined>
  Since: 1.0.0
  Name of the Kerberos principal.
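
On a kerberized cluster, the keytab and principal pair might look like the following sketch; the principal and path are placeholders for your environment:

kyuubi.kinit.principal  kyuubi/kyuubi.example.com@EXAMPLE.COM
kyuubi.kinit.keytab     /etc/security/keytabs/kyuubi.service.keytab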

1.2.7. Operation

kyuubi.operation.idle.timeout
  Default: PT3H
  Since: 1.0.0
  An operation will be closed when it is not accessed for this duration of time.

kyuubi.operation.status.polling.timeout
  Default: PT5S
  Since: 1.0.0
  Timeout(ms) for long-polling the status of an asynchronously running SQL query.

1.2.8. Session

kyuubi.session.check.interval
  Default: PT5M
  Since: 1.0.0
  The check interval for session timeout.

kyuubi.session.engine.check.interval
  Default: PT5M
  Since: 1.0.0
  The check interval for engine timeout.

kyuubi.session.engine.idle.timeout
  Default: PT30M
  Since: 1.0.0
  Engine timeout; the engine will self-terminate when it is not accessed for this duration.

kyuubi.session.engine.initialize.timeout
  Default: PT1M
  Since: 1.0.0
  Timeout for starting the background engine, e.g. SparkSQLEngine.

kyuubi.session.engine.log.timeout
  Default: PT24H
  Since: 1.1.0
  If we use Spark as the engine, the session submit log is the console output of spark-submit. The session submit log is retained until this duration elapses.

kyuubi.session.engine.login.timeout
  Default: PT15S
  Since: 1.0.0
  The timeout(ms) of creating the connection to the remote SQL query engine.

kyuubi.session.engine.share.level
  Default: USER
  Since: 1.0.0
  The SQL engine App will be shared at different levels; available configs are:
    • CONNECTION: the App will not be shared but only used by the current client connection
    • USER: the App will be shared by all sessions created by a unique username
    • GROUP: the App will be shared within a certain group (NOT YET)
    • SERVER: the App will be shared by Kyuubi servers

kyuubi.session.engine.spark.main.resource
  Default: <undefined>
  Since: 1.0.0
  The package used to create the Spark SQL engine remote application. If it is undefined, Kyuubi will use the default.

kyuubi.session.engine.startup.error.max.size
  Default: 8192
  Since: 1.1.0
  If an error occurs during engine bootstrapping, this config limits the length of the error message (characters).

kyuubi.session.timeout
  Default: PT6H
  Since: 1.0.0
  Session timeout; a session will be closed when it is not accessed for this duration.
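
For example, to give every client connection its own engine and let idle engines terminate sooner, one might set the following in kyuubi-defaults.conf (values are illustrative):

kyuubi.session.engine.share.level   CONNECTION
kyuubi.session.engine.idle.timeout  PT10M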

1.2.9. Zookeeper

kyuubi.zookeeper.embedded.directory
  Default: embedded_zookeeper
  Since: 1.0.0
  The temporary directory for the embedded ZooKeeper server.

kyuubi.zookeeper.embedded.port
  Default: 2181
  Since: 1.0.0
  The port of the embedded ZooKeeper server.

1.3. Spark Configurations

1.3.1. Via spark-defaults.conf

Setting them in $SPARK_HOME/conf/spark-defaults.conf supplies default values for the SQL engine application. Available properties can be found in the Spark official online documentation for Spark Configurations.

1.3.2. Via kyuubi-defaults.conf

Setting them in $KYUUBI_HOME/conf/kyuubi-defaults.conf also supplies default values for the SQL engine application. These properties will override all settings in $SPARK_HOME/conf/spark-defaults.conf.

1.3.3. Via JDBC Connection URL

Setting them in the JDBC Connection URL supplies session-specific settings for each SQL engine. For example: jdbc:hive2://localhost:10009/default;#spark.sql.shuffle.partitions=2;spark.executor.memory=5g

  • Runtime SQL Configuration

  • Static SQL and Spark Core Configuration

    • For Static SQL Configurations and other Spark core configurations, e.g. spark.executor.memory, they will take effect only if there is no existing SQL engine application; otherwise, they will simply be ignored (see the usage sketch below).
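
As a usage sketch, such a URL can be passed directly to a Hive JDBC client such as beeline; the host and property values here are illustrative:

beeline -u 'jdbc:hive2://localhost:10009/default;#spark.sql.shuffle.partitions=2;spark.executor.memory=5g'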

1.3.4. Via SET Syntax

Please refer to the Spark official online documentation for the SET Command.
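
For example, a runtime SQL configuration can be adjusted from any connected client session; the value below is arbitrary:

SET spark.sql.shuffle.partitions=2;
-- read the current value back
SET spark.sql.shuffle.partitions;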

1.4. Logging

Kyuubi uses log4j for logging. You can configure it using $KYUUBI_HOME/conf/log4j.properties.

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %p %c{2}: %m%n
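
Standard log4j 1.x logger settings also apply here. As a hedged example, the following additions would surface Kyuubi debug logs while quieting ZooKeeper client noise; adjust the levels to taste:

# print Kyuubi's own classes at DEBUG, keep ZooKeeper quieter
log4j.logger.org.apache.kyuubi=DEBUG
log4j.logger.org.apache.zookeeper=WARN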

1.5. Other Configurations

1.5.1. Hadoop Configurations

Specify HADOOP_CONF_DIR as the directory that contains the Hadoop configuration files, or treat them as Spark properties with a spark.hadoop. prefix. Please refer to the Spark official online documentation for Inheriting Hadoop Cluster Configuration. Also, please refer to the Apache Hadoop online documentation for an overview of how to configure Hadoop.

1.5.2. Hive Configurations

These configurations are used by the SQL engine application to talk to the Hive MetaStore, and can be configured in a hive-site.xml. Place it in the $SPARK_HOME/conf directory, or treat them as Spark properties with a spark.hadoop. prefix.
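
For example, instead of shipping a full hive-site.xml, the MetaStore address alone can be passed in properties form via kyuubi-defaults.conf; the thrift URI is a placeholder:

spark.hadoop.hive.metastore.uris  thrift://metastore.example.com:9083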

1.6. User Defaults

In Kyuubi, we can configure user default settings to meet separate needs. These user defaults override the system defaults, but will in turn be overridden by those from the JDBC Connection URL or SET command where applicable. They take effect ONLY when creating the SQL engine application.

User default settings are in the form of ___{username}___.{config key}. There are three consecutive underscores (_) on each side of the username, and a dot (.) separates the config key from the prefix. For example:

# For system defaults
spark.master=local
spark.sql.adaptive.enabled=true
# For a user named kent
___kent___.spark.master=yarn
___kent___.spark.sql.adaptive.enabled=false
# For a user named bob
___bob___.spark.master=spark://master:7077
___bob___.spark.executor.memory=8g

In the above case, if there are no related configurations from the JDBC Connection URL, kent will run his SQL engine application on YARN and prefer Spark AQE to be off, while bob will activate his SQL engine application on a Spark standalone cluster with 8g of heap memory for each executor and follow the Spark AQE behavior of the Kyuubi system defaults. Users who do not have custom configurations will use the system defaults.