PyHive#

PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Hive. Because the Kyuubi server speaks the same Thrift protocol as HiveServer2, PyHive can connect to it as if it were a HiveServer2 instance.

Requirements#

PyHive works with Python 2.7 and Python 3. Install it with pip, including the hive extra for the Hive interface:

pip install 'pyhive[hive]'

Usage#

To connect, pass the Kyuubi server’s host and its Thrift protocol port (10009 in the examples below).

For further information about usage and features, such as DB-API asynchronous fetching or use with SQLAlchemy, please refer to the PyHive project homepage.
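
As a minimal sketch of the SQLAlchemy route mentioned above: PyHive registers a `hive://` dialect, so an engine can be pointed at Kyuubi with a URL. The helper names `kyuubi_url` and `make_engine`, the host `localhost`, and the `default` database are illustrative assumptions, and the `sqlalchemy` package has to be installed separately.

```python
def kyuubi_url(host, port=10009, database="default", username=None):
    """Build a SQLAlchemy URL for PyHive's hive:// dialect (hypothetical helper)."""
    user_part = f"{username}@" if username else ""
    return f"hive://{user_part}{host}:{port}/{database}"


def make_engine(host, **kwargs):
    """Create a SQLAlchemy engine for the given Kyuubi host."""
    from sqlalchemy import create_engine  # imported lazily; requires sqlalchemy

    return create_engine(kyuubi_url(host, **kwargs))


print(kyuubi_url("localhost"))  # the host name is an assumption
```

Calling `make_engine("localhost")` returns an engine that can be handed to any SQLAlchemy-aware tool; it needs both `sqlalchemy` and `pyhive` installed.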

DB-API#

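from pyhive import hive

kyuubi_host = 'localhost'  # assumption: replace with your Kyuubi server's host
cursor = hive.connect(host=kyuubi_host, port=10009).cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchone())  # the first row
print(cursor.fetchall())  # the remaining rows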
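
The same DB-API calls can be wrapped so that the cursor and connection are always closed, and so that values are passed as query parameters instead of being formatted into the SQL string. This is a sketch under stated assumptions: `fetch_rows` and `main` are hypothetical names, and `localhost` stands in for your server's host.

```python
from contextlib import closing


def fetch_rows(connection, sql, parameters=None):
    """Run one query on any DB-API connection and return all rows."""
    # closing() guarantees cursor.close() even if execute() raises.
    with closing(connection.cursor()) as cursor:
        if parameters is None:
            cursor.execute(sql)
        else:
            cursor.execute(sql, parameters)
        return cursor.fetchall()


def main():
    from pyhive import hive  # imported lazily so fetch_rows stays importable

    kyuubi_host = "localhost"  # assumption: replace with your server's host
    with closing(hive.connect(host=kyuubi_host, port=10009)) as connection:
        # PyHive uses the "pyformat" parameter style: %(name)s placeholders.
        rows = fetch_rows(
            connection,
            "SELECT * FROM my_awesome_data LIMIT %(n)s",
            {"n": 10},
        )
        for row in rows:
            print(row)
```

`main()` is not called automatically because it needs a reachable Kyuubi server; `fetch_rows` itself works with any DB-API connection.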

Use PyHive with Pandas#

PyHive connections follow the DB-API, so they can be passed to pandas.read_sql to execute SQL and load the result into a Pandas DataFrame.

from pyhive import hive
import pandas as pd

# open a connection (kyuubi_host is the host of your Kyuubi server)
conn = hive.Connection(host=kyuubi_host, port=10009)

# query the table to a new dataframe
dataframe = pd.read_sql("SELECT id, name FROM test.example_table", conn)
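
Recent pandas versions may warn when `read_sql` receives a DB-API connection that is neither a SQLAlchemy connectable nor a sqlite3 connection. One alternative, sketched here with hypothetical helper names, is to fetch the rows through a cursor and build the DataFrame from the column names in `cursor.description`.

```python
from contextlib import closing


def query_columns_and_rows(connection, sql):
    """Return (column_names, rows) for a query on any DB-API connection."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(sql)
        # DB-API: each description entry starts with the column name.
        columns = [column[0] for column in cursor.description]
        return columns, cursor.fetchall()


def query_to_dataframe(connection, sql):
    """Build a pandas DataFrame from the query result."""
    import pandas as pd  # imported lazily; requires pandas

    columns, rows = query_columns_and_rows(connection, sql)
    return pd.DataFrame.from_records(rows, columns=columns)
```

With a PyHive connection `conn`, `query_to_dataframe(conn, "SELECT id, name FROM test.example_table")` would produce the same kind of DataFrame as the `read_sql` call.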

Authentication#

If a password is provided for the connection, make sure the auth parameter is set to either CUSTOM or LDAP.

from pyhive import hive

# open a connection; hive.Connection takes the user name as `username`
conn = hive.Connection(host=kyuubi_host, port=10009,
                       username='user', password='password', auth='CUSTOM')
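
The rule above can be checked before a connection is attempted. The following is a minimal sketch, not part of PyHive: `connection_kwargs` and `PASSWORD_AUTH_MODES` are hypothetical names, and the check only mirrors the requirement stated in this section.

```python
PASSWORD_AUTH_MODES = {"CUSTOM", "LDAP"}


def connection_kwargs(host, port=10009, username=None, password=None, auth=None):
    """Collect keyword arguments for hive.Connection, checking the auth rule."""
    if password is not None and auth not in PASSWORD_AUTH_MODES:
        raise ValueError("a password requires auth='CUSTOM' or auth='LDAP'")
    kwargs = {"host": host, "port": port}
    # Only pass the optional values that were actually supplied.
    for key, value in (("username", username), ("password", password), ("auth", auth)):
        if value is not None:
            kwargs[key] = value
    return kwargs
```

The result can then be unpacked into the constructor, for example `hive.Connection(**connection_kwargs(kyuubi_host, username='user', password='password', auth='LDAP'))`.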