Data Warehouse

Integrating Python with Vector or Actian X

Actian Corporation

January 12, 2018

Performance testing production workloads for Ingres and Vector

Introduction

Today we’re going to look at how to get started with Actian Vector, our high-performance in-memory analytics database, and Python, one of the most oft-used languages by data scientists in recent times. Using the techniques below, you’ll be able to get started with Python using Vector ODBC and JDBC. (You’ll be able to use some of these techniques with Actian X as well; see the References section at the bottom for more information.) We’ll also discuss some simple basic functions of CURD (Create, Update, Read, and Delete) using Python.

The following code is tested with Python 2.7 32-bit. If you’re new to Vector, this setup should take 2-3 hours, and a bit less than that if you’re familiar with both Vector and Python.

Modules Needed for ODBC and Vector

  1. Pyodbc (which you can get at https://mkleehammer.github.io/pyodbc/)
  2. Vector ODBC driver

Note: pypyodbc would also work for same configuration however the following code is tested with pyodbc.

ODBC Connection Using DSN

In this section we will discuss the basics of making DSN in both Linux and Windows to be used by the ODBC connection.

Linux

In order to get DSN working with Linux, the environment needs to be setup correctly for this to work. Along with other Vector parameter that you source using source .ingVWsh (if installation code is VW) following needs to be exported or you can add the following in .bashrc to get ODBC running.

ODBCSYSINI=$II_SYSTEM/ingres/files
export ODBCSYSINI

Note: please set II_ODBC_WCHAR_SIZE if needed as per your ODBC driver manager.

‘iiodbcadmin’ Ingres utility can be used in Linux to create a DSN as example shown below.

Actian Vector VI Ingres Screenshot

Note:  If you are familiar with the Ingres ODBC driver and its settings, you could also edit odbc.ini as an example given below.  The details added will be reflected in ‘iiodbcadmin’ utility.

Example:

[ODBC Data Sources]
MYDSN=Ingres

[MyDSN]
Driver=/opt/Actian/VectorVI/ingres/lib/libiiodbcdriver.1.so
Description=
Vendor=Actian Corporation
DriverType=Ingres
HostName=(local)
ListenAddress=VI
Database=test
ServerType=Ingres
user=actian
password=actian

For more details on this check Configure a Data Source (Linux)

Windows

If you are using 32-bit python, you would need 32-bit client runtime to make use of 32-bit ODBC DSN. Likewise for 64-bit python, you would need 64-bit client runtime to make 64-bit ODBC DSN.

More details can be found at Configure a Data Source (Windows).

Here is a screenshot of a 32-bit DSN:

Ingres VT ODBC Screenshot

ODBC Connection in Python

You can use DSN to connect:

import pyodbc as pdb
conn = pdb.connect("dsn= MYDSN " )

With this usename and password:

conn = pdb.connect("dsn=TestDSN;uid=username;pwd=password" )

Here’s another example:

conn = pdb.connect("dsn=TestDSN;uid=actian;pwd=actian123" )

Or you can directly connect without any DSN using the various parameters as example shown below.

conn =  pdb.connect("driver=Ingres;servertype=ingres;server=@localhost,tcp_ip,VW;uid=actian;pwd=actian123;database=test")

Modules Needed for JDBC Connection

  1. jaydebeapi (https://pypi.python.org/pypi/JayDeBeApi/)
  2. Vector JDBC driver ( installed using clientruntime from ESD)

Jaydebeapi can be easily installed using ‘pip install JayDeBeApi’. However if due to certain issues you cannot use pip then you can also install manually. You can download the source code from https://github.com/baztian/jaydebeapi on your computer. Unzip and run ‘python setup.py install’ . However you would need to install dev tools on root (yum install gcc-c++ ) to install it manually.

JDBC Connection in Python

import jaydebeapi as jdb
conn = jdb.connect('com.ingres.jdbc.IngresDriver','jdbc:ingres://localhost:VI7/test' ,driver_args={'user': 'actian', 'password': 'actian'}, jars='iijdbc.jar')

cursor = conn.cursor()

Example: Create Table and Insert Values (Create in CURD)

A Simple Insert

cursor = conn.cursor()

#Drop table:

cursor .execute("DROP TABLE IF EXISTS customer")

#Create table:

cursor.execute("CREATE TABLE customer(customer_no INT NOT NULL PRIMARY KEY,last_name VARCHAR(40),first_name CHAR(40))")

conn.commit()

#Insert rows into customer table:

cursor .execute("INSERT INTO customer VALUES (1, 'Harry', 'Potter')")

cursor .execute("INSERT INTO customer VALUES (2, 'Ron','Weasley')")

cursor .execute("INSERT INTO customer VALUES (3, 'Draco', 'Malfoy')")

To execute single statements, .execute() is used.

Insert using Bind Variables

To insert many rows using bind variables you can do the following:

#Insert rows into customer table:

cursor .execute("INSERT INTO customer VALUES (?, ?, ?)", 4,'A', 'B')

Inserting multiple rows using .executemany()

data =[

(1, ‘Harry’, ‘Potter’),
(2, ‘Ron’,’Weasley’),
(3, ‘Draco’, ‘Malfoy’)]

cursor.executemany(“INSERT INTO customer VALUES (?,?,?)”,data)

#In case you have to insert data only to certain columns,

data =[

(8, 'A', ),
(9, 'B',),
(10, 'C', )]

cursor.executemany("insert into customer(customer_no, first_name) VALUES(?, ?)", data)

Another Example to Insert Many Rows

In case values to the column is to be added in certain ranges like in the following example elements in testid of table test will be added from  0 to 99:

cursor .execute("DROP TABLE IF EXISTS test")

#CREATING TABLE to insert many rows using executemany()

cursor .execute("CREATE TABLE test(testid varchar(100))")

data_to_insert = [(i,) for i in range(100)]

cursor.executemany("INSERT INTO test (testid) VALUES (?)", data_to_insert)

conn.close();

Update Data (the Update in cUrd)

Updating a Single Row

updatedata= ('X', 'Y', 10)

sql = 'UPDATE customer SET last_name = ? , first_name=? where customer_no =? '

cursor.execute(sql, updatedata)

Fetching Data (the Read in cuRd)

Fetching One Row

There are many functions to fetch data from like #fetchone(),  fetchall(), etc.:

cursor.execute("select count(*) from customer")

result=cursor.fetchone()

print(result[0])

cursor.close()

conn.close()

Fetching Many Rows

cursor.execute("select First 3 * from customer")

for row in cursor:

print ("Cust_no: ",row[0]," First Name: ",row[2]," Last Name: ",row[2])

cursor.close()

conn.close()

Results will be displayed as:

Cust_no:  1  First Name:  Potter                                 Last Name:  Potter

Cust_no:  2  First Name:  Weasley                                Last Name:  Weasley

Cust_no:  3  First Name:  Malfoy                                 Last Name:  Malfoy

Deleting Data (the Delete in curD)

Deleting a Single Row

sql = 'DELETE from customer where customer_no =9'

cursor.execute(sql)

Deleting Multiple Rows

#Delete multiple rows using ‘in’:

id_list = [1,2,3]

query_string = "delete from customer where customer_no in (%s)" % ','.join(['?'] * len(id_list))

cursor.execute(query_string, id_list)

Complete Code for Both ODBC /JDBC

Import pyodbc as pdb:

conn = pdb.connect("dsn=TestDSN;uid=vidisha;pwd=vidisha" )

#conn=  pdb.connect("driver=Ingres;servertype=ingres;server=@localhost,tcp_ip,VW;uid=vidisha;pwd=vidisha;database=test”)

# Python 2.7

conn.setdecoding(pdb.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pdb.SQL_WCHAR, encoding='utf-8')
conn.setencoding(str, encoding='utf-8')
conn.setencoding(unicode, encoding='utf-8')
conn.autocommit= True
cursor = conn.cursor()

print("DROPPING TABLE")
cursor .execute("DROP TABLE IF EXISTS customer")
conn.commit

print("nCREATING TABLE ")
cursor .execute("CREATE TABLE customer(customer_no INT NOT NULL PRIMARY KEY,last_name VARCHAR(40),first_name CHAR(40))")

print("INSERTING ROWS TO TABLE customer")
cursor.execute("INSERT INTO customer VALUES (1, 'Harry', 'Potter')")
cursor.execute("INSERT INTO customer VALUES (2, 'Ron','Weasley')")
cursor.execute("INSERT INTO customer VALUES (3, 'Draco', 'Malfoy')")

data =[

(5, 'Harry', 'Potter'),
(6, 'Ron','Weasley'),
(7, 'Draco', 'Malfoy')]

cursor.executemany("INSERT INTO customer VALUES (?,?,?)",data)

#or

data =[

(8, 'A', ),
(9, 'B',),
(10, 'C', )]

cursor.executemany("insert into customer(customer_no, first_name) VALUES(?, ?)", data)

print("DROPPING TABLE")
cursor.execute("DROP TABLE IF EXISTS test")
print("CREATING TABLE to insert many rows using executemany()")
cursor.execute("CREATE TABLE test(testid varchar(100))")
data_to_insert = [(i,) for i in range(100)]
print("Insert multiple data to test")
cursor.executemany("INSERT INTO test (testid) VALUES (?)", data_to_insert)

print("Fetching COUNT OF TABLE")

cursor.execute("select count(*) from customer")
result=cursor.fetchone()
print(result[0])

print("FETCHING MANY ROWS")

cursor.execute("select First 3 * from customer")
results=cursor.fetchall()
for row in results:
    print ("Cust_no: ",row[0]," First Name: ",row[2]," Last Name: ",row[2])

print("UPDATING SINGLE ROW")
updatedata= ('X', 'Y', 10)
sql = 'UPDATE customer SET last_name = ? , first_name=? where customer_no =? '
cursor.execute(sql, updatedata)

print("DELETING A ROW")

sql = 'DELETE from customer where customer_no =9'
cursor.execute(sql)

print("DELETING MULTIPLE ROWS USING 'IN'")
id_list = [1,2,3]
query_string = "delete from customer where customer_no in (%s)" % ','.join(['?'] * len(id_list))
cursor.execute(query_string, id_list)
#close connection
cursor.close()
conn.close()

For JDBC code is same just the connection string is different:

import jaydebeapi as jdb
conn = jdb.connect('com.ingres.jdbc.IngresDriver','jdbc:ingres://localhost:VW7/test' ,driver_args={'user': 'vidisha', 'password': 'vidisha'},jars='iijdbc.jar')
cursor = conn.cursor()

print("DROPPING TABLE")
cursor .execute("DROP TABLE IF EXISTS customer")
conn.commit

print("nCREATING TABLE ")
cursor .execute("CREATE TABLE customer(customer_no INT NOT NULL PRIMARY KEY,last_name VARCHAR(40),first_name CHAR(40))")

print("INSERTING ROWS TO TABLE customer")
cursor.execute("INSERT INTO customer VALUES (1, 'Harry', 'Potter')")
cursor.execute("INSERT INTO customer VALUES (2, 'Ron','Weasley')")
cursor.execute("INSERT INTO customer VALUES (3, 'Draco', 'Malfoy')")

Additional references

 

About Actian Corporation

Actian is helping businesses build a bridge to a data-defined future. We’re doing this by delivering scalable cloud technologies while protecting customers’ investments in existing platforms. Our patented technology has enabled us to maintain a 10-20X performance edge against competitors large and small in the mission-critical data management market. The most data-intensive enterprises in financial services, retail, telecommunications, media, healthcare and manufacturing trust Actian to solve their toughest data challenges.