Defindit Docs and Howto Home

This page last modified: Oct 09 2009
title:SQLite from MySQL and AmiGO Gene Ontology
keywords:mysql,gene,ontology,go,amigo,sqlite,sqlite3,sql,database,dbi,dbd,perl,postgres,postgresql,install,upgrade,db,rdbms,feature,advanced,export,table,conversion,unix
description:A Perl script and instructions to convert MySQL to SQlite using the AmiGO Gene Ontology database as an example.



Table of contents
-----------------
Introduction
Download and run
Yet another SQLite, Postgres, MySQL overview



Introduction
------------

I loaded geneontology.org's AmiGO database into SQLite. I like the
speed, and simplicity of working with SQLite. It works well, is easy
to use, is SQL standards compliant, and fast. When I need a big-guns
client-server database I use PostgreSQL.

The resulting database created below will work with SQLite on Windows
or Linux. The supplied commands are Linux. There is an SQLite command
line interface, however, the queries against AmiGO tend to be a trifle
complex and are better created by a web front ended, Perl scripted
CGI back end.



Download and run
----------------

1) Download the MySQL dump SQL statements from geneontology.org. The
README is out of date. The full db with gene products is
go_YYYYMMDD-seqdblite-data.gz. (They refer to seqdb which no longer
exists presumably due to performance issues. They suggest
seqdblite. Their old naming convention was
go-YYYMMDD-seqdblite.sql. Instead of ".sql", they seem to have changed
to "-data.gz".)

# This seems to be the most complete download:
http://archive.geneontology.org/latest-lite/go_20090823-seqdblite-data.gz

# The (dated?) readme:
http://archive.geneontology.org/latest-lite/README

Download my Perl script that converts MySQL to SQLite. It is based on
work I did previously to convert MySQL to PostgreSQL, as well as a
shell script you can find at sqlite.org. It may work with any MySQL,
but has only been tested with the AmiGO MySQL dump.

I provide the script as a .tar for various practical reasons.

http://defindit.com/readme_files/mysql2sqlite.tar


wget http://archive.geneontology.org/latest-lite/go_20090823-seqdblite-data.gz
wget http://defindit.com/readme_files/mysql2sqlite.tar
tar -xvf mysql2sqlite.tar
chmod +x mysql2sqlite/mysql2sqlite.pl
gunzip go_20090823-seqdblite-data.gz
cat go_20090823-seqdblite-data.gz | mysql2sqlite/mysql2sqlite.pl > go_20090823-seqdblite-data.sqlite
cat go_20090823-seqdblite-data.sqlite | sqlite3 go_seqdblite.db


All those commands should run without errors or warnings. My Perl
script is a trifle naive, but worked on the Aug 23 2009 GO database.

SQLite performance may be fine without indexes, but I won't be
surprised if indexes are necessary for more complex queries.



Yet another SQLite, Postgres, MySQL overview
--------------------------------------------

I wrote this for a non-technical friend. 

SQLite is a fully transactional SQL relational database management
system (RDBMS). SQLite is a single executable and there is no
client/server which means that SQLite is simple. Each database is a
single file. The database files are portable across all
platforms. SQLite is considered an embedded SQL engine. It is so small
that it can be embedded inside other applications, or in ROM on
devices. For example, SQLite is often used on cell phones. It is fast
and standard. Being a single executable and a single database file,
administration is nil. Install and run. SQLite is fully transactional,
and still as fast as or faster than other SQL databases.

SQLite does not use client-server architecture, so in a web
application the database must be on the server with the web scripts
(although it is best to put the database in a non-web accessible
directory). It can easily handle loads that 90% of web sites will
experience, but will not scale to large loads. No authentication is
necessary or possible, but in a single web server mode this is a
non-issue. SQLite is not a daemon, so the security concerns are the
same as with any code called from CGI scripts.

Oracle, PostgreSQL, and MySQL are all client server RDBMSs. They
perform well with really massive numbers of transactions per
second. (MySQL is often set up non-transactional; for instance
geneontology.or uses the non-transactional MyISAM tables.)
Client-server RDBMSs are tricky to install and administer. I
have documents to streamline this process for PostgreSQL at:

http://defindit.com/readme_files/postgres_utilities.html

Cient-server RDBMSs must take into account security issues and
authentication. (Postgres and MySQL can be set up without database
passwords, but would still need special configuration to be secure.)
They are also good when you want authentication. It is possible to
hack the database server without necessarily hacking the host server,
so the same security precautions must be taken with the database
server (daemon) as with any other daemon or service on the server.