miRBase mirna converted to SQLite

Defindit Docs and Howto Home

This page last modified: Oct 09 2009
title:miRBase mirna converted to SQLite
keywords:mirbase,mirna,data,dump,mysql,sqlite,sqlite3,sql,database,perl,install,db,rdbms,feature,advanced,export,table,conversion
description:A Perl script and instructions to convert the miRBase schema definition file from MySQL to SQLite and load the data.


SQLite performance may be fine without indexes, but I won't be
surprised if indexes are necessary for more complex queries.

Download all the mirbase files from their ftp site. See:
http://mirbase.org/ftp.shtml

ftp://mirbase.org/pub/mirbase/CURRENT/


The mirna data is provided in tab-separated text data files, with a
single schema dump of the SQL tables in MySQL format.

Download my Perl script that converts MySQL to SQLite. It is based on
work I did previously to convert MySQL to PostgreSQL, as well as a
shell script you can find at sqlite.org. It may work with any MySQL,
but has only been tested with the miRBase schema definition and the
AmiGO MySQL dump.

I provide the script as a .tar for various practical reasons.

http://defindit.com/readme_files/mysql2sqlite.tar


# Get the tar file with my script to create SQLite from MySQL
# untar, chmod as necessary.

wget http://defindit.com/readme_files/mysql2sqlite.tar
tar -xvf mysql2sqlite.tar
chmod +x mysql2sqlite/mysql2sqlite.pl

# Doesn't everyone keep their bioinformatics data in /bioinfo? Make a
# mirbase dir and some subdirectories.

cd /bioinfo
mkdir mirbase
cd mirbase
mkdir database_files
mkdir genomes

# cd into the database_files subdir, download all the files, gunzip,
# cat the MySQL to the Perl script converting to SQLite, create a file
# with some SQLite load commands, run sqlite3, read in the schema
# file, read in the data loader commands.

cd database_files
wget "ftp://mirbase.org/pub/mirbase/CURRENT/database_files/*"
gunzip *.gz
cat tables.sql | ~/public_html/cowpea/mysql2sqlite.pl > tables_sqlite.sql 

# Copy and paste the following text into load_cmd.txt. I created mine
# by runing an Emacs keyboard macro on "ls *.txt > load_cmd.txt", and
# then adding the .separator command at the top.

.separator \t
.import dead_mirna.txt dead_mirna
.import literature_references.txt literature_references
.import mirna_2_prefam.txt mirna_2_prefam
.import mirna_chromosome_build.txt mirna_chromosome_build
.import mirna_context.txt mirna_context
.import mirna_database_links.txt mirna_database_links
.import mirna_literature_references.txt mirna_literature_references
.import mirna_mature.txt mirna_mature
.import mirna_prefam.txt mirna_prefam
.import mirna_pre_mature.txt mirna_pre_mature
.import mirna_species.txt mirna_species
.import mirna_target_links.txt mirna_target_links
.import mirna_target_url.txt mirna_target_url
.import mirna.txt mirna

sqlite3 mirna.db
.read tables_sqlite.sql
.read load_cmd.txt

# Below are some commands that sort of verify that the data loaded. I
# have a two line shell prompt (zsh), which looks a little unusual.

> sqlite3 mirna.db 
-- Loading resources from /home/mst3k/.sqliterc
SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select count(*) from mirna;
count(*)  
----------
10883     
sqlite>
sqlite> .e
--- mst3k@anubis:pts/1 03:21 /bioinfo/mirbase/database_files ---
> wc -l mirna.txt
10883 mirna.txt
--- mst3k@anubis:pts/1 03:21 /bioinfo/mirbase/database_files ---
>

# cd to the other directories and download data.

cd ../genomes
wget "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/*"
cd ../
wget "ftp://mirbase.org/pub/mirbase/CURRENT/*.gz"
wget "ftp://mirbase.org/pub/mirbase/CURRENT/*.xls"
wget "ftp://mirbase.org/pub/mirbase/CURRENT/LI*"
wget "ftp://mirbase.org/pub/mirbase/CURRENT/REA*"
wget "ftp://mirbase.org/pub/mirbase/CURRENT/THI*"
wget "ftp://mirbase.org/pub/mirbase/CURRENT/*.txt"