This page last modified: Oct 09 2009
title:miRBase mirna converted to SQLite keywords:mirbase,mirna,data,dump,mysql,sqlite,sqlite3,sql,database,perl,install,db,rdbms,feature,advanced,export,table,conversion description:A Perl script and instructions to convert the miRBase schema definition file from MySQL to SQLite and load the data. SQLite performance may be fine without indexes, but I won't be surprised if indexes are necessary for more complex queries. Download all the mirbase files from their ftp site. See: http://mirbase.org/ftp.shtml ftp://mirbase.org/pub/mirbase/CURRENT/ The mirna data is provided in tab-separated text data files, with a single schema dump of the SQL tables in MySQL format. Download my Perl script that converts MySQL to SQLite. It is based on work I did previously to convert MySQL to PostgreSQL, as well as a shell script you can find at sqlite.org. It may work with any MySQL, but has only been tested with the miRBase schema definition and the AmiGO MySQL dump. I provide the script as a .tar for various practical reasons. http://defindit.com/readme_files/mysql2sqlite.tar # Get the tar file with my script to create SQLite from MySQL # untar, chmod as necessary. wget http://defindit.com/readme_files/mysql2sqlite.tar tar -xvf mysql2sqlite.tar chmod +x mysql2sqlite/mysql2sqlite.pl # Doesn't everyone keep their bioinformatics data in /bioinfo? Make a # mirbase dir and some subdirectories. cd /bioinfo mkdir mirbase cd mirbase mkdir database_files mkdir genomes # cd into the database_files subdir, download all the files, gunzip, # cat the MySQL to the Perl script converting to SQLite, create a file # with some SQLite load commands, run sqlite3, read in the schema # file, read in the data loader commands. cd database_files wget "ftp://mirbase.org/pub/mirbase/CURRENT/database_files/*" gunzip *.gz cat tables.sql | ~/public_html/cowpea/mysql2sqlite.pl > tables_sqlite.sql # Copy and paste the following text into load_cmd.txt. I created mine # by runing an Emacs keyboard macro on "ls *.txt > load_cmd.txt", and # then adding the .separator command at the top. .separator \t .import dead_mirna.txt dead_mirna .import literature_references.txt literature_references .import mirna_2_prefam.txt mirna_2_prefam .import mirna_chromosome_build.txt mirna_chromosome_build .import mirna_context.txt mirna_context .import mirna_database_links.txt mirna_database_links .import mirna_literature_references.txt mirna_literature_references .import mirna_mature.txt mirna_mature .import mirna_prefam.txt mirna_prefam .import mirna_pre_mature.txt mirna_pre_mature .import mirna_species.txt mirna_species .import mirna_target_links.txt mirna_target_links .import mirna_target_url.txt mirna_target_url .import mirna.txt mirna sqlite3 mirna.db .read tables_sqlite.sql .read load_cmd.txt # Below are some commands that sort of verify that the data loaded. I # have a two line shell prompt (zsh), which looks a little unusual. > sqlite3 mirna.db -- Loading resources from /home/mst3k/.sqliterc SQLite version 3.6.12 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> select count(*) from mirna; count(*) ---------- 10883 sqlite> sqlite> .e --- mst3k@anubis:pts/1 03:21 /bioinfo/mirbase/database_files --- > wc -l mirna.txt 10883 mirna.txt --- mst3k@anubis:pts/1 03:21 /bioinfo/mirbase/database_files --- > # cd to the other directories and download data. cd ../genomes wget "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/*" cd ../ wget "ftp://mirbase.org/pub/mirbase/CURRENT/*.gz" wget "ftp://mirbase.org/pub/mirbase/CURRENT/*.xls" wget "ftp://mirbase.org/pub/mirbase/CURRENT/LI*" wget "ftp://mirbase.org/pub/mirbase/CURRENT/REA*" wget "ftp://mirbase.org/pub/mirbase/CURRENT/THI*" wget "ftp://mirbase.org/pub/mirbase/CURRENT/*.txt"