Splitting massive MySQL dumps

As I posted yesterday, I have a massive MySQL dump to import. I tried BigDump, but one of the tables kept producing errors and so BigDump would exit. I don't need the whole db imported, so I wrote this to split it by table. It produces a new sql file for every table it finds, numbered sequentially so if you process them in alphabetical order it's the equivalent of the whole dump. USE statements get their own files in the same sequence.

  1. #! /usr/bin/perl
  3. use strict;
  4. use warnings;
  5. use 5.010;
  7. my <span class="katex math inline">dump_file =</span>ARGV[0];
  8. &usage() if !<span class="katex math inline">dump_file;
  10. say "using ".</span>dump_file;
  12. my (<span class="katex math inline">line,</span>table,@query, <span class="katex math inline">file_number,</span>file_name);
  13. my <span class="katex math inline">line_number = 1;
  14. my</span>find_count = 0;
  16. open(DUMP_IN, "< $dump_file");
  17. while(<DUMP_IN>){
  18. my <span class="katex math inline">line =</span>_;
  19. if (/^USE\s.(\w+)./){
  20. say "changing db: ".<span class="katex math inline">1;</span>file_name = &make_file_name("USE_<span class="katex math inline">1", "</span>find_count");
  21. &write_USE(<span class="katex math inline">file_name,</span>line);
  22. <span class="katex math inline">find_count++;
  23. }elsif (/^-- Table structure for table .(.+)./){
  24. ## If the current line is the beginning of a table definition
  25. ## and @query is defined, then @query must be full of the previous
  26. ## table, so we want to process it now:
  27. if (@query){</span>file_name = &make_file_name("<span class="katex math inline">table", "</span>find_count");
  28. open(OUTPUT, "><span class="katex math inline">file_name");
  29. foreach(@query){
  30. print OUTPUT</span>_;
  31. }
  32. close OUTPUT;
  33. undef @query;
  34. }
  35. <span class="katex math inline">table =</span>1;
  36. <span class="katex math inline">find_count++;
  37. }
  38. next unless</span>table;
  39. push @query, <span class="katex math inline">line;</span>line_number++;
  40. }
  41. close DUMP_IN;
  42. say <span class="katex math inline">line_number;
  44. ## Subroutines!
  45. sub write_USE() {
  46. my(</span>filename, <span class="katex math inline">line) = @_[0,1];
  47. open (OUTPUT, "></span>filename");
  48. print OUTPUT <span class="katex math inline">line;
  49. close OUTPUT;
  50. }
  52. sub make_file_name() {
  53. my (</span>type, <span class="katex math inline">number) = @_[0,1];</span>number = sprintf("%05d", <span class="katex math inline">number);</span>file_name=<span class="katex math inline">number."_".</span>type.".sql";
  54. return <span class="katex math inline">file_name;
  55. }
  57. sub usage() {
  58. say "Error: missing arguments.";
  59. say "Usage:";
  60. say "</span>0 [MYSQL_DUMP]";
  61. exit 1;
  62. }

A small downside is that this replaces my 2.5Gb file with about 1800 smaller ones. A scripted importer is to follow.

Massive dumps with MySQL

hurr. *insert FLUSH TABLES joke here*

I have a 2.5GB sql dump to import to my MySQL server. MySQL doesn't like me giving it work to do, and the box it's running on only has 3GB of memory. So, I stumbled across bigdump, which is brilliant. It's a PHP script that splits massive SQL dumps into smaller statements, and runs them one at a time against the server. Always the way: 10 lines into duct-taping together something to do the job for you, you find that someone else has done it rather elegantly.1

In short, we extract the directory to a publicly http-accessible location, stick the sql dump there and tell it to go.

In long, installation is approximately as follows:

  1. [email protected]:~<span class="katex math inline">cd www
  2. [email protected]:~/www</span> mkdir bigdump
  3. [email protected]:~/www<span class="katex math inline">chmod 777 bigdump
  4. [email protected]:~/www</span> cd bigdump/
  5. [email protected]:~/www<span class="katex math inline">wget -q http://www.ozerov.de/bigdump.zip
  6. [email protected]:~/www</span> unzip bigdump.zip
  7. [email protected]:~/www/bigdump$ ls
  8. bigdump.php bigdump.zip

Where ~/www is my apache UserDir (i.e. when I visit http://localhost/~avi, i see the contents of ~/www). We need permissions to execute PHP scripts in this dir, too (which I have already). We also need to give everyone permissions to do everything - don't do this on the internet!2

Configuration involves editing bigdump.php with the hostname of our MySQL server, the name of the DB we want to manipulate and our credentials. The following is lines 40-45 of mine:

  1. // Database configuration
  3. <span class="katex math inline">db_server = 'localhost';</span>db_name = 'KBDB';
  4. <span class="katex math inline">db_username = 'kbox';</span>db_password = 'imnottellingyou';

Finally, we need to give it a dump to process. For dumps of less than 2Mb3, we can upload through the web browser, else we need to upload or link our sql dump to the same directory as bigdump:

  1. [email protected]:~/www/bigdump$ ln -s /home/avi/kbox/kbox_dbdata ./dump.sql

Now, we visit the php page through a web browser, and get a pretty interface:

BigDump lists all the files in its working directory, and for any that are SQL dumps provides a 'Start Import' link. To import one of them, click the link and wait.

  1. Yes, you Perl people, it's in PHP. But it's not written by me. So on balance turns out more elegant. []
  2. Those permissions aside - anyone can execute whatever SQL they like with your credentials through this page. Seriously, not on the internet! []
  3. Or whatever's smaller out of upload_max_filesize and post_max_size in your php.ini []