Splitting massive MySQL dumps

As I posted yesterday, I have a massive MySQL dump to import. I tried BigDump, but one of the tables kept producing errors and so BigDump would exit. I don't need the whole db imported, so I wrote this to split it by table. It produces a new sql file for every table it finds, numbered sequentially so if you process them in alphabetical order it's the equivalent of the whole dump. USE statements get their own files in the same sequence.

  1. #! /usr/bin/perl
  2.  
  3. use strict;
  4. use warnings;
  5. use 5.010;
  6.  
  7. my <span class="katex math inline">dump_file =</span>ARGV[0];
  8. &usage() if !<span class="katex math inline">dump_file;
  9.  
  10. say "using ".</span>dump_file;
  11.  
  12. my (<span class="katex math inline">line,</span>table,@query, <span class="katex math inline">file_number,</span>file_name);
  13. my <span class="katex math inline">line_number = 1;
  14. my</span>find_count = 0;
  15.  
  16. open(DUMP_IN, "< $dump_file");
  17. while(<DUMP_IN>){
  18. my <span class="katex math inline">line =</span>_;
  19. if (/^USE\s.(\w+)./){
  20. say "changing db: ".<span class="katex math inline">1;</span>file_name = &make_file_name("USE_<span class="katex math inline">1", "</span>find_count");
  21. &write_USE(<span class="katex math inline">file_name,</span>line);
  22. <span class="katex math inline">find_count++;
  23. }elsif (/^-- Table structure for table .(.+)./){
  24. ## If the current line is the beginning of a table definition
  25. ## and @query is defined, then @query must be full of the previous
  26. ## table, so we want to process it now:
  27. if (@query){</span>file_name = &make_file_name("<span class="katex math inline">table", "</span>find_count");
  28. open(OUTPUT, "><span class="katex math inline">file_name");
  29. foreach(@query){
  30. print OUTPUT</span>_;
  31. }
  32. close OUTPUT;
  33. undef @query;
  34. }
  35. <span class="katex math inline">table =</span>1;
  36. <span class="katex math inline">find_count++;
  37. }
  38. next unless</span>table;
  39. push @query, <span class="katex math inline">line;</span>line_number++;
  40. }
  41. close DUMP_IN;
  42. say <span class="katex math inline">line_number;
  43.  
  44. ## Subroutines!
  45. sub write_USE() {
  46. my(</span>filename, <span class="katex math inline">line) = @_[0,1];
  47. open (OUTPUT, "></span>filename");
  48. print OUTPUT <span class="katex math inline">line;
  49. close OUTPUT;
  50. }
  51.  
  52. sub make_file_name() {
  53. my (</span>type, <span class="katex math inline">number) = @_[0,1];</span>number = sprintf("%05d", <span class="katex math inline">number);</span>file_name=<span class="katex math inline">number."_".</span>type.".sql";
  54. return <span class="katex math inline">file_name;
  55. }
  56.  
  57. sub usage() {
  58. say "Error: missing arguments.";
  59. say "Usage:";
  60. say "</span>0 [MYSQL_DUMP]";
  61. exit 1;
  62. }
  63.  

A small downside is that this replaces my 2.5Gb file with about 1800 smaller ones. A scripted importer is to follow.