Splitting a MySQL Dump File Into Smaller Files Via Perl

I was trolling the MySQL forums web site and noticed a post regarding someone who was trying to load a 50-gigabyte MySQL dump file. The author of the post had stated that the loading of the file had been running for days, and was wondering if there was a better way to import the file. They did not have access to anything else (i.e. – original database) but this file.

I have had to restore several databases in the past from a single large MySQL dump file – which led me to start backing up each database individually. These databases are for my own personal use and are not updated that often, so I don’t need to have point-in-time recovery – and so a MySQL dump works just fine. If I had a production system, I would invest in the MySQL Enterprise Backup and the MySQL Enterprise Monitor applications.

But, I have had problems when I tried to restore a relatively small MySQL instance of 21 databases with a total file size of less than 900 megabytes. So, I wrote a Perl script to simply split the large MySQL dump file into smaller files, which I could then load one at a time to isolate the problem. I am sure that there are many other ways of troubleshooting this person’s problem, but I thought I would dig out this script and post it here. This script is also handy for splitting any text file that you have into smaller files. All you would have to do is to change the keywords that causes a new file to be created.

The script opens the MySQL dump file, and starts outputting each line to a new file, looking for the keywords “– Current Database: “. When the script sees those keywords, it closes the last file and opens a new one, and then prints that line with the keywords out to the new file until the keyword appears again. The script names each new file with a sequential number – as in database_01.sql, database_02.sql, etc. The first file that this script creates (named database_00.sql) contains the header information from the MySQL dump file, and this information is included in each of the subsequent database files (thanks to Nick Miles for pointing this out – see comments).

I tested this script on an 850-megabyte MySQL dump file, and it worked perfectly (and executed in less than one minute). To make sure that your new files contain the same amount of data as the original file, you can use the Unix command “wc” to show you the size of each file, and then you can compare the sum of these sizes to the size of your MySQL dump file. I first ran the script, and it produced the following 22 files:

sh-3.2# /usr/bin/perl split_file.pl
2011-09-13 21:24:23 - Creating file database_00.sql
2011-09-13 21:24:23 - Creating file database_01.sql
2011-09-13 21:24:23 - Creating file database_02.sql
2011-09-13 21:24:23 - Creating file database_03.sql
2011-09-13 21:24:23 - Creating file database_04.sql
2011-09-13 21:24:23 - Creating file database_05.sql
2011-09-13 21:24:23 - Creating file database_06.sql
2011-09-13 21:24:23 - Creating file database_07.sql
2011-09-13 21:24:23 - Creating file database_08.sql
2011-09-13 21:24:23 - Creating file database_09.sql
2011-09-13 21:24:23 - Creating file database_10.sql
2011-09-13 21:24:23 - Creating file database_11.sql
2011-09-13 21:24:23 - Creating file database_12.sql
2011-09-13 21:24:23 - Creating file database_13.sql
2011-09-13 21:24:23 - Creating file database_14.sql
2011-09-13 21:24:24 - Creating file database_15.sql
2011-09-13 21:24:24 - Creating file database_16.sql
2011-09-13 21:24:24 - Creating file database_17.sql
2011-09-13 21:24:24 - Creating file database_18.sql
2011-09-13 21:24:24 - Creating file database_19.sql
2011-09-13 21:24:24 - Creating file database_20.sql
2011-09-13 21:25:14 - Creating file database_21.sql

Here are the sizes of the individual database*.sql files, as well as the original source file named all_data.sql:

sh-3.2# wc database*sql
sh-3.2# wc database*.sql
         18      66     807 database_00.sql
         51     175    1676 database_01.sql
        413   18379 1135180 database_02.sql
        596  101335 5281418 database_03.sql
        571   14161  367113 database_04.sql
        571   10116  264555 database_05.sql
         26      87     943 database_06.sql
         88    2849   38786 database_07.sql
        565   11121  411554 database_08.sql
        338   10954  427717 database_09.sql
         92     403    7529 database_10.sql
        300   12690  326553 database_11.sql
        273    1929   43345 database_12.sql
         62     218    2143 database_13.sql
        75   82987 13049334 database_14.sql
        755   74021  909296 database_15.sql
         62     324    3775 database_16.sql
        209  535852 8778365 database_17.sql
        147    2391   37442 database_18.sql
         26      87     946 database_19.sql
    1243 28094397 826796485 database_20.sql
        333  134098 5402798 database_21.sql
    6814 29108640 863287760 total

sh-3.2# ls -l all_data.sql
-rw-rw-rw- 1 tonydarnell admin 863270813 Sep 13 20:23 all_data.sql

As you can see, the total size of the database*.sql files is 863287760, and the size of our original MySQL dump file (all_data.sql) is 863270813. But, we have to take into account that we added the header information to each file. The header file size is 807, and we have 21 files, for a total of 16,947. If we subtract 16,947 from the total 863287760, we get 863270813, which is the same size as our original file.

After splitting the main MySQL dump file, you can now restore/import each file individually, and if you have any problems, they will be easier to troubleshoot.

Here is the script:

#!/usr/bin/perl -w

# Need to explicitly load the functions in POSIX
use POSIX 'strftime';
use File::Copy;

# file name of the file you want to split
$filename = "all_data.sql";

# Note that if you pass no argument to localtime, it assumes the current time
$time = strftime '%Y-%m-%d %H:%M:%S', localtime;

$count = 0;
$filecount = 0;

# filename for original file

# syntax for the first file name - the header file - and it will be named with 00
# if you have more than 100 files, you could modify this statement to place a leading
# zero if the file is greater than 10 but less than 100
if ($filecount < 10)

{
     $header_filename = "database_0" . $filecount . ".sql";
}

print "$time - Creating file $header_filename\n";

open(OUTPUT, ">$header_filename") || die "Can't redirect stdout";

open(LINE ,"$filename") || die ("\nERROR - could not open file: $filename\n");
while (<LINE>)

{

if ($_ =~ "^-- Current Database: " && $count > 1)

{

               if ($count eq 1)
               
               {
                    print "Closing header file.\n";
                    close (OUTPUT);
               }

                    # close the old file
                    close(OUTPUT);

                    $filecount++;

                    # syntax for new file names
                    if ($filecount < 10)
                    
                    {
                         $new_filename = "database_0" . $filecount . ".sql";
                    }
                    
                    else
                    
                    {
                     $new_filename = "database_" . $filecount . ".sql";
                    }
                    
                    $time = strftime '%Y-%m-%d %H:%M:%S', localtime;
                    print "$time - Creating file $new_filename\n";

                    # copy the header file to the next new file, so the header info is in the file
                    copy("$header_filename","$new_filename") or die "Copy failed: $!";

                    open(OUTPUT, ">>$new_filename") || die "Can't redirect stdout";
                    print OUTPUT $_;

}

else

{
          # print information into the header file
                    print OUTPUT $_;
}

$count++;

# end while
}

close (LINE);
close (OUTPUT);

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
About these ads

4 Responses to Splitting a MySQL Dump File Into Smaller Files Via Perl

  1. Twirrim says:

    I’ve been meaning to write something like that but never got around to it. Just for fun, try this:

    $filesize = -s $filename;
    $fileperc = 100/$filesize;
    $sumsize = 0;

    $sumsize+= -s OUTPUT;
    printf (“Completed ~%.3f%%\n”,($fileperc*$sumsize));

    That uses the script but provides you with a rough %age completed after each split file, based on summing the byte file size.

  2. kedar says:

    Good work… Also have a look at mysqldumpsplitter [http://kedar.nitty-witty.com/blog/mydumpsplitter-extract-tables-from-mysql-dump-shell-script] I had written it long back !!

  3. Nick Miles says:

    I just wrote something similar. One issue I see in your script, each db dump file will need the “header” and “footer” SET lines from the entire mysqldump file. That is, the SET commands that turn off foreign keys, etc in the beginning, and then turn them back on at the end. Otherwise, your individual db imports may fail due to constraints failing.

    • Tony Darnell says:

      Nick – thanks for keeping me honest. This is what happens when you write a blog late at night. I have since modified the script to add the header information to each of the subsequent database*.sql files. I am assuming that one will be importing all of the files in order, so in the individual database*.sql files, I didn’t include the footer lines at the bottom of the original file, as those will be in the last file in the series. Thanks, Tony

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 40 other followers