Checking on the Progress of Large DML Commands in MySQL Using Perl – Part One

Part One of Two: Checking on database activity when running a large DML (Data Manipulation Language) statement – such as INSERT, DELETE, UPDATE or SELECT.

Part One: Inserting a million rows into a database.

A friend of mine had asked a question – “Is there any way you can track how far you have advanced in a slow-moving ALTER or OPTIMIZE statement?”. A customer was performing some modifications on a database with tens of millions of rows, and they wanted to be able to see if the command was making any progress.

Since the customer was using the InnoDB storage engine, I thought of a way that you could check on the progress – but only given the fact that nothing else (major) was happening in the database (more on this reason later).

With InnoDB, you can issue the SHOW ENGINE INNODB STATUS (part of the InnoDB Monitors) and you will get a wealth of information. I am not going to list any of the output from the command here as it would consume too much space. However, towards the bottom of the output from the command, you will notice under “ROW OPERATIONS” a line that contains the words “Number of rows inserted…”:

--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread id 2960240640, state: waiting for server activity
Number of rows inserted 1202598, updated 97249, deleted 806, read 56448551
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

The above line starting with “Number of rows…” gives you a quick snapshot of how many rows have been inserted, updated, deleted or selected (read) up to that point in time. So, by invoking the SHOW ENGINE INNODB STATUS command, you can figure what database activity has occurred since the last time you issued the command. But, if you invoke this command while the database is being used for other purposes, it will be difficult to figure out your progress on any of the four values shown, as the stats aren’t just for your activity.

So, all I needed to do was to write a quick Perl script to monitor that particular line of output, and then I could figure out if a certain statement was indeed being executed, and I could monitor the progress (assuming nothing else was happening on the database). The problem that I had was that I didn’t have a database with a lot of records in it. But, I did have a database server that didn’t have much activity. So, in part one, I will show you how to create a dummy table and then shove a million records into it. In part two (my next blog entry), I will show you the Perl script to monitor the changes to the output from the SHOW ENGINE INNODB STATUS command. (And yes, you can just run the SHOW ENGINE INNODB STATUS command manually over and over again, but what fun would that be?)

First I created a dummy table named “test_large”:

CREATE TABLE `test_large` (
`id` int(9) NOT NULL AUTO_INCREMENT,
`name_first` varchar(30) NOT NULL,
`name_last` varchar(30) NOT NULL,
`address_01` varchar(50) NOT NULL,
`address_02` varchar(50) NOT NULL,
`city` varchar(20) NOT NULL,
`state` varchar(2) NOT NULL,
`zip` varchar(5) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1

Next, I created a Perl script that would create dummy/fake data and insert it into the table. This script uses a subroutine named “ConnectToMySql” that I covered in an earlier post – Connecting to MySQL with Perl. It also uses a subroutine to create fake alpha/numeric data (generate_random_string) and one that creates fake numeric data (generate_random_numbers). A third subroutine (get_date_time) is used to display the date and time (I use this subroutine when I want to insert a date and time into a MySQL database).

The script is fairly easy to use. You only need to modify the $total_rows variable to the number of rows that you want to insert, as well as the information about your particular database. I didn’t include anything in the script to show you the progress of this script, but you could certainly add that if you wish.

#!/usr/bin/perl -w

# DBI is the standard database interface for Perl
# DBD is the Perl module that we use to connect to the MySQL database
use DBI;
use DBD::mysql;

# how many total dummy rows of data do you need?
# (must be an integer)
$total_rows = 10000000;

# make sure that $total_rows is an integer
if ($total_rows =~ /\D/) {
     print "Total rows is not an integer = value = $total_rows.\n";
     exit;
}

# keep count of how many rows we have inserted
$total_completed = 0;

# your database name
$Database = "scripting_mysql";

# get the current system date and time
$print_date_time = &get_date_time;

# print the starting time (optional)
print "Starting $print_date_time\n";

# keep looping until we reach the total number of $total_rows
while ($total_completed < $total_rows) {

     # create dummy data
     $fake_data_01 = &generate_random_string(30);
     $fake_data_02 = &generate_random_string(30);
     $fake_data_03 = &generate_random_string(50);
     $fake_data_04 = &generate_random_string(50);
     $fake_data_05 = &generate_random_string(20);
     $fake_data_06 = &generate_random_string(2);
     $fake_data_07 = &generate_random_numbers(5);

     $dbh = ConnectToMySql($Database);

     $query = "insert into test_large (name_first, name_last, address_01, address_02, city, state, zip) values (?,?,?,?,?,?,?)";
     
     $sth = $dbh->prepare($query);

     $sth->execute($fake_data_01, $fake_data_02, $fake_data_03, $fake_data_04, $fake_data_05, $fake_data_06, $fake_data_07);
     $sth->finish;
          
     $total_completed++;
}

     $dbh->disconnect;

$print_date_time = &get_date_time;

# print our finishing time
print "Finishing $print_date_time\n";
exit;

# ----------------------------------------------------------------------------------
# subroutines
# ----------------------------------------------------------------------------------

# ----------------------------------------------------------------------------------
sub get_date_time {
# ----------------------------------------------------------------------------------

     my ($sec,$min,$hour,$mday,$mon,$year) = localtime time;

     $year = $year + 1900;
     $mon = $mon + 1;

     # add a zero if the value is less than 10

     if ($sec < 10) { $sec = "0$sec"; }
     if ($min < 10) { $min = "0$min"; }
     if ($hour < 10) { $hour = "0$hour"; }
     if ($mday < 10) { $mday = "0$mday"; }
     if ($mon < 10) { $mon = "0$mon"; }
     if ($year < 10) { $year = "0$year"; }

     $DateTime = "$year-$mon-$mday $hour:$min:$sec";
     return $DateTime

}
#----------------------------------------------------------------------

# from Connecting to MySQL with Perl
#----------------------------------------------------------------------
sub ConnectToMySql {
#----------------------------------------------------------------------

     my ($db) = @_;

     open(ACCESS_INFO, "<..\/accessBLOG") || die "Can't access login credentials";
     my $database = <ACCESS_INFO>;
     my $host = <ACCESS_INFO>;
     my $userid = <ACCESS_INFO>;
     my $passwd = <ACCESS_INFO>;

# the chomp() function will remove any newline character from the end of a string
chomp ($database, $host, $userid, $passwd);
#print "<br>$database $host $userid $passwd <br>";

     my $connectionInfo="dbi:mysql:$database;$host";
     close(ACCESS_INFO);

     # make connection to database
     my $l_dbh = DBI->connect($connectionInfo,$userid,$passwd);
     return $l_dbh;

}
#----------------------------------------------------------------------

#----------------------------------------------------------------------
sub generate_random_string {
#----------------------------------------------------------------------

     my $length_of_randomstring=shift;# the length of
                # the random string to generate

     my @chars=('a'..'z','A'..'Z','0'..'9','_');
     my $random_string;
     foreach (1..$length_of_randomstring)
     {
          # rand @chars will generate a random
          # number between 0 and scalar @chars
          $random_string.=$chars[rand @chars];
     }
     return $random_string;
}
#----------------------------------------------------------------------

#----------------------------------------------------------------------
sub generate_random_numbers {
#----------------------------------------------------------------------

     my $length_of_randomstring=shift;# the length of
                # the random string to generate

     my @chars=('0'..'9');
     my $random_string;
     foreach (1..$length_of_randomstring)
     {
          # rand @chars will generate a random
          # number between 0 and scalar @chars
          $random_string.=$chars[rand @chars];
     }
     return $random_string;
}
#----------------------------------------------------------------------

This script took about three hours to insert a million rows into the new database. And since this was a new table, every so often I would just check to see how many rows were in the table with a simple select command:

mysql> select count(*) from test_large;
+----------+
| count(*) |
+----------+
| 98187 |
+----------+
1 row in set (0.00 sec)

In my next post, I will show you the Perl script that I used to monitor the SHOW ENGINE INNODB STATUS results.

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
Tony is the author of Twenty Forty-Four: The League of Patriots

 

Visit http://2044thebook.com for more information.

Leave a comment