Using a trigger to parse non-conforming data in MySQL

In another post, I explained how to use the MySQL Query Re-write Plugin to manipulate data that didn’t exactly match SQL standards of MySQL. In this post, I am going to give you another example on how to use a trigger to parse non-conforming data so the data can be written to a MySQL database.

A customer came to me with a problem. They were using third-party software which produced multiple rows of comma-separated data (like a .csv file), but all of the data wasn’t on a single row. Instead, the data consisted of multiple rows of data, separated by commas and with line feeds after each row of data. Here is a shortened example of the data (the original data was 32-lines long):

Header information which may, or may not, contain one or more commas
Item Name,Type,Description,Value
Potato Chips - Sealed Bag,,,
Manifest Number,Number,MANIFEST_NUMBER,8480014
Manufacturer Code,Number,BRAND_CODE,80375993
Information 1,Number,INFO_ONE,15869563
Information 2,Number,INFO_TWO,6569569665
Information 3,Number,INFO_THREE,562
Planned Start Date,String,PROD_START_DATE,9/13/2018
Planned Start Time,String,PROD_START_TIME,081234

For some reason, the output couldn’t be separated, but the software could use an ODBC connection. They needed to be able to insert this data into a MySQL database, but they only needed certain values – they didn’t need a lot of the descriptive information. Of the example above, they only wanted to insert the information in yellow:

“Header information which may, or may not, contain one or more commas.”
Item Name Type Description Value
Potato Chips – Sealed Bag
Manifest Number Number MANIFEST_NUMBER 8480014
Manufacturer Code Number BRAND_CODE 80375993
Information 1 Number INFO_ONE 15869563
Information 2 Number INFO_TWO 6569569665
Information 3 Number INFO_THREE 562
Planned Start Date String PROD_START_DATE 9/13/2018
Planned Start Time String PROD_START_TIME 081234

At first, I suggested using a script (Python, Perl, PHP, etc.) to parse the data before inserting into the database. But, the software which produced the data had the ability to insert the data directly into a MySQL database – but all of the data could only be inserted into one field. They didn’t have an option to add an extra step outside of the database connection – and they were new to MySQL, so they needed some help.

I decided to write a trigger for the table, in order to parse the data as it came into the database, but before it was inserted into a row. I also wanted to keep the original data in a column as well.

The first step was to create a database and a table. I mapped the column names to the first value of each row, skipping the header information which wasn’t needed. The column names are in blue, and the data is in yellow:

“Header information which may, or may not, contain one or more commas.”
Item Name Type Description Value
Potato Chips – Sealed Bag
Manifest Number Number MANIFEST_NUMBER 8480014
Manufacturer Code Number BRAND_CODE 80375993
Information 1 Number INFO_ONE 15869563
Information 2 Number INFO_TWO 6569569665
Information 3 Number INFO_THREE 562
Planned Start Date String PROD_START_DATE 9/13/2018
Planned Start Time String PROD_START_TIME 081234

I can then match the columns I want to the data values:

“Header information which may, or may not, contain one or more commas.”
Item Name Potato Chips – Sealed Bag
Manifest Number 8480014
Manufacturer Code 80375993
Information 1 15869563
Information 2 6569569665
Information 3 562
Planned Start Date 9/13/2018
Planned Start Time 081234

Now, I can create the database and the table. I substituted the spaces in the field names with underscores (_). I also added a primary key column (id_MANIFEST_DATA) and a proper DATETIME column – as I want to combine the two columns PROD_START_DATE and PROD_START_TIME into one column to match MySQL’s DATETIME format. This way, they can perform a search on this column later (if necessary).

mysql> create database MANIFEST_DATA_DB;
Query OK, 1 row affected (0.00 sec)

mysql> use MANIFEST_DATA_DB;
Database changed

mysql> CREATE TABLE `MANIFEST_DATA_DB`.`MANIFEST_DATA_TABLE` (
    ->   `id_MANIFEST_DATA` INT NOT NULL AUTO_INCREMENT,
    ->   `MANIFEST_DATA` VARCHAR(4096) NULL,
    ->   `ITEM_NAME` VARCHAR(1024) NULL,
    ->   `MANIFEST_NUMBER` INT NULL,
    ->   `MANUFACTURER_CODE` VARCHAR(1024) NULL,
    ->   `INFO_ONE` CHAR(32) NULL,
    ->   `INFO_TWO` CHAR(32) NULL,
    ->   `INFO_THREE` CHAR(32) NULL,
    ->   `PROD_START_DATE` CHAR(10) NULL,
    ->   `PROD_START_TIME` CHAR(6) NULL,
    ->   `PROD_TIMESTAMP` DATETIME NULL,
    ->   PRIMARY KEY (`id_MANIFEST_DATA`))
    -> AUTO_INCREMENT = 1000000;
Query OK, 0 rows affected (0.00 sec)

The initial “non-conforming data” will be inserted into the MANIFEST_DATA field, so all I have to do is to create a trigger to parse this field before the data is inserted.

Even though the data is on separate lines, parsing this data will be relatively easy, since the data is comma-delimited. I can use the SUBSTRING_INDEX function to create an array to store all of the data. But, since the first line of the data may or may not contain a comma, instead of counting commas from the beginning of the data, I will start at the end. Also, in this example, they don’t need the first line of data, as it is header information.

Let’s take a look at why I want to count backwards. Here are three rows of data – where the first column may or may not contain a comma – or it might contain two or more commas. I really only want to capture the last two columns of data.

"Hello, it's me",12345,ABCDE
"Hello it's me",67890,FGHIJ
"Hello, yes, it's me",112233,A1B2C3

If I parse the data based upon commas and start at the beginning, I will get different results when counting commas from the beginning of the data when the first line of data contains a comma. And I only want the data in green:

1 2 3 4 5
“Hello it’s me” 67890 FGHIJ
“Hello it’s me” 12345 ABCDE
“Hello yes it’s me” 112233 A1B2C3

But if I count backwards, I will get the same result set regardless of how many commas are in the first line of data:

-5 -4 -3 -2 -1
“Hello it’s me” 67890 FGHIJ
“Hello it’s me” 12345 ABCDE
“Hello yes it’s me” 112233 A1B2C3

In the actual data I want to sue, I don’t want to store the first row of data anyway – so it can be ignored. If I did need the first line, I would have to search for the quotes and parse that column separately.

Since the initial data will contain a line feed, I will want to replace the line feeds with a comma, so I can have all of the data on a single line and be able to use the SUBSTRING_INDEX function. Here is the original data again, but this time, I have added a column for the line feeds. Also, I am going to count backwards from the last data value. The numbers are in (bold):

/tr>

(-37)“Header information which may, or may not, contain one or more commas.”
(-36) Item Name (-35) Type (-34) Description (-33) Value -line feed-
(-32) Potato Chips – Sealed Bag (-31) (-30) (-29) -line feed-
(-28) Manifest Number (-27) Number (-26) MANIFEST_NUMBER (-25) 8480014 -line feed-
(-24) Manufacturer Code (-23) Number (-22) BRAND_CODE (-21) 80375993 -line feed-
(-20) Information 1 (-19) Number (-18) INFO_ONE (-17) 15869563 -line feed-
(-16) Information 2 (-15) Number (-14) INFO_TWO (-13) 6569569665 -line feed-
(-12) Information 3 (-11) Number (-10) INFO_THREE (-9) 562 -line feed-
(-8) Planned Start Date (-7) String (-6) PROD_START_DATE (-5) 9/13/2018 -line feed-
(-4) Planned Start Time (-3) String (-2) PROD_START_TIME (-1) 081234

Of course, you don’t have to map out the fields like I did – you can just count backwards.

The SUBSTRING_INDEX function works similar to grabbing a value out of an array – except the count value is where you want to stop grabbing data. You specify the string you want to parse, the delimiter you want to use, and the count of the value from the string where you want to stop grabbing data.

The format is:

SUBSTRING_INDEX(string,delimiter,count) 

Note: When the count value is a positive number, the count value is the ending position within the string. It doesn’t select the element itself.

Here is an example using the phrase “Hello, I must be going, see you later, bye.” which contains four values separated by commas. I want to capture the third value of “see you later”.

1 2 3 4
Hello I must be going See you later bye

To select “see you later”, I choose the third column as the place where I will stop grabbing data – but, since this is the stopping point, I will get the rest of the phrase up until the third column.

mysql> select SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",3) as PARSE;
+---------------------------------------+
| PARSE                                 |
+---------------------------------------+
| Hello, I must be going, see you later |
+---------------------------------------+
1 row in set (0.00 sec)

I don’t want all of the phrase, I only want “see you later”. With a positive count value of three, I am getting the ending point – it stops at the third delimited value. If I use a negative count value, then I am getting the starting point of the string. If I count backwards, which would give me a negative count value (and I want to do this anyway because the first line of data in the main example may have multiple commas), I can use the value of negative two (-2) and get this:

mysql> select SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2) as PARSE;
+----------------------+
| PARSE                |
+----------------------+
|  see you later, bye. |
+----------------------+
1 row in set (0.00 sec)

That gives me a starting point for grabbing “see you later”, but I don’t want any data (the single word “bye”) after this. So, I can wrap an additional SUBSTRING_INDEX around the first one, and then only choose the first delimited set of data from the inner SUBSTRING_INDEX.

It looks like this:

The statement select SUBSTRING_INDEX(“Hello, I must be going, see you later, bye.”,”,”,-2) grabs both of these columns in blue:

1 2 3 4
Hello I must be going See you later bye

But then I can use the second SUBSTRING_INDEX command to only select the first column of the remaining two columns:

SUBSTRING_INDEX(SUBSTRING_INDEX(“Hello, I must be going, see you later, bye.”,”,”,-2), “,”, 1)

1 2
See you later bye

I can test it:

mysql> select SUBSTRING_INDEX(SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2), ",", 1) as PARSE;
+----------------+
| PARSE          |
+----------------+
|  see you later |
+----------------+
1 row in set (0.01 sec)

This is essentially how I am going to parse the long line of data (after I convert the line feeds to commas). I will go backwards to the starting point, and then only select the first column beginning at that starting point.

I can then use the table I created to map put the data values I want to extract.

Column Name Backwards Value
ITEM_NAME -32
MANIFEST_NUMBER -25
MANUFACTURER_CODE -21
INFO_ONE -17
INFO_TWO -13
INFO_THREE -9
PROD_START_DATE -5
PROD_START_TIME -1

Example: To extract the PROD_START_TIME, which is at position -1, I will use this in the trigger:

SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1);

I can now create my trigger to parse all of the data being inserted into the MANIFEST_DATA column. In the trigger, I will use some temporary variables to parse the MANIFEST_DATA, and I will also use the SUBSTRING function to parse the PROD_START_DATE and PROD_START_TIME fields to rearrange them into the correct MySQL DATETIME format. I will use the CONCAT function to combine them into a new DATETIME field. I have some notes in the stored procedure to help you figure out what I am doing.

DELIMITER $$
  
CREATE TRIGGER _convert_MANIFEST_DATA
BEFORE INSERT ON MANIFEST_DATA_DB.MANIFEST_DATA_TABLE
FOR EACH ROW
BEGIN

# Declare temporary variables

# This is the temporary field which contains all of the data
DECLARE _parse_MANIFEST_DATA varchar(4096);

# The temporary date and time fields needed to create the DATETIME field named PROD_TIMESTAMP
DECLARE _parse_TIME_DATE_DATE char(10);
DECLARE _parse_TIME_DATE_TIME char(10);

DECLARE _parse_TIME_DATE_date_final char(10);

DECLARE _parse_TIME_DATE_hour char(2);
DECLARE _parse_TIME_DATE_minutes char(2);
DECLARE _parse_TIME_DATE_sec char(2);

# Take the incoming large dataset which has line feeds and
# Replace the line feeds with a comma
set _parse_MANIFEST_DATA = replace(NEW.MANIFEST_DATA, "\n", ",");

# Parse each new column from the temporary field
SET NEW.ITEM_NAME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -35), ",", 1);
SET NEW.MANIFEST_NUMBER = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -25), ",", 1);
SET NEW.MANUFACTURER_CODE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -21), ",", 1);
SET NEW.INFO_ONE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -17), ",", 1);
SET NEW.INFO_TWO = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -13), ",", 1);
SET NEW.INFO_THREE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -9), ",", 1);
SET NEW.PROD_START_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1);
SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1);

# Get the values from these two fields in order to combine them into a DATETIME field
SET _parse_TIME_DATE_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1);
SET _parse_TIME_DATE_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1);

# Convert the date from MM/DD/YYYY to YYYY-MM-DD
SET _parse_TIME_DATE_date_final = STR_TO_DATE(_parse_TIME_DATE_DATE, '%c/%d/%Y');

# Parse the time so we can add colons between the hour, minutes and seconds
SET _parse_TIME_DATE_hour = SUBSTRING(_parse_TIME_DATE_TIME, 1, 2);
SET _parse_TIME_DATE_minutes = SUBSTRING(_parse_TIME_DATE_TIME, 3, 2);
SET _parse_TIME_DATE_sec = SUBSTRING(_parse_TIME_DATE_TIME, 5, 2);

# Combine the newly-formatted date with the newly-formatted time so it conforms to the DATETIME format
SET NEW.PROD_TIMESTAMP = CONCAT(_parse_TIME_DATE_date_final, " ", _parse_TIME_DATE_hour, ":", _parse_TIME_DATE_minutes, ":", _parse_TIME_DATE_sec) ;

END$$

DELIMITER ;

Now I can insert the sample data into the database – where all of the data will go into the MANIFEST_DATA column, and the trigger will populate the rest of the columns by parsing the data.

mysql> INSERT INTO MANIFEST_DATA_TABLE (MANIFEST_DATA) VALUES ('Header information which may, or may not, contain one or more commas
    '> Item Name,Type,Description,Value
    '> Potato Chips - Sealed Bag,,,
    '> Manifest Number,Number,MANIFEST_NUMBER,8480014
    '> Manufacturer Code,Number,BRAND_CODE,80375993
    '> Information 1,Number,INFO_ONE,15869563
    '> Information 2,Number,INFO_TWO,6569569665
    '> Information 3,Number,INFO_THREE,562
    '> Planned Start Date,String,PROD_START_DATE,9/13/2018
    '> Planned Start Time,String,PROD_START_TIME,081234');
Query OK, 1 row affected (0.07 sec)

When I look at the contents of the table, I can see the trigger executed successfully. All of the “non-conforming” data is stored in the MANIFEST_DATA field, but the other fields were populated:

mysql> select * from MANIFEST_DATA_TABLE\G
*************************** 1. row ***************************
 id_MANIFEST_DATA: 1000000
    MANIFEST_DATA: Header information which may, or may not, contain one or more commas
Item Name,Type,Description,Value
Potato Chips - Sealed Bag,,,
Manifest Number,Number,MANIFEST_NUMBER,8480014
Manufacturer Code,Number,BRAND_CODE,80375993
Information 1,Number,INFO_ONE,15869563
Information 2,Number,INFO_TWO,6569569665
Information 3,Number,INFO_THREE,562
Planned Start Date,String,PROD_START_DATE,9/13/2018
Planned Start Time,String,PROD_START_TIME,081234
        ITEM_NAME: Type
  MANIFEST_NUMBER: 8480014
MANUFACTURER_CODE: 80375993
         INFO_ONE: 15869563
         INFO_TWO: 6569569665
       INFO_THREE: 562
  PROD_START_DATE: 9/13/2018
  PROD_START_TIME: 081234
   PROD_TIMESTAMP: 2018-09-13 08:12:34
1 row in set (0.00 sec)

Of course, this is only a short example of all of the parsing you can do with triggers in MySQL. And, if you want to look at your trigger, you can query the INFORMATION_SCHEMA.TRIGGERS table:

mysql> SELECT * FROM INFORMATION_SCHEMA.TRIGGERS WHERE TRIGGER_NAME='_convert_MANIFEST_DATA'\G
*************************** 1. row ***************************
           TRIGGER_CATALOG: def
            TRIGGER_SCHEMA: MANIFEST_DATA_DB
              TRIGGER_NAME: _convert_MANIFEST_DATA
        EVENT_MANIPULATION: INSERT
      EVENT_OBJECT_CATALOG: def
       EVENT_OBJECT_SCHEMA: MANIFEST_DATA_DB
        EVENT_OBJECT_TABLE: MANIFEST_DATA_TABLE
              ACTION_ORDER: 1
          ACTION_CONDITION: NULL
          ACTION_STATEMENT: BEGIN

# Declare temporary variables

# This is the temporary field which contains all of the data
DECLARE _parse_MANIFEST_DATA varchar(4096);

# The temporary date and time fields needed to create the DATETIME field named PROD_TIMESTAMP
DECLARE _parse_TIME_DATE_DATE char(10);
DECLARE _parse_TIME_DATE_TIME char(10);

DECLARE _parse_TIME_DATE_date_final char(10);

DECLARE _parse_TIME_DATE_hour char(2);
DECLARE _parse_TIME_DATE_minutes char(2);
DECLARE _parse_TIME_DATE_sec char(2);

# Take the incoming large dataset which has line feeds and
# Replace the line feeds with a comma
set _parse_MANIFEST_DATA = replace(NEW.MANIFEST_DATA, "
", ",");

# Parse each new column from the temporary field
SET NEW.ITEM_NAME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -35), ",", 1);
SET NEW.MANIFEST_NUMBER = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -25), ",", 1);
SET NEW.MANUFACTURER_CODE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -21), ",", 1);
SET NEW.INFO_ONE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -17), ",", 1);
SET NEW.INFO_TWO = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -13), ",", 1);
SET NEW.INFO_THREE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -9), ",", 1);
SET NEW.PROD_START_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1);
SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1);

# Get the values from these two fields in order to combine them into a DATETIME field
SET _parse_TIME_DATE_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1);
SET _parse_TIME_DATE_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1);

# Convert the date from MM/DD/YYYY to YYYY-MM-DD
SET _parse_TIME_DATE_date_final = STR_TO_DATE(_parse_TIME_DATE_DATE, '%c/%d/%Y');

# Parse the time so we can add colons between the hour, minutes and seconds
SET _parse_TIME_DATE_hour = SUBSTRING(_parse_TIME_DATE_TIME, 1, 2);
SET _parse_TIME_DATE_minutes = SUBSTRING(_parse_TIME_DATE_TIME, 3, 2);
SET _parse_TIME_DATE_sec = SUBSTRING(_parse_TIME_DATE_TIME, 5, 2);

# Combine the newly-formatted date with the newly-formatted time so it conforms to the DATETIME format
SET NEW.PROD_TIMESTAMP = CONCAT(_parse_TIME_DATE_date_final, " ", _parse_TIME_DATE_hour, ":", _parse_TIME_DATE_minutes, ":", _parse_TIME_DATE_sec) ;

END
        ACTION_ORIENTATION: ROW
             ACTION_TIMING: BEFORE
ACTION_REFERENCE_OLD_TABLE: NULL
ACTION_REFERENCE_NEW_TABLE: NULL
  ACTION_REFERENCE_OLD_ROW: OLD
  ACTION_REFERENCE_NEW_ROW: NEW
                   CREATED: 2018-09-20 22:13:28.54
                  SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
                   DEFINER: root@localhost
      CHARACTER_SET_CLIENT: utf8mb4
      COLLATION_CONNECTION: utf8mb4_0900_ai_ci
        DATABASE_COLLATION: utf8mb4_0900_ai_ci
1 row in set (0.02 sec)

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
Tony is the author of Twenty Forty-Four: The League of Patriots 
Visit http://2044thebook.com for more information.
Tony is the editor/illustrator for NASA Graphics Standards Manual Remastered Edition 
Visit https://amzn.to/2oPFLI0 for more information.
Advertisement

Manipulating queries with non-conforming data via MySQL Query Rewrite Plugin, triggers and stored procedures

The MySQL database is used in thousands of third-party applications, but what can you do when you want to use MySQL with an application, but that application’s queries or data doesn’t match MySQL’s data type or SQL format?

This post will show you three ways to alter a query or mismatched data when you don’t have control of the application’s source code. Of course, there are hundreds of different ways to do what I am about to show you. In this example, I will show you how to use the MySQL Query Rewrite Plugin along with a trigger to alter the non-conforming data. I will also show you an example of manipulating data with a stored procedure.

A customer emailed me with a problem. They wanted to use MySQL for a third-party application, but they didn’t have access to the source code. Their main problem was the application’s TIMESTAMP format didn’t conform to MySQL’s TIMESTAMP format. To be specific, this application produced a TIMESTAMP value that included a trailing time zone, such as “2018-09-05 17:00:00 EDT”. MySQL has two column data types where you can store both the date and time in one column: TIMESTAMP and DATETIME – but MySQL cannot handle TIMESTAMP or DATETIME data with a trailing time zone.

When a TIMESTAMP value is being inserted into a row, MySQL converts the TIMESTAMP value from the current time zone set by the MySQL server (see Time Zone Support) to UTC (Coordinated Universal Time) for storage, and converts the data back from UTC to the current time zone (of the server) when retrieved. (This conversion does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server’s local time. The time zone can be set on a per-connection basis, and as long as the time zone setting remains constant, you will get back the same value you stored. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable. For more information, see Section 5.1.12, “MySQL Server Time Zone Support”.

(From: https://dev.mysql.com/doc/refman/8.0/en/datetime.html)

The customer told me that this application would only be sending data with two different trailing time zones – Central and Eastern. With daylight-savings in use in both of these time zones, this would give us four possible trailing time zone values – CDT, CST, EDT and EST. What we want to do is to intercept the query, and write this TIMESTAMP data to a different column, and then convert the value to UTC time to be stored in the correct column in the database. Because we don’t have access to the source code, I am assuming we have full access to the MySQL database.


NOTE: Since we are using time zone information, if you want to duplicate this post, be sure to load the MySQL time zone information. See: https://dev.mysql.com/doc/refman/8.0/en/time-zone-support.html

 

The MySQL Rewrite Plugin

In MySQL version 5.7, a plugin named the “Query Rewrite Plugin” was introduced. This plugin can examine SQL statements received by the server and modify those statements before the server executes them. In other words, this gives you the ability to intercept “bad” queries and re-format them to be “good” queries for use with MySQL – or to rewrite the queries to do whatever you need. Think of it as a way to change the source code without actually having the source code.

Installing the plugin is fairly easy. In MySQL version 8.0, you install (or uninstall) the plugin via an SQL script provided with your MySQL installation. The script is named install_rewriter.sql and is located in the “share” directory under your MySQL home directory.

# cd /usr/local/mysql/share  (your directory may be different)
# mysql -u root -p < install_rewriter.sql
Enter password: (enter root password here)

The script only takes a few seconds to load (The uninstall script is named uninstall_rewriter.sql). To check and make sure the plugin was installed, run this command from within MySQL:

mysql> SHOW GLOBAL VARIABLES LIKE 'rewriter_enabled';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| rewriter_enabled | ON    |
+------------------+-------+
1 row in set (0.00 sec)

The plugin was installed correctly if the column named “Value” is set to “ON“.

For this example, I am going to create a small table with three columns, and assume that this is an table from a third-party application. The date_time_value column is where the application would normally store the timestamp information.

mysql> create database test;
 Query OK, 1 row affected (0.01 sec)
mysql> use test;
 Database changed
mysql> CREATE TABLE `time_example` (
  `idtime` int(11) NOT NULL AUTO_INCREMENT,
  `action_record` varchar(30) NOT NULL,
  `date_time_value` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`idtime`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=UTF8MB4;
Query OK, 0 rows affected (0.03 sec)

The date_time_value column will obviously not be able to store timestamp data with a trailing time zone, but let’s see what happens when we try and insert a row of data – and let’s pretend that this is the query the application uses.

mysql> insert into test.time_example (action_record, date_time_value) 
 values ('Arrived at work', '2018-09-05 17:00:00 EDT');
Error Code: 1292. Incorrect datetime value: '2018-09-05 17:00:00 EDT' 
 for column 'date_time_value' at row 1

Of course, we get an error because the format for the timestamp is incorrect.

What we want to do is to alter the table and add a column to store this improperly-formatted timestamp data.

mysql> ALTER TABLE `test`.`time_example` 
    -> ADD COLUMN `date_time_storage` VARCHAR(23) NULL AFTER `date_time_value`;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now that we have a column (date_time_storage) to store the “bad” timestamp data, we need to modify the incoming query so that it writes the timestamp data into the new column.


Note: In MySQL 8.0+, with the Query Rewrite Plugin, you can modify SELECT, INSERT, REPLACE, UPDATE, and DELETE statements. (Prior to MySQL 8.0.12 you could only modify SELECT statements)

 

This is the query the application is sending to the database:

insert into test.time_example (action_record, date_time_value) values (?, ?);

We want to modify the query to use the new date_time_storage column, instead of the date_time_value column. The new query would look like this:

insert into test.time_example (action_record, date_time_storage) values (?, ?);

Now that we have our old (bad) and new (good) queries, we can insert this into the rewrite_rules table of the query_rewrite database.

INSERT INTO query_rewrite.rewrite_rules
    (pattern, replacement, pattern_database) VALUES(
    'insert into test.time_example (action_record, date_time_value) values (?, ?)',
    'insert into test.time_example (action_record, date_time_storage) values (?, ?)',
    'time_example'
    );
1 row(s) affected, 1 warning(s): 1105 Query 'insert into test.time_example 
 (action_record, date_time_value) values ('Left building', '2018-09-05 17:00:00 EDT')' 
 rewritten to 'insert into test.time_example (action_record, date_time_storage) 
 values ('Left building', '2018-09-05 17:00:00 EDT')' by a query rewrite plugin

(More examples may be found on this page: Query Rewrite Plugin Usage)

We need to execute a stored procedure named flush_rewrite_rules to make this query-rewrite change permanent: (See: https://dev.mysql.com/doc/refman/8.0/en/rewriter-query-rewrite-plugin-usage.html)

mysql> CALL query_rewrite.flush_rewrite_rules();
Query OK, 1 row affected (0.00 sec)

We can confirm the INSERT INTO query_rewrite.rewrite_rules by looking at the rewrite_rules table:

mysql> SELECT * FROM query_rewrite.rewrite_rules\G
*************************** 1. row ***************************
                id: 1
           pattern: insert into test.time_example (action_record, date_time_value) values (?, ?)
  pattern_database: time_example
       replacement: insert into test.time_example (action_record, date_time_storage) values (?, ?)
           enabled: YES
           message: NULL
    pattern_digest: e823e987338aeae6d47f7a729e78f532d3ff3721237c15981bcd11fc2607efda
normalized_pattern: insert into `test`.`time_example` (`action_record`,`date_time_value`) values (?,?)
1 row in set (0.00 sec)

Next, let’s run the same query as before, and see if it puts the timestamp data that is supposed to go into the date_time_value column into the new date_time_storage column:

mysql> insert into test.time_example (action_record, date_time_value) 
 values ('Arrived at work', '2018-09-05 17:00:00 EDT');
Query OK, 1 row affected, 1 warning (0.01 sec)

And now the table contains this data:

mysql> select * from time_example;
+--------+-----------------+-----------------+-------------------------+
| idtime | action_record   | date_time_value | date_time_storage       |
+--------+-----------------+-----------------+-------------------------+
|      1 | Arrived at work | NULL            | 2018-09-05 17:00:00 EDT |
+--------+-----------------+-----------------+-------------------------+
1 rows in set (0.00 sec)

We now have the timestamp with the time zone data stored in the MySQL database, but we need to convert this to a proper format, and put the result into the date_time_value column.

To do this, we can use a trigger.

Normally, you would want your application to produce data in the correct format, but in this example, we don’t have access to the source code. So, we can create a trigger to convert the “incorrectly-formatted” data in date_time_storage to the correct data and store it in date_time_value.


NOTE: These examples won’t work if your TIMESTAMP uses microseconds (6-digits) precision (example: ‘1970-01-01 00:00:01.000000’) – but you can modify the code to accommodate microseconds.

 

Here is the SQL to create the trigger:

DELIMITER $$
  
CREATE TRIGGER _time_zone_convert_insert2
AFTER INSERT ON time_example
FOR EACH ROW
BEGIN

DECLARE _date_time_no_tz varchar(20);

SET _date_time_no_tz = SUBSTRING(NEW.date_time_storage, 1, 20);

IF NEW.date_time_storage like '%EDT' THEN
    SET NEW.date_time_value = CONVERT_TZ(_date_time_no_tz,'EST5EDT','GMT');
END IF;

IF NEW.date_time_storage like '%EST' THEN
    SET NEW.date_time_value = CONVERT_TZ(_date_time_no_tz,'EST5EDT','GMT');
END IF;

IF NEW.date_time_storage like '%CDT' THEN
    SET NEW.date_time_value = CONVERT_TZ(_date_time_no_tz,'EST5EDT','GMT');
END IF;

IF NEW.date_time_storage like '%CST' THEN
    SET NEW.date_time_value = CONVERT_TZ(_date_time_no_tz,'EST5EDT','GMT');
END IF;

END$$

DELIMITER ;

Now that we have a trigger in place, let’s insert another line into the database – BUT, we still want to use the SQL from the application. The query will try and write to the date_time_value column, but the Query Rewrite Plugin will intercept the original query and substitute our new query instead – which will insert the timestamp data into the date_time_storage column, and then the trigger will convert the timestamp and place the correct value into the date_time_value column.

mysql> INSERT INTO time_example (action_record, date_time_value) 
 VALUES ('Lunch Break', '2018-09-05 18:00:00 EDT');
Query OK, 1 row affected (0.00 sec)

The table now contains a true timestamp column with the correct timestamp value in UTC. (The old row didn’t change)

mysql> SELECT * FROM test.time_example;
+--------+------------+---------------------+-------------------------+
| idtime | product_id | date_time_value     | date_time_storage       |
+--------+------------+---------------------+-------------------------+
|      1 | time now1  | NULL                | 2018-09-05 18:00:00 EDT |
|      2 | time now2  | 2018-09-05 22:00:00 | 2018-09-05 18:00:00 EDT |
+--------+------------+---------------------+-------------------------+
2 rows in set (0.00 sec)

But what about stored procedures?

The easiest way to handle the time zone conversion is with a trigger. But, to show you how stored procedures can do the same thing, I have an example of a stored procedure. In this example, I will be passing the values of the idtime and date_time_storage columns.

This example will be similar to the one above – I created a table named time_example, but this time, I am including the extra column:

'CREATE TABLE `time_example` (
  `idtime` int(11) NOT NULL AUTO_INCREMENT,
  `action_record` varchar(30) NOT NULL,
  `date_time_value` timestamp NULL DEFAULT NULL,
  `date_time_storage` varchar(23) DEFAULT NULL,
  PRIMARY KEY (`idtime`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8'

I then inserted a row, where I am storing the time stamp with the time zone information:

mysql> insert into test.time_example (action_record, date_time_storage) 
 values ('Left work', '2018-09-05 17:00:00 EDT’);
Query OK, 1 row affected (0.00 sec)

Here is what the row looks like:

mysql> SELECT * FROM test.time_example;
+--------+------------+-----------------+-------------------------+
| idtime | product_id | date_time_value | date_time_storage       |
+--------+------------+-----------------+-------------------------+
|      1 | Left work  | NULL            | 2018-09-05 17:00:00 EDT |
+--------+------------+-----------------+-------------------------+
1 row in set (0.00 sec)

Again, the date_time_storage column is a temporary storage column. I will call the stored procedure, and provide the idtime and date_time_storage values. The stored procedure which will look at the last three characters in the date_time_storage column, and then convert the time to UTC, which is then stored in the date_time_value column.

call _check_time_zone('1','2018-09-05 17:00:00 EDT');

Now the row looks like this, where the date_time_value column is now stored as UTC:

mysql> SELECT * FROM test.time_example;
+--------+------------+---------------------+-------------------------+
| idtime | product_id | date_time_value     | date_time_storage       |
+--------+------------+---------------------+-------------------------+
|      1 | Left work  | 2018-09-05 21:00:00 | 2018-09-05 17:00:00 EDT |
+--------+------------+---------------------+-------------------------+
1 row in set (0.00 sec)

And here is the code to create the stored procedure:

DELIMITER $$
CREATE DEFINER=`root`@`localhost` 
PROCEDURE `_check_time_zone`(IN _id_time INT, IN _date_time_storage VARCHAR(23))
BEGIN

DECLARE _date_time_no_tz varchar(20);

SET _date_time_no_tz = SUBSTRING(_date_time_storage, 1, 20);

IF _date_time_storage like '%EDT' THEN 
UPDATE time_example SET date_time_value = CONVERT_TZ(_date_time_no_tz,'EST5EDT','GMT')
WHERE idtime = _id_time;
END IF;

IF _date_time_storage like '%EST' THEN 
UPDATE time_example SET date_time_value = CONVERT_TZ(_date_time_no_tz,'EST5EDT','GMT')
WHERE idtime = _id_time;
END IF;

IF _date_time_storage like '%CDT' THEN 
UPDATE time_example SET date_time_value = CONVERT_TZ(_date_time_no_tz,'CST5CDT','GMT')
WHERE idtime = _id_time;
END IF;

IF _date_time_storage like '%CST' THEN 
UPDATE time_example SET date_time_value = CONVERT_TZ(_date_time_no_tz,'CST5CDT','GMT')
WHERE idtime = _id_time;
END IF;

IF _date_time_storage like '%UTC' THEN 
UPDATE time_example SET date_time_value = _date_time_no_tz
WHERE idtime = _id_time;
END IF;

END $$
DELIMITER ;

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
Tony is the author of Twenty Forty-Four: The League of Patriots 
Visit http://2044thebook.com for more information.
Tony is the editor/illustrator for NASA Graphics Standards Manual Remastered Edition 
Visit https://amzn.to/2oPFLI0 for more information.