Convert .csv File to MySQL Database via Perl

Your post reminds me of a similar but smaller script that I wrote some time ago. Both the script and some of the comments left on it may be of interest to you:

http://blog.josephhall.com/2008/11/importing-usda-sr21-into-mysql.html

And are you aware that MySQL can natively import CSV data with the LOAD DATA INFILE command? It can also export data with SELECT ... INTO OUTFILE.
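As a rough sketch of the native route, the Perl below only builds a LOAD DATA statement as text; you would still run it through your usual client or DBI handle. The file path and table name are made-up examples, and load_data_sql is a hypothetical helper. The FIELDS/LINES options assume a comma-separated file with optional double quotes and a single header row. With LOCAL, the client reads the file (local_infile must be enabled on both client and server); without LOCAL, the server reads it from its own filesystem and the account needs the FILE privilege.

```perl
use strict;
use warnings;

# Build a LOAD DATA statement for a CSV file whose first line holds the
# column names. File name, table name and quoting options are assumptions.
sub load_data_sql {
    my ($file, $table) = @_;

    # Escape backslashes and single quotes so the path is a valid
    # single-quoted SQL string literal.
    (my $quoted_file = $file) =~ s/([\\'])/\\$1/g;

    return "LOAD DATA LOCAL INFILE '$quoted_file'\n"
         . "INTO TABLE `$table`\n"
         . "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
         . "LINES TERMINATED BY '\\n'\n"
         . "IGNORE 1 LINES;\n";    # skip the header row
}

print load_data_sql('/tmp/addresses.csv', 'Addresses');
```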

Yes, I am aware that you can import a .csv file into a MySQL database. But this script also creates the new table for you: it produces the column names and figures out the necessary length of each column.
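The length calculation can be sketched in a few lines of Perl. This is a minimal illustration, assuming the rows have already been split into fields; column_widths is a hypothetical helper name, not part of either script in this thread. The resulting numbers would become the n in VARCHAR(n).

```perl
use strict;
use warnings;

# Given rows already split into fields (array refs), return the length
# of the longest value in each column. Undefined fields count as empty.
sub column_widths {
    my @rows = @_;
    my @widths;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length($row->[$i] // '');
            $widths[$i] = $len if !defined $widths[$i] || $len > $widths[$i];
        }
    }
    return @widths;
}

my @widths = column_widths(['Tony', '42'], ['Dave', '1234'], ['Scott', '7']);
print "@widths\n";    # prints: 5 4
```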

Do you have any scripts to go backwards, from sql dumps (of various flavours) to CSV?

I don’t have a way to take a MySQL dump file and convert it to a .csv file, but that is a good suggestion for a future post.

But you can export data as a .csv file from MySQL.

	SELECT name, address, etc. INTO OUTFILE '/tmp/filename.csv'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
	ESCAPED BY '\\'
	LINES TERMINATED BY '\n'
	FROM users WHERE …..
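Note that SELECT ... INTO OUTFILE writes the file on the MySQL server host and needs the FILE privilege. If you are fetching rows into Perl instead, a minimal CSV-formatting sketch might look like the following; csv_line is a hypothetical helper. It doubles embedded quotes (the RFC 4180 convention) rather than using the backslash escaping set by ESCAPED BY '\\' above, so the two outputs are not byte-for-byte identical. Text::CSV handles more edge cases.

```perl
use strict;
use warnings;

# Turn one row of values into a CSV line. Fields containing a comma,
# a double quote, or a line break are wrapped in double quotes, and any
# embedded double quotes are doubled. Undefined values (SQL NULLs)
# become empty fields.
sub csv_line {
    my @fields = map {
        my $v = $_ // '';
        if ($v =~ /[",\r\n]/) {
            $v =~ s/"/""/g;
            $v = qq{"$v"};
        }
        $v;
    } @_;
    return join(',', @fields) . "\n";
}

print csv_line('Tony Darnell', 'Oracle, Inc.', 'He said "hi"', undef);
# prints: Tony Darnell,"Oracle, Inc.","He said ""hi""",
```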

Some comments on your Perl.

1/ You should always “use strict” and “use warnings”. Commenting out “use warnings” is not an acceptable fix for your problems.

2/ No need to quote $filename in open FILE, "$filename".

3/ Use lexical filehandles and three-argument open: open my $file, '<', $filename

4/ Use 'chomp' instead of 'chop'

5/ Your split looks horrible. How about just split(/","/). The $_ is implied.
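Taken together, points 1 to 5 might look something like this for reading the header line. It is a sketch, not Tony's code: read_header is a hypothetical name, and it assumes every field is wrapped in double quotes, which is what makes a plain split /","/ workable once the outer quotes are stripped.

```perl
use strict;
use warnings;

# Return the column names from the first line of an open CSV filehandle.
# Assumes every field is wrapped in double quotes, e.g. "Name","City".
sub read_header {
    my ($fh) = @_;               # 3/ a lexical filehandle, passed like any scalar
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;                 # 4/ chomp only removes the line ending
    $line =~ s/\r$//;            # also drop a CR left over from Windows line endings
    $line =~ s/^"//;             # opening quote of the first field
    $line =~ s/"$//;             # closing quote of the last field
    return split /","/, $line;   # 5/ split on the "," between fields
}

# The real script would use: open my $file, '<', $filename or die "...: $!";
# Here an in-memory string stands in for the CSV file.
my $csv = qq{"First Name","Last Name","Amount"\r\n"Tony","Darnell","42"\r\n};
open my $file, '<', \$csv or die "Cannot open in-memory file: $!";
my @names = read_header($file);
close $file;
print join('|', @names), "\n";    # prints: First Name|Last Name|Amount
```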

And most importantly, a lot of your program can be massively simplified by using the module Text::CSV. If, for some reason, you don't want to install new modules, Text::ParseWords comes with Perl and has much of the same functionality.
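For example, parse_line from Text::ParseWords splits on a delimiter while respecting quoted fields; with the second (keep) argument false, the quotes are removed. One caveat worth knowing: it also treats single quotes as quote characters, so an unquoted field containing an apostrophe (O'Brien, say) can make it return an empty list. Text::CSV doesn't have that problem. The sample line below is made up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# parse_line(DELIMITER, KEEP, LINE) splits LINE on DELIMITER but not
# inside quotes; with KEEP false the surrounding quotes are removed.
my $line   = q{"Darnell, Tony","Oracle","42"};
my @fields = parse_line(',', 0, $line);
print scalar(@fields), " fields: ", join('|', @fields), "\n";
# prints: 3 fields: Darnell, Tony|Oracle|42
```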

Hope this helps.

Here’s a version written in PHP which lets MySQL do all the heavy lifting of figuring out what field type to use.
	<?php
	$file = 'test.csv';
	$table = 't';
	$fileLineTerminator = '\r\n';

	// Note: the mysql_* functions were removed in PHP 7.0;
	// on current PHP use the mysqli_* (or PDO) equivalents.
	mysql_connect('localhost', 'root', 'password');
	mysql_select_db('test');

	// Read the column headers
	$f = fopen($file, 'r');
	$line = trim(fgets($f));
	fclose($f);

	// Replace spaces with underscores
	$line = str_replace(' ', '_', $line);

	// Create the table using the column headers.
	// All fields are created as blobs. Will alter table to optimize type.
	$fields = explode('","', trim($line, '"'));
	$sql = "CREATE TABLE $table (`";
	$sql .= implode('` blob, `', $fields);
	$sql .= '` blob)';
	mysql_query($sql) or die(mysql_error());

	// Load the data
	$sql = << .....

I either exceed comment length, or the comment system doesn’t like heredoc syntax. Here’s the pastebin:
http://pastebin.com/dthgD3X1
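For anyone following the Perl side of the thread, the header-to-table part of the PHP above translates fairly directly. This sketch covers only that step (blob_table_sql is a hypothetical name); the rest of the script is in the pastebin and is not reproduced here.

```perl
use strict;
use warnings;

# Build a CREATE TABLE statement from a CSV header line, with every column
# created as a BLOB, mirroring the PHP: trim, spaces to underscores,
# strip the outer quotes, then split on ",".
sub blob_table_sql {
    my ($table, $header) = @_;
    $header =~ s/^\s+|\s+$//g;           # like PHP trim()
    $header =~ s/ /_/g;                  # like str_replace(' ', '_', ...)
    $header =~ s/^"+|"+$//g;             # like trim($line, '"')
    my @fields = split /","/, $header;   # like explode('","', ...)
    return "CREATE TABLE $table ("
         . join(', ', map { "`$_` blob" } @fields)
         . ')';
}

print blob_table_sql('t', qq{"First Name","Amount"\n}), "\n";
# prints: CREATE TABLE t (`First_Name` blob, `Amount` blob)
```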

@Scott Noyes … that’s fantastic! You rock!

Tony, thanks for starting this thread!

@Dave Cross … Seriously? Your critique of Tony’s Perl is just a bit obnoxious. Yeah, you know it all man…

Obnoxious? Really? So how should I have phrased it?

Perl has evolved. There are better ways to write Perl programs. Tony obviously has a really basic understanding of Perl. He gets his job done, but a couple of days reading something like “Modern Perl” (http://www.onyxneon.com/books/modern_perl/) would make his programs smaller, easier to write and easier to maintain.

@Dave Cross/ Uh, yes, definitely obnoxious, as in, “Your split looks horrible.” Your follow-up comment to mine makes my point even more valid: “but a couple of days reading something like ‘Modern Perl’” …? Seriously??? Again you bore us with disenchanting discourse. This thread is not about coding as much as it is about innovation and getting things done (RTFM, or sidebar in this case). I ran the Perl script and it gets the job done.

What should you have said? Well, nothing, I guess. You should have taken the participation/contributor route like @Scott Noyes and pasted your self-proclaimed, beautifully crafted code, instead of sitting on the self-assigned seat of Perl Parliament! Better yet, take the time to contribute your improvements on github.com.

I seriously didn’t mean to upset people. I was just trying to post some pointers that would allow Tony to improve his code. I really don’t understand why a suggestion to read a (really good) book can be taken as obnoxious. It’s not even like you have to pay for the book – there’s a version available for free download.

I never suggested that Tony’s code didn’t work in exactly the way that he said it did. I only meant that better written code is easier to maintain.

But your point about criticising and not offering alternatives is well made. Here’s a link to my alternative version. Feel free to criticise it as much as you like.

	#!/usr/bin/perl

	use strict;
	use warnings;

	use Data::Dumper;
	use Text::ParseWords;
	use List::Util 'max';

	my $file = shift || die "Need a csv file to process\n";

	my $table_name = 'Addresses';
	my $engine = 'InnoDB';
	my $charset = 'latin1';

	# Read and process header line
	open my $data, '<', $file or die "Cannot open $file for reading: $!\n";

	my $header = <$data>;
	chomp $header;

	# @cols will be an array of hashes. Each hash will contain details
	# of one of the columns
	my @cols = map { { name => $_ } } parse_line ',', 0, $header;
	for (@cols) {
		$_->{name} =~ s/\s+/_/g;
		$_->{name} =~ s/'/\\'/g;
	}

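	# Read the rest of the data
	while (<$data>) {
		chomp;
		my @row = parse_line ',', 0, $_ ;

		# Escape backslashes and single quotes so each value can sit inside
		# a single-quoted SQL string. The substitutions must name $v
		# explicitly: a bare s/// would act on $_ (the whole input line).
		for my $v (@row) {
			$v =~ s/\\/\\\\/g;
			$v =~ s/'/\\'/g;
		}

		foreach my $col_no (0 .. $#cols) {
			push @{$cols[$col_no]{values}}, $row[$col_no];
		}
	}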

	# Analyse the data
	foreach (@cols) {
		guess_type($_);
	}

	# Output the table
	open my $table, '>', 'mysql_create_table2.sql' or
		die "Can't open file for table: $!";

	print $table table_def(@cols);
	close $table;

	# Output the data
	open my $values, '>', 'mysql_data_values2.sql' or
		die "Can't open file for values: $!";

	for my $row (0 .. $#{$cols[0]{values}}) {
		print $values insert($row, @cols);
	}
	close $values;

	# Uncomment to dump the column analysis when debugging
	# print Dumper \@cols;

	# Analyse a column hash and fill in its type and length information
	# by looking at the data values in that column.
	sub guess_type {
		my $column = shift;

		$column->{type} = 'varchar';
		my $seen_decimal = 0;

		foreach my $val (@{$column->{values}}) {
			if ($val !~ /^-?\d+(\.\d+)?$/) {
				$column->{type} = 'varchar';
				last;
			}

			# Once any value has a fractional part the column has to stay
			# decimal, even if later values are whole numbers.
			$seen_decimal = 1 if $val =~ /\./;
			$column->{type} = $seen_decimal ? 'decimal' : 'int';
		}

		if ($column->{type} eq 'decimal') {
			# Widest integer part and widest fractional part; a whole
			# number has no fractional part, so it counts as 0.
			$column->{dec1} = max map { length +(split /\./)[0] } @{$column->{values}};
			$column->{dec2} = max map { length((split /\./)[1] // '') } @{$column->{values}};
		} else {
			$column->{length} = max map { length $_ } @{$column->{values}};
		}
	}

	# Return a table definition string given an array of column hashes.
	sub table_def {
		my @columns = @_;

		return "\n\nCREATE TABLE `$table_name` (\n" .
			join(",\n", map { column_def($_) } @columns) .
			"\n) ENGINE=$engine DEFAULT CHARSET=$charset\n" .
			"\n\n";
	}

	# Given a column hash, return a string containing the SQL column definition.
	sub column_def {
		my $column = shift;

		my $def = " `$column->{name}` $column->{type} ";

		if ($column->{type} eq 'decimal') {
			my $dec_length = $column->{dec1} + $column->{dec2};
			$def .= "($dec_length,$column->{dec2})";
		} else {
			$def .= "($column->{length})";
		}

		return $def;
	}

	# Given a row number and an array of column hashes, return a string
	# containing an SQL insert statement for the given row.
	sub insert {
		my ($row, @columns) = @_;

		return "insert into $table_name (" .
			join(', ', map { $_->{name} } @columns) .
			") \nvalues (" .
			join(', ', map { "'$_->{values}[$row]'" } @columns) .
			");\n";
	}


It seems to me that given Tony’s sample data, this code prints exactly the same output as the original program except for one character where I think I’ve fixed a bug. Tony’s version declared the Amount3 column as int(2). Mine declares it as int(6). I believe that mine is correct given the sample data. I think that this bug is down to the problems explained by morungos in comment 9.

> Tony’s version declared the Amount3 column as int(2). Mine declares it as int(6).

Unless you have zerofill defined, the difference between int(2) and int(6) is nothing (“doodly-squat”, as my father would say).
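To illustrate: without ZEROFILL, INT(2) and INT(6) store exactly the same range (a signed INT is always 4 bytes); the display width only changes how a value is padded when ZEROFILL is set. The Perl below mimics that padding with sprintf. It is an analogy, not MySQL code, and zerofill is a hypothetical helper.

```perl
use strict;
use warnings;

# Mimic ZEROFILL display padding: "%0*d" left-pads with zeros to the
# given width and, like MySQL's display width, never truncates a value.
sub zerofill {
    my ($value, $width) = @_;
    return sprintf '%0*d', $width, $value;
}

print zerofill(42, 6), "\n";        # prints: 000042  (like INT(6) ZEROFILL)
print zerofill(42, 2), "\n";        # prints: 42      (like INT(2) ZEROFILL)
print zerofill(123456, 2), "\n";    # prints: 123456  (width never truncates)
```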

Well @Dave Cross, hats off to you and your choice to contribute. Your code looks very well put together.


Thank you so much! Scott – you made my day :)

The expression “($decimal_length1[$field_count] lt length($split_decimal_number[0]))” is almost certainly broken. The “lt” operator is a string comparison (equivalent to strcmp), not a numeric comparison. It compares character by character, so, for example, 10 lt 2 is actually true, because “1” sorts before “2”, even though 10 and 2 don’t look like strings. You probably want the “<” operator.
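A quick way to see the difference: this snippet compares the same numbers both ways, then shows the numeric form of a running maximum-length check. The variable names are illustrative, not taken from Tony's script.

```perl
use strict;
use warnings;

# lt compares strings character by character; < compares numbers.
my $string_result  = (10 lt 2) ? 'true' : 'false';
my $numeric_result = (10 <  2) ? 'true' : 'false';
print "10 lt 2 is $string_result\n";     # prints: 10 lt 2 is true
print "10 < 2 is $numeric_result\n";     # prints: 10 < 2 is false

# A running maximum length, compared numerically.
my $max_len = 2;
my $value   = '12345';
$max_len = length $value if $max_len < length $value;
print "max length is $max_len\n";        # prints: max length is 5
```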

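And “if ($length[$field_count] lt 'length($Field_Values[$field_count])')” is probably even more broken, as the second operand is a single-quoted string: length() is never called, and the value is compared against the literal text “length(…)”. Since a digit always sorts before the letter “l”, that test is true for any stored length.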
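To see why, this snippet prints the single-quoted string (nothing inside it is called or interpolated), shows that a number compared with lt against it is always "less", and then shows the intended numeric comparison. The sample values are made up.

```perl
use strict;
use warnings;

my @Field_Values = ('Darnell');
my $field_count  = 0;

# In single quotes nothing is interpolated or called: this is literal text.
my $quoted = 'length($Field_Values[$field_count])';
print "$quoted\n";    # prints: length($Field_Values[$field_count])

# Any number compared as a string sorts before the letter "l".
print "always true\n" if 9999 lt $quoted;

# What was intended: call length() and compare numerically.
my $current = 3;
$current = length $Field_Values[$field_count]
    if $current < length $Field_Values[$field_count];
print "$current\n";   # prints: 7
```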

A great post. Nice to read.


Scripting MySQL


18 Responses to Convert .csv File to MySQL Database via Perl


About the author:


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc.
Oracle Cloud Infrastructure 2018 Architect Associate, MySQL 5.7 DB Administrator, MySQL 5.6 DB Administrator, MySQL 5.6 Developer, MySQL Cloud Service Implementation Specialist
Top 15 MySQL Blogger of 2018
