Tuesday, June 23, 2009

DBF to MySQL Data Conversion on Windows Hands-Free

I spent several weeks looking for an easy way to import data from DBF files (the application was written in FoxPro for DOS some time around 1993) into MySQL on a regular basis. First, I looked for a utility to perform one-step conversion (DBF to MySQL). No luck there (not that I didn't find any, just none of them worked).

Then, I figured, I could convert DBF files to CSV files first and then import them into MySQL using LOAD DATA INFILE, but the question was how to do the DBF-to-CSV part. OpenOffice Calc and MS Excel were out of the question (converting 30 tables every night manually wasn't something I was eager to do; besides, many of them exceed the limit of 65536 rows), so I had to look for other options, preferably something that can run unattended at night while I am at home sleeping.

Considering the fact that DBF format has been around for about three decades or so, one would expect there should be a ton of data migration utilities that work. Not really. Some didn't work at all, some failed to identify rows marked for deletion (be careful because even some commercial utilities do that).

The only one capable of doing what I needed was a free utility called dbf2csv by Dave Burton. You can download it from Burton Systems Software.

So, here is my kludge built around Dave Burton's utility.



Part 1 (DBF => CSV)


1. Unzip dbf2csv anywhere on your file system. For the purpose of this example we will use "C:\dbf2csv". No installation is required.


2. Copy your DBF files into a folder where you want to perform the conversion. I use "C:\dbf2csv\cnvrt". You can copy all DBF files from a folder using something like this:

copy "Z:\source folder\*.dbf" "C:\dbf2csv\cnvrt"

Z: can be a local drive or a mapped network drive. If you would rather not use persistent mapping, you can use NET USE to create a share just to copy the files and then delete the share. It might be a good idea to use it with START /wait.

Note that, if you are dealing with a FoxPro database and want to import memo fields, you also need to copy the *.fpt files.


3. Now let's convert all the copied DBF files (and FPT files, if any) to CSV format:

cd C:\dbf2csv

dbf2csv.bat "C:\dbf2csv\cnvrt\*.dbf"


The dbf2csv.bat batch file comes with the utility.

If all you need is to convert data from DBF files (and FPT files, if any) to CSV format, you can stop here. Just create a batch file, say dbf2csvcnvrt.bat, that looks something like this:

Y|COPY "Z:\source folder\*.dbf *.fpt" "C:\dbf2csv\cnvrt"

cd C:\dbf2csv

Y|dbf2csv.bat C:\dbf2csv\cnvrt\*.dbf


This will copy all the DBF and FPT files from the source folder on the Z: drive to the "conversion folder" on the C: drive overwriting the old ones in the "conversion folder" and then run the dbf2csv utility that will convert them to CSV format overwriting the old CSV files.

Now create a scheduled task to run dbf2csvcnvrt.bat at whatever time intervals you want it to run.

If you actually need to import the data into a MySQL database, read on.



Part 2 (CSV => MySQL)


4. Copy the CSV files to the MySQL data folder of the database you want to load the data into:

copy "C:\dbf2csv\cnvrt\*.csv" "C:\Program files\MySQL\MySQL Server 5.0\data\mydatabasefolder"

Why? Because MySQL will only let you load data from a CSV file that is in the data folder (for security reasons).


5. In MySQL, using whatever tool you are comfortable with (e.g., MySQL Administrator), create tables structurally identical to the source DBF files. To see column names and their attributes, open DBF files using OpenOffice Calc, which - unlike Excel - shows them all in column headers. Examples: COLUMN_NAME,C,10 means that it is a char column of length 10; COLUMN_NAME,N,9,2 means that it is a numeric column of length 9 and precision of 2, so in MySQL it will correspond to float datatype whereas, if no precision is specified, the corresponding MySQL datatype will probably be int; COLUMN_NAME,D is of type date, which is the same datatype in MySQL; COLUMN_NAME,M is a "memo" column, whose MySQL equivalent will probably be text datatype.


6. Every time you load new data, you want to truncate the target table/s and load the data from the CSV file/s (there are less crude ways, but this approach works for me just fine). So, create a mysql user (or edit permissions of an existing one) that has the following privileges for the database you want to load data into: DELETE (or else it won't let you truncate the table/s) and INSERT (obviously, you want to insert rows into the table/s). Also add a global privilege: FILE (otherwise, it is not going to let you load data from "external" files).


7. Create a text file (let's call it myimport.txt) using the example below:

TRUNCATE TABLE mydatabase.student;

TRUNCATE TABLE mydatabase.course;

LOAD DATA INFILE 'student.csv' INTO TABLE mydatabase.student FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;

LOAD DATA INFILE 'course.csv' INTO TABLE mydatabase.course FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;


In the above example, everything should be pretty much self-explanatory, except, maybe, the "IGNORE 1 LINES" part. It prevents the header row from being imported as data.

If your CSV files contain dates, and you are getting errors, before the first line that starts with LOAD DATA INFILE, you might have to insert this:
SET @@SESSION.sql_mode='ALLOW_INVALID_DATES';
In this case, myimport.txt will look something like this:

TRUNCATE TABLE mydatabase.student;

TRUNCATE TABLE mydatabase.course;

SET @@SESSION.sql_mode='ALLOW_INVALID_DATES';

LOAD DATA INFILE 'student.csv' INTO TABLE mydatabase.student FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;

LOAD DATA INFILE 'course.csv' INTO TABLE mydatabase.course FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;



8. Now run the following command:

"C:\Program Files\MySQL\MySQL Server 5.0\bin\mysql.exe" --user=myusername --password=mysecretword mydatabase < "C:/Program Files/MySQL/MySQL Server 5.0/bin/myimport.txt"

The path to mysql.exe shown is the default MySQL installation path on Windows (change it to whatever your actual path is). I dropped myimport.txt into the MySQL bin directory just in case. If your MySQL bin directory is in your system PATH, you don't have to specify the path to mysql.exe (it is not going to hurt though). When specifying the path to myimport.txt (or whatever you want to call it), be sure to use forward slashes instead of back slashes.

So, edit the example command line above to fit your specific needs and test it. If the data is imported correctly, it works.


9. Now, the only thing left is to put all of it together.

Create one batch file that will
  1. copy DBF and FPT files from the source folder to the "conversion folder" overwriting the old ones in the "conversion folder",
  2. run the dbf2csv utility, which will convert the newly copied files to CSV format (again, overwriting the old CSV files),
  3. copy the new CSV files to the data folder of the MySQL database you are going to import the data into (again, overwriting the old CSV files),
  4. import the data.
Here it is:

Y|COPY "Z:\source folder\*.dbf *.fpt" "C:\dbf2csv\cnvrt"

cd C:\dbf2csv

Y|dbf2csv.bat C:\dbf2csv\cnvrt\*.dbf

Y|COPY "C:\dbf2csv\cnvrt\*.csv" "C:\Program files\MySQL\MySQL Server 5.0\data\mydatabasefolder"

"C:\Program Files\MySQL\MySQL Server 5.0\bin\mysql.exe" --user=myusername --password=mysecretword mydatabase < "C:/Program Files/MySQL/MySQL Server 5.0/bin/myimport.txt"


Call this batch file whatever you want, e.g. dbf2mysql.bat, and create a scheduled task that will run it at whatever time intervals you want it to run (mine runs every night), and you are good to go... home.

14 comments:

Johnpaul said...

Clear explanation. Thank you very much. useful information. :)

Dave Burton said...

Thanks for the kind words about my dbf2csv utility. It hasn't changed much lately, but the latest version can always be found on my downloads page, here:

http://www.burtonsys.com/downloads.html


Dave Burton

ncdave4life said...

I've just updated dbf2csv to version 8. It has several improvements, the most important being support for FoxPro's 'B' ("Double") fields, and optional .json output format.

As always, you can find it on my downloads page, here:
http://www.burtonsys.com/download/DBF2CSV.ZIP

Dave

Gabi said...

Hi,
Thank you for the very useful information.
There is a little problem with the csv transformation, I loose the leading 0 on somme fields.
How can I correct it?

IP said...

Gabi,
Sorry for getting back to you so late (I was out of the country).
First, make sure that you actually do lose leading zeros. Don't use Excel to open the CSV output file. Use a plain text editor that is not going to make any stupid assumptions about the column formats the way Excel does.
If you do lose them, you might want to contact Dave Burton. Maybe, there is a bug in the version of dbf2csv you are using. I am still using the old one (I don't fix things that aren't broken), and it hasn't done anything like that to me.
Good luck!

ludolugo said...

hi, thanks this explanation!, but i have a problem: i get the message "NOte: to avoid the spurious "cant execute dbf2csv.pl..." what should i do? it sais to put the files in a directory on my path, but what path? what directory? thanks

IP said...

ludolugo,
I am not sure I understand what message you are getting, but here is the answer to your question about the "path":
You can put the dbf2csv anywhere you want, but the system needs to know where to look for it. Here is what you need to do:
- right-click on "My computer";
- click on "Properties";
- click on the "Advanced" tab;
- click on the "Environment Variables" button;
- under "System Variables" click on "Path" to highlight it;
- click the "Edit" button below;
- scroll all the way to the right;
- if there is no semicolon at the end, put a semicolon;
- after the semicolon, add the full path to where you put dbf2csv package, for example C:\utilities\dbf2csv\;
- reboot.
That's it.

IP said...

This comment is from Dave Burton (e-mailed to me):

-----( begin comment )-----
ludolugo, you need to read the dbf2csv.txt file which is included in the dbf2csv.zip archive. It is all explained there.

The short answer is that to run any Perl program, such as dbf2csv.pl, you need to have Perl installed. All Linux distros include Perl, but for Windows you need to either install a Perl 5 (e.g., Strawberry Perl or ActiveState), or you can just use the Perl 4 "perl4w32.exe" which is included in the dbf2csv.zip archive.

For example, to print out the short dbf2csv "help" message do:

perl4w32.exe dbf2csv.pl
-----( end comment )-----

Dave Burton said...

I've just uploaded v.9 of dbf2csv to my web site. This version adds one new feature: the ability to handle Foxpro's "T" (DateTime) fields.

Dave Burton

IP said...

Dave,
Thank you for making an excellent utility even better and sharing it with the rest of us!

gadget00 said...

this is an excellent post; much thanks to you for the mysql tips and to Dave Burton for such amazing and helpful utility!

IP said...

You are welcome, gadget00. Yes, Dave's dbf2csv is a life-saver :-)

ncdave4life said...

There's a new version 11 of dbf2csv available on my web site. It has improved handling of special characters, such as CR and LF, which are sometimes found in FoxPro "memo" fields.

Ira Portman said...

Thanks, Dave!