FoxPro .dbf to MySQL data issues for backtesting

I'm new to the SI data set and MySQL, but familiar with financial data, Excel, VBA, screening, FoxPro (dating way back to dBase II - wow! ), Market Guide data, etc. Was wondering if you have any tips on how to best import SI .dbf tables to MySQL, and how to set things up for good mechanical "grinding" on the data set after that? Here are some specifics I'm struggling with:

1. Having only a background in FoxPro, plus some SQL for DB2 I'm struggling with the numerous and various options for setting up MySQL.

Here is what I'm currently trying out:

machine (developer|server|dedicated Server): server (the performance of 8K records and 5 table joins is still a bit slow)
database (multifunctional|transactional|non-transactional) multifunctional (InnoDB)
InnoDB Tablespace Settings: c: \MySQL\ (had to adjust datadir="C:/MySQL/" in my.ini to make this work. On Visa for the moment - yes, I know, that is indeed unfortunate!)
Connections(20|500|manual): 20
Enable TCP/IP Networking: 3306
Firewall Exception: false
Enable strict mode: true
default character set: (standard|multilingualism|manual): standard (makes Latin1 the default character set)
Install as a service: true
Name: MySQL
Include Bin Directory in Windows PATH: true

Above seems to work O.K., but performance is not all that brisk. (I think OBDC jet in Excel VBA to .dbf tables can do 5 joins on 8k records quicker). However, I may be doing something else wrong that slows things down. Or, MySQL is running fine but the table/query viewer I'm using is slowing down displaying the results. On 8K records with 5 joins (company, sales Q, sales A, mgdesc 2 digit and mdesc 5 digit) on 8k records I'm getting a view in more than a minute.).

2. Thoughts on character encoding?
I'm thinking
default: latin1 -- cp1252 West European
collation sequence? defaults: latin1_sweedish_ci

other option might be to just to with UTF-8 / Unicode all the way around? (or too big?)

3. I managed to mine out the memo fields with ODBC / Jet. So that's O.K. except for some funny characters that seemed to be escaped. Have you dealt with the funny characters?

4. Should each table have a primary key? I speculate yes for performance?

5. The famous hidden column in FoxPro: "_NulFlags". This is an odd binary field that codes if each field in a Fox record contains a null. Do you do anything with this or just ignore it? Are nulls important? Null #0?, Null - not available? Any of this stuff important downstream in computing averages, standard deviations?
For now I just ignored that column but ....?

6. There are many tables with joins on company_ID. Should all these tables be kept separate for some speed reason or would they be better combined into larger tables enabling less joins on the subsequent queries?

7. si_dates are interesting and all over the place. Do you have ways to line up the dates so Q1 is Jan-Feb-Mar, etc.. Otherwise, how do the averages compute on misaligned date data?

8. Do you take the default column data types on the .dbf import? I see that MySQL data types are more numerous. Any glitches here or just take suggested column data type defaults?

9. When you are setting up for backtesting how do you retain the weekly SI data files and organize the MySQL tables into time series files? Can the last updated fields be used? One row per period regardless?

10. In setting up the return calculations do you incorporate dividends, splits, reorgs, etc. I.e. how do you maintain the price & dividend files so you can get total return between any two dates?

Any input on above, or other "data model" tips for MySQL most appreciated!!!

Rick

Reply

  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <pre> <img> <div> <quote>
  • Lines and paragraphs break automatically.

More information about formatting options

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
7 + 1 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.