Perhaps every system administrator and DBA talks about this, and the recommendation is plain common sense: make valid backups of your data.
We make backups at every level: operating system, user file system directories, and the database. Because we use the MyISAM and ARCHIVE engines, we simply flush the tables we want to back up and then copy the files to tape.
Backups should be an automated job that runs at different levels (full, incremental, and differential), and there are many good open source tools for this: we use Bacula with great results.
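The flush-and-copy step above can be sketched with mysqlhotcopy, the Perl script that ships with MySQL for backing up MyISAM and ARCHIVE tables: it flushes and read-locks the tables, copies their files, then unlocks. The database name, credentials, and staging path below are placeholders, not our real setup; in practice the staging directory is what the tape job picks up.

```shell
#!/bin/sh
# Placeholder names throughout: flush, lock, and copy a schema's table
# files to a staging directory that the tape backup then archives.
mysqlhotcopy --user=backupuser --password=secret dwh /backup/staging

# The tape backup (Bacula in our case) then archives /backup/staging.
```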
Why am I talking about this? Last week I had to drop a temporary table and, with terrible luck, dropped the wrong one (it was an aggregated historic table). Those are the moments when you say... thank goodness I have a backup!
Friday, February 29, 2008
Saturday, February 23, 2008
Learning to dump: The hard way
Remember the problem with the ARCHIVE engine in the upgrade from 5.0 to 5.1?
Well... I had to set up a parallel MySQL 5.0 server and restore those tables from the backup. Now the trick is to do a mysqldump... the thing is, what is the best way to dump the tables, given that the average table has around 20 million rows?
The first approach I tried: a bash script that goes table by table doing:
mysqldump thedb thetable | mysql -h theotherserver
Problems? The dump was going very slowly... moreover, I learned something the hard way: mysqldump was loading the whole table into memory before dumping it... the result? Sometime in the morning the server ran out of memory and crashed...
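One thing worth trying in the next round, sketched with placeholder host and database names: mysqldump's --quick option, which makes the client retrieve rows from the server one at a time instead of buffering the whole result set in memory.

```shell
#!/bin/sh
# Same table-by-table loop, but with --quick so mysqldump streams rows
# instead of buffering the entire table first.
# "thedb" and "theotherserver" are placeholders.
for TABLE in $(mysql -N -e "SHOW TABLES" thedb); do
    mysqldump --quick thedb "$TABLE" | mysql -h theotherserver thedb
done
```

(--quick is supposed to be part of the default --opt group, so it may also be worth checking whether something in the script or configuration disables it.)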
In the coming week I will try other approaches to the problem...
Wednesday, February 20, 2008
Migrating the platform to MySQL 5.1
Last week I began upgrading the server platform (CentOS, MySQL, and GFS versions) for the DWH I manage.
Everything went quite well except the MySQL upgrade (from 5.0 to 5.1). As you may know, a MyISAM table consists of three files: a .frm file containing the table structure, a .MYD file containing the data, and a .MYI file containing the index data. The MyISAM format didn't change from 5.0 to 5.1, so I just copied the files... but that turned out not to hold for the ARCHIVE engine...
The ARCHIVE engine stores table data compressed, which means less disk usage (and you know disk usage matters in a DWH :))... scans are faster because rows are compressed and the server can read more rows per disk read (obviously with a CPU penalty for decompression). I use ARCHIVE for historic, less frequently queried tables...
So what was the problem? MySQL 5.1 crashes on some (not all) of the ARCHIVE tables... I am still working around this problem...
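While I work around it, one sanity check can be sketched (the schema name "dwh" is a placeholder): list the ARCHIVE tables from information_schema on the 5.1 server and run CHECK TABLE on each, so the broken ones are identified before a query hits them.

```shell
#!/bin/sh
# Placeholder schema name "dwh": find every ARCHIVE table and ask the
# 5.1 server to check it, printing the result table by table.
mysql -N -e "SELECT table_name FROM information_schema.tables \
             WHERE table_schema='dwh' AND engine='ARCHIVE'" |
while read TABLE; do
    mysql -e "CHECK TABLE dwh.\`$TABLE\`"
done
```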
Thursday, February 14, 2008
Mondrian at last
Finally I did something in production using Mondrian. For those who don't know it, Mondrian is an open source ROLAP server written in Java. What is the purpose of OLAP? To analyze information (measures) across different dimensions. So if you want to use Mondrian, you should follow dimensional modeling principles in your database design and then build an XML document that maps dimensions and fact tables onto the relational model. Here you can see some screenshots of JPivot + Mondrian (JPivot is a web UI for Mondrian).
I built a couple of cubes to calculate the payback of the BTSs (base stations) of the mobile network. The payback is the capex of the BTS divided by the cash flow the station generates. I have two cubes, one for traffic and revenue and one for capex costs, plus three dimensions: geography, date, and type of coverage. With those two cubes I made a virtual cube for the calculated payback measure.
On top of the cubes you write the MDX (a kind of SQL for multidimensional analysis) for each report. MDX, IMHO, is an awesome expression language that lets you write many different kinds of analysis.
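As an illustration only — the cube, measure, and member names below are invented, not the real schema — an MDX query against a payback-style virtual cube could look like this:

```mdx
SELECT
  {[Measures].[Revenue], [Measures].[Capex], [Measures].[Payback]} ON COLUMNS,
  NON EMPTY [Geography].[Region].Members ON ROWS
FROM [BTS Payback]
WHERE ([Date].[2008].[Q1], [Coverage].[Urban])
```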
The users' response to this app was great, although JPivot looks a little dated... Anyway, if you plan to build a data warehouse with open source software and would like this kind of feature, read up on Mondrian.
Sunday, February 10, 2008
The guide to squash
Since I moved into my new apartment I have been learning and practicing squash. I have discovered that squash is a very entertaining, full-workout game. While my goal is not to turn pro, I do want to reach a good level. What am I doing to improve my game? At the beginning I took some squash lessons. After that, I watched the videos and tips from a web site I recommend: The Guide-To-Squash.
Here you can find beginner and intermediate drills, movement, shots, tactics, etc. You can also find match videos from top squash players. It's a pity the creators didn't complete the advanced section.
I recommend practicing the shots and drills two or three times a week.
Hope you enjoy the link.
Friday, February 8, 2008
The first post
OK, this is my first post... This blog is intended for writing about reporting and data warehousing with open source software (primarily Pentaho and MySQL), but I will also write about stock analysis and maybe squash... Although my native language is Spanish, I decided to write this blog in English for practice... If you want me to write a post on the topics above in Spanish, just contact me.