Wednesday, March 19, 2014

Pentaho for Big Data Analytics

Although I'm not currently an active member of the Pentaho community, mainly because right now I'm focused on networks and their management, I still follow the evolution of this great platform. In the last few years there have been some great improvements:

 - The acquisition of Webdetails, with all the useful Ctools
 - The integration of connectors for several NoSQL technologies, allowing the use of big data in all the components of the platform (BI Server, Kettle, Mondrian, etc.)

Recently I got the opportunity to review the Pentaho for Big Data Analytics book published by Packt. My expectations for the book were quite high. I hoped it would help me stay up to date with the latest improvements in Pentaho and clarify a lot of the marketing buzzwords around Big Data.

What I found was the following:

 - The book's intention is to give a broad overview of the Pentaho components, and it spends many chapters on setting up the Pentaho platform. One would expect that someone who buys this book already has a Pentaho background and wants details on how it relates to Big Data.
 - The Big Data theory and examples are oriented toward Apache Hive: the book explains how to handle files in HDFS and how to do Big Data analysis through Apache Hive (which in the end is a SQL layer over Hadoop). While the examples and theory are a good introduction to the topic, a lot of issues are left unaddressed: How do you run a map/reduce job directly on Hadoop? What about other NoSQL technologies like MongoDB, Redis, etc.?
 - There are very good examples of how to use Ctools: handling the CDE, CDF and CDA tools is not easy at the beginning, so the "Visualization of Big Data" chapter is very helpful for this.

I think the book is worth reading if you are a new user of Pentaho and Hadoop, want an introduction to installing and running them, and need the first steps for handling data through Hadoop.

Wednesday, April 11, 2012

Software Engineering for Software as a Service - Statement

If you want an idea of what the statement looks like, here is a picture of it:

Software Engineering for Software as a Service

Hi,
I had the chance to take the Software Engineering for Software as a Service course offered by Professors Armando Fox and David Patterson of the University of California, Berkeley, through the startup Coursera.

The course used the same material, quizzes and videos as the official Berkeley class, and at the end it delivers a statement of accomplishment. I'll try to upload it later.

Overall I think the experience was great, and I am very grateful to Professors Fox and Patterson. There will be a new offering of the course if you are interested. The URL of the course is http://saas-class.org

The topics the course covers are architectural patterns, software design, code coverage, unit and integration tests, agile development and Ruby. If you like those topics you'll have a great time.

I have to say I liked the Ruby language and the Rails framework. In my non-expert, non-scientific opinion, it is more suitable than Java for most projects in terms of development speed and the value added to enterprise software.

Thursday, July 21, 2011

CDE Tutorial

Inteligencia de Negocio y Pentaho: Cómo hacer cuadros de mando: V: A recommended tutorial for working with the excellent tools that Pedro Alves and Webdetails have developed, especially CDE.

Monday, July 18, 2011

Your faithful employee

Hello,

In the area I work in, we have one duty among several others: maximize the availability of the network management systems. The NMSs must be running almost all the time because the NOCs (network operations centers) monitor them 24x7.

But that's not an easy task. Sometimes the planning area delivers the NMS implementation with many flaws, other times the machine you get is not what you expected, or the application server freezes continuously and researching the root cause can take weeks.

As we're not big fans of attending service disruption calls at 3 am, we deployed a nice and useful service on all our Linux machines: Monit. This program monitors the existence, availability and performance of services and takes automated actions when the rules/thresholds are exceeded.

For example, we had a Tomcat container that was freezing several times a week... the rule for Monit was something like this:

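# Watch the Tomcat process through its pidfile
check process tomcat5 with pidfile /var/run/tomcat5.pid
  group tomcat5
  start program = "/etc/init.d/tomcat5 start" with timeout 120 seconds
  stop program = "/etc/init.d/tomcat5 stop" with timeout 120 seconds
  # Restart if the test image is not served within 3 seconds
  if failed host 127.0.0.1 port 8080
    protocol HTTP request /archivos/gestion.jpg
    with timeout 3 seconds then restart
  # Restart if the process hogs the CPU for 10 cycles in a row
  if cpu usage > 95% for 10 cycles then restart
  # Stop monitoring after 5 restarts within 5 cycles
  if 5 restarts within 5 cycles then timeout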


So, this loyal automated employee checks port 8080, checks that the HTTP request for gestion.jpg is answered in less than 3 seconds, and checks that the CPU usage of the process stays under 95%. If Monit sees any of these rules broken, it restarts the service, and if it has to restart it 5 times within 5 cycles it gives up and stops monitoring it. As the good employee he is, Monit sends an email notification for every step he takes.
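To make the HTTP rule more concrete, here is a rough Python sketch of the same kind of check: request the test image and treat anything slower than 3 seconds, or any error, as a failure. This is not how Monit is implemented. The function name `http_check` is made up for illustration, and the sketch measures the total request time, which only approximates Monit's own timeout handling.

```python
import time
import urllib.error
import urllib.request


def http_check(url, timeout_seconds=3.0):
    """Return True if `url` answers with a non-error status within the timeout."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            response.read()
            status_ok = response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, socket timeout, HTTP error status, etc.
        return False
    elapsed = time.monotonic() - started
    return status_ok and elapsed <= timeout_seconds


if __name__ == "__main__":
    # Host, port and path are the ones from the Monit rule in this post.
    url = "http://127.0.0.1:8080/archivos/gestion.jpg"
    print("ok" if http_check(url) else "failed: Monit would restart tomcat5 here")
```

Running something like this by hand is a quick way to see what Monit would see before you trust a new rule.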

I hope this application is useful to you.


Quick update: Even though Monit is doing a great job, it is important to find the root cause. Regarding the Tomcat issue, I found this useful site for tuning the JVM parameters: http://wiki.alfresco.com/wiki/JVM_Tuning

Wednesday, March 30, 2011

Pentaho CDE CDF CDA CD* tutorials

Quick post to help people trying to build stuff with those great tools developed by webdetails.pt. They are great but lack docs, so these two links should help:

http://www.vinzi.nl/media/CDE-introduction_V0_8.pdf
http://www.tikalk.com/incubator/blog/creating-bugzilla-dashboard-%E2%80%93-hands-cde-tutorial-%E2%80%93-fuse-day-3-session-summary

I got those links in the ##pentaho IRC channel on freenode, btw.

PS: If you want to install CDE under a context path other than /pentaho/, keep in mind that the URI is hardcoded in all the JS and template files of the pentaho-cde-dd directory.
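If you need to track down those hardcoded references, a small script can list them. This is only a sketch: the function name `find_hardcoded_paths` is made up, and the `.js` and `.html` extensions are my assumption about what the JS and template files look like, so adjust them to what you actually find in the directory.

```python
import os


def find_hardcoded_paths(root_dir, needle="/pentaho/", extensions=(".js", ".html")):
    """Yield (file_path, line_number, line) for every line containing `needle`."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            # Assumption: only these extensions matter; others are skipped.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # "pentaho-cde-dd" is the directory named above; run from its parent directory.
    for path, number, line in find_hardcoded_paths("pentaho-cde-dd"):
        print(f"{path}:{number}: {line.strip()}")
```

It only reports the matches. Whether a plain search and replace is safe for every one of them is something to check file by file.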

Thursday, March 10, 2011

Messing with Roo

At the last Google I/O it was announced that GWT 2 would be supported in Spring Roo. I had never heard of Roo before and wanted to take a look (http://www.springsource.org/roo).

It seems that Roo improves development by maintaining all the interfaces, stubs and glue code that are boring to write when you use GWT or entities.

I just ran two of the demos... a classic Spring controller Roo project handling the CRUD of a Concat entity (http://s3.springsource.com/MRKT/roo/2010-01-Five_Minutes_Roo.mov):

And a GWT integration handling the CRUD of an equipment entity (http://www.thescreencast.com/2010/05/how-to-gwt-roo.html)


So far it was pretty impressive to build an entity-based application in 5 minutes without all the glue code, but as this is a new concept I wonder if anyone has done a really large project with Roo...