Friday, June 13, 2014

Random Pentaho (PDI) Stuff - I


Pentaho PDI 5.0
  1. Transformation components - Merge Join & Group By require sorted data. Use Sort rows component to sort the data. Alternatively if any of these components are used immediately after table input, then sort the data in table input query.
  2. Multiway Merge join - As of version 5.0, this component doesn't work when connected to repository.
  3. Spoon.bat - Edit this file to increase java memory. Default configuration : PENTAHO_DI_JAVA_OPTIONS="-Xmx512m" "-XX:MaxPermSize=256m" 
You can increase the highlighted values as per system configurations.


Saturday, December 25, 2010

SSAS Storage Modes

In SQL Server Analysis Services 2008, there are three storage mode options available: Relational Online Analytical Processing (ROLAP), Multidimensional Online Analytical Processing (MOLAP) and Hybrid Online Analytical Processing (HOLAP).

People generally tend to ask hwat are the various advantages and disadvantages of these storage modes.
In this post I'll try to provide a quick overview describing the storage modes and laying out some of the Advantages and Disadvantages of each.

ROLAP - Relational Online Analytical Processing

The ROLAP storage mode allows the detail data and aggregations to be stored in the relational database. If you plan on using ROLAP, you need to make sure that your database is carefully designed or you'll run into some bad performance issues.

Advantages:

•Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
•As data is in the relational database, it allows for much larger amounts of data, which can mean better scalability.
•Low latency.

Disadvantages:

•With all the data being stored in the relational database, query performance is going to be much slower than MOLAP.
•Maintaining a permanent connection with the relational database to use ROLAP.

MOLAP - Multidimensional Online Analytical Processing

MOLAP is the default and thus most frequently used storage mode. In MOLAP storage mode, the data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the AS database.

Advantages:

•Since the data is stored on the OLAP server in optimized format, queries, complex calculations are faster than ROLAP.
•The data is compressed so it takes up less space.
•And because the data is stored on the OLAP server, you don't need to keep the connection to the relational database.
•Cube browsing is fastest using MOLAP.

Disadvantages:

•Because you don't have a real time connection to the relational database, you need to frequently process the cube to update your data.
•If there's more data, processing is going to take longer.
•There's also an additional amount of storage since a copy of the relational database is kept on the OLAP server.
•High latency.

HOLAP - Hybrid Online Analytical Processing

HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data.

Advantages:

•HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
•Cubes are smaller than MOLAP since the detail data is kept in the relational database.
•Processing time is less than MOLAP since only aggregations are stored in multidimensional format.
•Low latency since processing takes place when changes occur and detail data is kept in the relational database.

Disadvantages:

•Query performance can head downhill fast when more detail data is queried from the relational database.