Saturday, March 22, 2014

Data Warehouse Interview Questions

Tuesday, February 11, 2014

Mahout - Future Directions

Introduction

The Apache Mahout Machine Learning Library’s goal is to build scalable machine learning libraries. Mahout focuses primarily on collaborative filtering (recommenders), clustering, and classification (known as the "3Cs"), as well as the infrastructure needed to support those implementations. That includes math packages for statistics, linear algebra, and other areas, as well as Java primitive collections, local and distributed vector and matrix classes, and a variety of integration code for working with popular packages such as Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra, and more.
Future Releases

Saturday, February 1, 2014

Spoilt for Choice – How to choose the right Big Data / Hadoop Platform?

Big data has become a relevant topic in many companies this year. Although there is no standard definition of the term “big data”, Hadoop is the de facto standard for processing it. Almost all big software vendors, such as IBM, Oracle, SAP, and even Microsoft, use it. However, once you have decided to use Hadoop, the first question is how to start and which product to choose for your big data processes. Several alternatives exist for installing a version of Hadoop and implementing big data processes. This article discusses these alternatives and recommends when to use which one.

Using Apache Storm for real-time analytics at Rocket Lawyer.

With today’s data technologies, storing data and scaling the infrastructure are becoming non-issues thanks to HDFS, Hadoop, and related architectures. Hadoop provides a batch-processing framework, MapReduce, for processing the data. However, batch processing brings high data read latency, which poses challenges for use cases like real-time analytics, clickstream visualization, and machine learning. We needed a real-time system to process our customer- and system-generated data as it arrives, so that we can make important business decisions quickly. At Rocket Lawyer, we have chosen Apache Storm to supplement our data platform with real-time processing capabilities.

Hadoop related projects and frameworks

“Big data” is one of the most inflated buzzwords of recent years. Technologies born to handle huge datasets and overcome the limits of previous products are gaining popularity outside the research environment. The following list is meant as a reference to this world. It is still incomplete and always will be.

Wednesday, August 28, 2013

50 Open Source Replacements for Proprietary Business Intelligence Software.

In a recent Gartner survey, CIOs picked business intelligence and analytics as their top technology priority for 2012. The market research firm predicts that enterprises will spend more than $12 billion on business intelligence (BI), analytics and performance management software this year alone.

As the market for business intelligence solutions continues to grow, the open source community is responding with a growing number of applications designed to help companies store and analyze key business data. In fact, many of the best tools in the field are available under an open source license. And enterprises that need commercial support or other services will find many options available.

This month, we've put together a list of 50 of the top open source business intelligence tools that can replace proprietary solutions. It includes complete business intelligence platforms, data warehouses and databases, data mining and reporting tools, ERP suites with built-in BI capabilities and even spreadsheets. If we've overlooked any tools that you feel should be on the list, please feel free to note them in the comments section below.

Free BI Tools With Commercial Options

Business intelligence vendors are companies that provide data mining, data warehousing, and enterprise resource planning services. Some of the top vendors that offer free BI reporting tools and commercial services are: MicroStrategy, QlikView, Pentaho, JasperReports Library, Rapid-I, and Jedox.