Big data has become a relevant topic in many companies this year. Although there is no standard definition of the term “big data”, Hadoop is the de facto standard for processing it. Almost all big software vendors, such as IBM, Oracle, SAP, and even Microsoft, use it. However, once you have decided to use Hadoop, the first question is how to start and which product to choose for your big data processes. Several alternatives exist for installing a version of Hadoop and implementing big data processes. This article discusses these alternatives and recommends when to use which one.
Showing posts with label Big Data. Show all posts
Saturday, February 1, 2014
Hadoop related projects and frameworks
“Big data” is one of the most inflated buzzwords of recent years. Technologies born to handle huge datasets and overcome the limits of previous products are gaining popularity outside the research environment. The following list is meant as a reference to this world. It is still incomplete and always will be.
Saturday, May 25, 2013
SQL is what’s next for Hadoop: Here’s who’s doing it.
SUMMARY:
More and more companies and open source projects are trying to let users run SQL queries from inside Hadoop itself. Here’s a list of what’s available and, on a high level, how they work.
Thursday, May 23, 2013
SQL, NoSQL, BigData in Data Architecture
All about how to build "Data Architecture" using SQL, NoSQL and BigData technologies and how to evaluate them.
Predictive Analytics is a Goldmine for Startups.
Traditional business intelligence (and data mining) software does a very good job of showing you where you’ve been. By contrast, predictive analytics uses data patterns to make forward-looking predictions that guide you to where you should go next. This is a whole new world for startups seeking enterprise application opportunities, as well as social media trend challenges.
Wednesday, May 15, 2013
The Hadoop Distributed File System.
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 40 petabytes of enterprise data at Yahoo!
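The idea of distributing storage across many servers can be illustrated with a small sketch. This is not HDFS code and does not use any Hadoop API: the function names `place_blocks` and `readable_after_failure`, the node names, and the sizes are all hypothetical, and the round-robin placement is a deliberate simplification that stands in for a real placement policy. A replication factor of 3 is assumed as the default.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into fixed-size blocks and pick `replication` distinct nodes per block.

    Simplified round-robin placement (an assumption for illustration only).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block so load spreads over the cluster.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_nodes):
    """Return True if every block still has at least one replica on a live node."""
    failed = set(failed_nodes)
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    # Hypothetical 300-unit file, 128-unit blocks, four storage nodes.
    layout = place_blocks(300, 128, ["n1", "n2", "n3", "n4"])
    for block_id, replicas in layout.items():
        print(block_id, replicas)
    print(readable_after_failure(layout, ["n1", "n2"]))
```

Because each block lives on several nodes, losing one or two servers in this toy layout still leaves every block readable, which is the property that lets a cluster grow on inexpensive hardware.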
Saturday, May 4, 2013
24 Interview Questions & Answers for Hadoop MapReduce developers
A good understanding of Hadoop architecture is required to understand and leverage the power of Hadoop. Here are a few important practical questions that can be asked of a senior, experienced Hadoop developer in an interview. This list primarily includes questions related to Hadoop architecture, MapReduce, the Hadoop API, and the Hadoop Distributed File System (HDFS).
Big Data Top Questions by Marketers and their Kids Infographic 2013
What are the top questions marketers ask about their Big Data, and how are they similar to their kids’ questions? Here’s a tongue-in-cheek look at the similarities. See the infographic below from Infochimps via Visual.ly.
Tuesday, April 30, 2013
Big Data Analytics with Hadoop
A good presentation; it is helpful for everyone from beginners to advanced users...
Monday, April 29, 2013
Large-Scale Processing in Netezza.
Transitioning from ETL to ELT
CIO: Why is that uber-powered [commodity RDBMS] system running out of steam? Didn’t we just upgrade?
MANAGER: Yes, but the upgrade didn’t take.
CIO: Didn’t take? Sounds like a doctor transplanting an organ. Do you mean the CPUs rejected it? (laughing)
MANAGER: (soberly) No, just the users. Still too slow.
CIO: That hardware plant cost us [X] million dollars and it had better get it done or I’ll dismantle it for parts. I might dismantle your prima-donna architects with it!
Sunday, April 28, 2013
Installing Hadoop on Ubuntu (12.04) - single node
# Installing Java
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
# Creating user
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
Intro to Hadoop.
What is Hadoop?
Data is growing exponentially. What’s not so clear is how to unlock the value it holds. Hadoop is one answer. Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It is written in the Java programming language and was derived from Google's MapReduce and Google File System (GFS) papers.
Google’s MapReduce provides:
- Automatic parallelization and distribution
- Fault-tolerance
- I/O scheduling
- Status and monitoring
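To make the MapReduce model more concrete, here is a minimal single-process sketch of a word count. It is not Hadoop code and uses no Hadoop API: the names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical, and a thread pool merely stands in for the automatic parallelization that the framework would provide across machines. Fault tolerance, I/O scheduling, and monitoring are not modeled.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines, workers=2):
    # The map calls are independent of each other, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, lines))
    pairs = [pair for chunk in mapped for pair in chunk]
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

The user writes only the map and reduce functions; in a real framework, splitting the input, running the tasks on many nodes, and regrouping the intermediate pairs are handled for them.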
Wednesday, April 24, 2013
Integrating Hadoop into Business Intelligence and Data Warehousing: An Overview in 27 Tweets.
To help you better understand how Hadoop can be integrated into business intelligence (BI) and data warehousing (DW) and why you should care, I’d like to share with you the series of 27 tweets I recently issued on the topic. I think you’ll find the tweets interesting, because they provide an overview of these issues and best practices in a form that’s compact, yet amazingly comprehensive.
Every tweet I wrote was a short sound bite or stat bite drawn from my recent TDWI report “Integrating Hadoop in Business Intelligence and Data Warehousing.” Many of the tweets focus on a statistic cited in the report, while other tweets are definitions stated in the report.
Monday, April 22, 2013
Hadoop Interview Question
1. What is the Hadoop framework?
Answer:
Hadoop is an open source framework written in Java by the Apache Software Foundation. It is used to write software applications that need to process vast amounts of data (it can handle multiple terabytes of data). It works in parallel on large clusters, which can have 1000s of computers (nodes). It also processes data reliably and in a fault-tolerant manner.
2. On what concept does the Hadoop framework work?
Answer:
It works on MapReduce, a concept devised by Google.
3. What is MapReduce?
Understanding Hadoop Clusters and the Network
This article is Part 1 in a series that will take a closer look at the architecture and methods of a Hadoop cluster, and how it relates to the network and server infrastructure. The content presented here is largely based on academic work and conversations I’ve had with customers running real production clusters. If you run production Hadoop clusters in your data center, I’m hoping you’ll provide your valuable insight in the comments below. Subsequent articles will cover the server and network architecture options in closer detail.
How To Build an Optimal Hadoop Cluster (Hadoop recommendations)
Preface
The amount of data stored in databases and files is growing every day. As a result, there is a need to build cheaper, maintainable, and scalable environments capable of storing large amounts of data (“Big Data”). Conventional RDBMS systems have become too expensive and do not scale to today’s needs, so it is time to use or develop new techniques that can satisfy them.
One of the technologies leading in this direction is cloud computing. There are different implementations of cloud computing, but we selected Hadoop, a MapReduce framework under the Apache license that is based on Google's MapReduce framework.
Sunday, April 21, 2013
Making Big Data and BI Work Together
For enterprise IT and the end-users it supports, the interplay between big data and B.I. can prove as exciting as it is frustrating.
As enterprise executives and end-users eagerly look to gain meaningful intelligence and fast time-to-insight from deep wells of rich data—enabling them to react more quickly and intelligently to market conditions, deliver better customer service, streamline internal operations, and differentiate the organization from among the competition—IT is charged with facilitating such desires for agility even as rivers of data continue to pour into the organization.
With storage costs low enough to store vast amounts of data cost-effectively, many IT organizations opt to store virtually everything they can. While that satisfies some end-user demands, it increases the pressure on the makers of B.I. tools to create offerings robust enough to make meaningful, quick, and accurate sense of all available data.