So today we want to build a basic understanding of how Apache Hadoop evolved.
I am starting this post with great thanks to Mr. Doug Cutting for his wonderful work towards building such a fine big data processing framework.
Today the industry is dealing with ever-bigger datasets.
What does that mean?
The industry has been keeping data in electronic form for the last 10-20 years, storing it on bigger and bigger servers as the years pass.
But the processing and storage capacity of each machine depreciates as the years go by (even for server machines). As far as I can see, a machine with 4GB RAM and a dual-core processor, which used to be a server for an organisation 5-7 years back, has now become a desktop machine.
With new market entrants innovating every day, we now find new gadgets appearing almost every few days (in my view).
Now what are these gadgets for?
They are more and more socially featured (our special thanks to Google Talk, Gmail, Facebook, Twitter and many more in the same line). So almost every gadget now comes with social media features, and tomorrow there will be more.
Also, thanks to the marketing folks, people now believe that engaging and sharing every moment of their lives is the most happening thing to do.
So here is the point –
Data of every kind – official, unofficial, structured, unstructured, sensible, nonsense – is growing in volume.
And management gurus expect more and more refined information from this data to get their strategic decisions – branding, promotion, budgeting, research & development, etc. – right and make their investors happy at the end of the day.
So the demand for complex, refined reports keeps growing.
Hardware companies benefit by supplying monster-configuration servers, and the same goes for data analytics consulting firms with their out-of-the-box business intelligence software installation and customisation.
But these mostly handle structured data.
Unfortunately, the most useful information is hidden in unstructured data (worldwide, 80% of electronic data is unstructured).
Example – we can get sales data for a Canon camera from a structured database – a centrally processed RDBMS.
But from social media conversations, we can learn how many users have tried a new Canon camera model and found some exceptional feature that is worthy to them. Through that referral process, in the next quarter that camera may gain more market share than others. And so we can derive strategic decisions…
Here we have a real need for intensive data processing across all the available input sources, plus some programming to achieve the result. BI solutions unfortunately cannot be so dynamic that every complex business requirement can be solved with them.
Management also now realises that the power of one or a few servers is not enough to process all this. At the end of the day, there is a cost factor in turning macro data into processed, refined information.
Some of them realised this early. And so did the tech people…
To process structured and unstructured data in a fail-safe and cost-effective way, the tech world came up with the Apache Hadoop stack.
And that answers questions like what is Hadoop and why Hadoop…
Generally speaking, Hadoop runs on any Linux machine (I started playing with it on Ubuntu), and the interesting thing is that Hadoop runs on any commodity machine (big love for big data solutions from the management side of organisations).
Yes – to process the data, Hadoop runs the MapReduce framework, a two-phase programming model: in the first phase (map) it processes the data, and in the second phase (reduce) it aggregates and computes the data to generate statistics.
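To get a feel for the two phases, here is a minimal sketch of the MapReduce idea in plain Python – no Hadoop involved, just the map-then-reduce flow on some hypothetical input lines (the data and function names here are my own illustration, not Hadoop's actual API):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: group the pairs by word and sum up the counts."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

# Hypothetical input standing in for a large unstructured dataset
lines = ["big data big servers", "big data small cost"]
counts = dict(reduce_phase(map_phase(lines)))
print(counts)  # {'big': 3, 'cost': 1, 'data': 2, 'servers': 1, 'small': 1}
```

In real Hadoop, the map tasks run in parallel across many commodity machines, the framework sorts and shuffles the intermediate pairs by key, and the reduce tasks aggregate them – the same flow as this tiny sketch, just distributed and fail-safe.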
But it is not all smooth sailing… some skills are required here… quality grey matter… again a grey area… ha ha…
Now, jokes aside, I am serious – and it seems that you are too…
Read about the Hadoop stack – Hadoop, HBase, Hive, Pig, etc. – from the world of Google… In the next post on Hadoop I will discuss the Hadoop architecture…
And keep commenting…