R.SETHURAMAN

ASSISTANT PROFESSOR & TECHNOLOGIST

An Efficient Framework for Data As A Service in Hadoop EcoSystem

By srssethuraman | March 2, 2014

  • Introduction

– Big Data Analytics

– Hadoop EcoSystem

– Data As a Service

  • Literature Survey
  • Inference from the Survey
  • Problem Defined
  • Key Challenges of Problem
  • Proposed Methodology
  • References

Introduction

Big Data Analytics:

  • Big data analytics is the process of examining large data sets containing a variety of data types to uncover hidden patterns, unknown correlations and other useful business insights.
  • The sources of these large data sets include server logs, social media, mobile devices and sensors. This data is unstructured or semi-structured.
  • Traditional databases, including relational databases, do not fit the unstructured and semi-structured data obtained from these sources.
  • This makes it necessary to move to a newer technology such as Hadoop.

Hadoop EcoSystem

  • Hadoop is a framework that supports the processing of large and diverse data sets across clustered systems.
  • It does so with the support of related tools such as YARN, MapReduce and Hive.
  • It serves as a central repository for all incoming streams of raw data.
  • Hadoop is not a single product; it is a collection of components.
  • It is popular for storing, analyzing and quickly retrieving unstructured data at low cost.
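
To make the MapReduce component mentioned above more concrete, here is a minimal sketch of its map, shuffle and reduce steps in plain Python. It does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample log lines are assumptions made up for illustration, and a real job would run these steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for record in records for pair in map_phase(record)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Hypothetical server-log lines, used only as sample input.
    sample = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(sample))
```

The same three-step shape applies to other aggregations over raw data: only the map and reduce functions change.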

 

Data As A Service [DaaS]

  • Data as a service (DaaS) is the delivery of statistical analysis tools, or of information derived from large data sets, to give an organization a competitive advantage.
  • It operates on an immense volume of unstructured data that is updated on a regular basis.

HOW IT WORKS:

– The data obtained by web crawlers is sent into the Hadoop framework for the following processing stages:

* Data Storage

* Data Processing

* Data Management

Problem Defined

  • Data retrieval for unstructured and semi-structured data can be made more effective by using machine learning algorithms such as page ranking and C4.5.
  • The normalization process can be improved by applying text processing.
  • Record linkage for heterogeneous data can be done through efficient mining algorithms.
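
As a rough illustration of how text-processing-based normalization can feed record linkage, the sketch below normalizes strings and then links records by string similarity. This is an assumed, simplified stand-in using Python's standard difflib, not the mining algorithm the framework proposes; the names normalize, similarity and link_records and the 0.85 threshold are hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best match in right, if close enough.

    The threshold is an assumed value; it would need tuning on real data.
    """
    links = []
    for record in left:
        best = max(right, key=lambda cand: similarity(record, cand), default=None)
        if best is not None and similarity(record, best) >= threshold:
            links.append((record, best))
    return links


if __name__ == "__main__":
    # Hypothetical records from two heterogeneous sources.
    source_a = ["Apache Hadoop, Inc."]
    source_b = ["apache hadoop inc", "Cloudera"]
    print(link_records(source_a, source_b))
```

Comparing every pair of records grows quadratically with the data size, so at Hadoop scale the candidates would first need to be grouped (blocked) by some key before pairwise comparison.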

PROPOSED FRAMEWORK

[Figure: proposed framework]

Proposed Methodology

  • A new framework is proposed to achieve efficient DaaS using machine learning algorithms and text processing.
  • The machine learning algorithm C4.5 builds decision trees.
  • CART is a comparable decision-tree algorithm.
  • Page ranking supports basic graph analysis.
  • The graphs it analyzes are connected with each other.
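
The two algorithms named above can be sketched briefly. The block below shows the gain-ratio criterion that C4.5 uses to choose a split on a categorical attribute, and a basic power-iteration form of page ranking over a small link graph. Both are minimal sketches under stated assumptions: the function names, the damping factor of 0.85, the fixed iteration count and the toy inputs are illustrative choices, not part of the proposed framework.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each node to its outgoing links."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node, targets in graph.items():
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy attribute column and class labels for the split criterion.
    print(gain_ratio(["a", "a", "b", "b"], ["yes", "yes", "no", "no"]))
    # Toy link graph: pages "a" and "b" both link to "c"; "c" links back to "a".
    print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

A full C4.5 implementation would apply gain_ratio recursively to grow the tree and would also handle continuous attributes and pruning, which this sketch leaves out.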

List of References

  • How Treato Analyzes Health-related Social Media Big Data with Hadoop and HBase, Cloudera Engineering [Assaf Yardeni, International Conference on Cloud, Big Data and Trust 2013, Nov 13-15, RGPV]
  • Algorithm and Approaches to Handle Large Data: A Survey [Chanchal Yadav, Shullang Wang, Manoj Kumar, IJCSN, Vol 2, Issue 3, 2013, ISSN: 2277-5420]
  • Managing Heterogeneous Sensor Data on a Big Data Platform: IoT Services for Data-intensive Science [Koji Zettsu, Takashi Kimata, Computer Software and Applications Conference Workshops (COMPSACW), 2014 IEEE 38th International]
  • Performance and Energy Efficiency of Big Data Applications in Cloud Environments: A Hadoop Case Study [Eugen Feller, Lavanya Ramakrishnan, Christine Morin, IJCSN, Volume 74, Issue 3, March 2014, Pages 2166–2179]
  • Service-generated Big Data and Big Data-as-a-Service: An Overview [Zibin Zheng, Jieming Zhu, Michael R. Lyu, University of Hong Kong, China, 2014 IEEE International Congress]
  • Towards Cloud-based Analytics-as-a-Service (CLAaaS) for Big Data Analytics in the Cloud [Farhana Zulkemine, Michael Bauer, Ashraf Aboulnaga, Queen's University, Canada, IJCSN, Vol 2, Issue 3, 2014, ISSN: 2277-5420]
