
SigmaWay Blog

SigmaWay Blog aggregates original and third-party content for site users. It features articles on Process Improvement, Lean Six Sigma, Analytics, Market Intelligence, Training, IT Services, and the industries that SigmaWay serves.

What Does 2016 Hold for Machine Learning?

The evolution of Machine Learning (ML) is shaped by how the tech giants approach it; open-source platforms and the available data sources also have an important impact on ML models. Tech giants have realized the importance of ML, and it is becoming the new normal for them: they now focus on providing ML models as a service, built for common usage rather than just for data scientists. Most of the software used for ML is open source, which is eroding the market for proprietary vendors, and tools like Apache Spark are set to dominate. Read more about it at: http://www.infoworld.com/article/3017251/data-science/what-machine-learning-will-gain-in-2016.html


Difference between Hadoop and Apache Spark

Hadoop and Apache Spark are often seen as competitors in the world of big data, but the growing consensus is that they are better together. Here is a brief look at what they do and how they compare.
1. They do different things: both are big-data frameworks, but they do not serve the same purpose. Hadoop is a distributed data infrastructure; it also indexes and keeps track of that data, enabling big-data processing and analytics. Spark, on the other hand, is a data-processing tool.
2. Each can be used on its own, without the other.
3. Spark is faster, because it processes data in memory.
4. You may not need Spark's speed: it is best suited to real-time marketing campaigns, online product recommendations, cybersecurity analytics, and machine-log monitoring.
5. Failure recovery: each handles it differently, but both handle it well.
Read more at: http://www.computerworld.com/article/3014516/big-data/5-things-to-know-about-hadoop-v-apache-spark.html
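To make the first point concrete, here is a minimal pure-Python sketch of the map → shuffle → reduce pattern that Hadoop MapReduce applies at cluster scale (the function names and sample lines are illustrative only, not Hadoop APIs):

```python
from collections import defaultdict

def map_phase(lines):
    """Map step: emit (word, 1) pairs, as a Hadoop mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle step: group values by key (the Hadoop framework does this for you)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts for each word."""
    return {word: sum(values) for word, values in groups.items()}

lines = ["spark is fast", "hadoop is distributed", "spark is in memory"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["spark"], counts["is"])  # 2 3
```

In real Hadoop, each phase runs on many machines and intermediate results are written to disk between steps; Spark keeps those intermediates in memory, which is where its speed advantage comes from.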


Choosing a Hadoop Distribution

Choosing the right Hadoop distribution can be a tricky process. There are 4 basic categories that businesses should look at for specific qualifying criteria.
1. Performance
Hadoop is widely chosen as a data platform for its high performance, which can be boosted further by replacing stock MapReduce with Apache Spark. However, not all operations need such superior hardware, and a business should choose its hardware based on the operations it plans to perform.
2. Dependability
When looking for a distribution, dependability is a significant but rare feature: only a few Hadoop implementations can guarantee a system availability of 99.999%. Look for a distribution that provides self-healing, no downtime upon failure, tolerance of multiple failures, 100% commodity hardware, no additional hardware requirements, ease of use, data protection, and disaster recovery.
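As a quick sanity check on what a 99.999% ("five nines") availability guarantee means in practice, the allowed downtime per year can be computed directly (a back-of-the-envelope Python sketch):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

def max_downtime_minutes(availability):
    """Minutes of downtime per year permitted at a given availability level."""
    return (1 - availability) * MINUTES_PER_YEAR

print(round(max_downtime_minutes(0.99999), 2))  # five nines: ~5.26 minutes/year
print(round(max_downtime_minutes(0.999), 0))    # three nines: ~526 minutes/year
```

Five nines allows barely five minutes of downtime a year, versus almost nine hours at three nines, which is why so few distributions can honestly claim it.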
3. Manageability
Look for a distribution that has intuitive administrative tools that assist in management, troubleshooting, job placement and monitoring.
4. Data Access
Gathering and storing data is just the beginning of the process. What really matters is that the stored data is easily accessible for further processing. Look for a distribution that provides:
• Full access to the Hadoop file system API
• Full POSIX read/write/update access to files
• Direct developer control over key resources
• Secure, enterprise grade search
• Comprehensive data access tooling
Hopefully these four considerations, along with your own criteria, will enable you to choose the Hadoop distribution that best suits your needs.
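One informal way to compare candidates against the four criteria above is a simple weighted score per distribution. Everything below (distribution names, weights, scores) is a made-up placeholder to illustrate the approach, not a benchmark:

```python
# The four criteria from the article, with illustrative importance weights.
CRITERIA = ("performance", "dependability", "manageability", "data_access")

def score(distribution, weights):
    """Weighted sum of a candidate's ratings across the four criteria."""
    return sum(distribution[c] * weights[c] for c in CRITERIA)

weights = {"performance": 0.3, "dependability": 0.3,
           "manageability": 0.2, "data_access": 0.2}

# Hypothetical ratings (1-10) for two fictional candidate distributions.
candidates = {
    "distro_a": {"performance": 9, "dependability": 7,
                 "manageability": 6, "data_access": 8},
    "distro_b": {"performance": 7, "dependability": 9,
                 "manageability": 8, "data_access": 7},
}

best = max(candidates, key=lambda name: score(candidates[name], weights))
print(best)  # distro_b
```

Adjusting the weights to match your workload (for example, raising performance for real-time analytics) changes which candidate wins, which is the whole point of the exercise.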

For more information visit:
http://www.smartdatacollective.com/davemendle/324791/four-considerations-when-choosing-hadoop-distribution
