SigmaWay Blog

SigmaWay Blog aggregates original and third-party content for site users. It features articles on Process Improvement, Lean Six Sigma, Analytics, Market Intelligence, Training, IT Services, and the industries SigmaWay serves.

This section contains articles submitted by site users and articles on analytics imported from other sites.

Adopting Business Intelligence Tools in SMEs

Small and medium-sized organizations are often of the opinion that Business Intelligence tools are too complex and expensive to implement, and that they can survive without them. But the truth is that using these tools can give you a competitive advantage over others. If SMEs can manage and harness their data, they can analyze it to increase revenue and figure out what is holding them back. SMEs looking to establish BI and corporate performance solutions must think about the users and consumers of their data and where the data sources are located, and make sure the solution is compatible with mobile devices. No matter how small your business is, you can always benefit from a BI tool. Read more at: http://www.datavizualization.com/blog/bi-tools-for-smes-not-just-maybe-but-definitely

 


Common Mistakes in Risk Management: Big Data Analytics

Big Data is the buzzword of the 21st century and has been extremely useful in several risk-assessment tasks. The effectiveness of big data in risk management depends on the accuracy, consistency, completeness, and timeliness of the data. The most common mistakes made by big data experts involved in risk management are: Confirmation bias: this occurs when data scientists use limited data to prove their hypothesis.

Selection bias: when data is selected subjectively, the analyst frames the questions and thus effectively picks the data that will be received (e.g., surveys).

Outliers: outliers are often interpreted as normal data.

Simpson's paradox: a trend that appears in separate groups of data can reverse when the groups are combined.

Confounding variables: these are often overlooked.
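Simpson's paradox is easy to demonstrate numerically. The figures below are the classic kidney-stone treatment example (successes and trials per treatment and stone size), used here purely as an illustration:

```python
# Simpson's paradox: treatment A wins within every group,
# yet treatment B looks better once the groups are combined.
groups = {
    "small_stones": {"A": (81, 87), "B": (234, 270)},   # (successes, trials)
    "large_stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# Within each group, treatment A has the higher success rate.
for name, g in groups.items():
    ra, rb = rate(*g["A"]), rate(*g["B"])
    print(name, f"A={ra:.2f}", f"B={rb:.2f}")

# Combined across groups, the trend reverses and B looks better.
totals = {t: tuple(sum(g[t][i] for g in groups.values()) for i in (0, 1))
          for t in ("A", "B")}
print("overall", f"A={rate(*totals['A']):.2f}", f"B={rate(*totals['B']):.2f}")
```

The reversal happens because the group sizes are unbalanced: A was mostly applied to the harder cases, which drags its overall rate down.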


Most Common Myths about Stream Data Processing

Data science experts spend a lot of time solving problems with stream data processing, and there are many misconceptions about the modern stream-processing space. Here are a few of them. There's no streaming without batch: these limitations existed in earlier versions of Apache Storm and are no longer relevant in modern stream-processing architectures such as Flink. Latency and throughput: choose one: a well-engineered system like Flink is capable of both low latency and high throughput; it has been shown to handle tens of millions of events per second on a 10-node cluster. Micro-batching means better throughput: although a streaming framework does not rely on batch processing, it will buffer at the physical level. Exactly once? Completely impossible: Flink is able to provide exactly-once state guarantees under failure by recording both the input stream position and the corresponding state of the operator. Traditionally, data flow had to be interrupted and stored for applications to interact, but new patterns such as CQRS can be built on continuously flowing data. As stream processing evolves further, we will have more powerful computational models. You can read more at: http://dataconomy.com/2017/02/stream-processing-myths-debunked/
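The exactly-once idea mentioned above, snapshotting the input position together with operator state, can be sketched in plain Python. This is only an illustration of the concept, not Flink's actual API or checkpointing protocol:

```python
# Exactly-once recovery sketch: checkpoint the input offset together with
# the operator state, so replay after a crash never double-counts events.
events = [3, 1, 4, 1, 5, 9, 2, 6]

state = {"offset": 0, "total": 0}      # operator state + stream position
checkpoint = dict(state)               # last durable snapshot

def process(upto, crash_at=None):
    global state, checkpoint
    state = dict(checkpoint)           # always resume from the last snapshot
    while state["offset"] < upto:
        if crash_at is not None and state["offset"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += events[state["offset"]]
        state["offset"] += 1
        checkpoint = dict(state)       # snapshot after each event (coarse but safe)

try:
    process(len(events), crash_at=5)   # crash midway through the stream
except RuntimeError:
    pass
process(len(events))                   # restart: resumes from the checkpoint
print(state["total"])                  # equals sum(events): counted exactly once
```

A real system snapshots periodically rather than per event, but the invariant is the same: position and state move together, so recovery replays nothing twice.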


Predictive Social Media Analytics

Social networks have existed in some form ever since humans started interacting. Social network theory is the study of how people, organizations, or groups interact within their networks. One way to build a network from Twitter trending topics is to define each city as a vertex; if there is at least one common trending topic between two cities, there is an edge between them, and each edge is weighted by the number of shared trending topics. The network topology does not usually change in this scenario, since the number of nodes is fixed. A few metrics that can be used to infer a node's importance, and which underpin this type of predictive analysis, are node centrality, clustering coefficient, and degree centrality. Social media analysis holds great potential as these networks become larger and more complex each day. Read more at: http://dataconomy.com/2017/01/data-mining-predictive-analytics/
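The city network described above can be built and scored in a few lines. The cities and trending topics here are hypothetical placeholders, not real Twitter data:

```python
# Build the city graph from trending topics and rank cities by weighted
# degree centrality (sum of a node's edge weights). Pure-Python sketch.
from itertools import combinations

trends = {  # hypothetical trending topics per city
    "London":  {"#AI", "#Elections", "#Football"},
    "NewYork": {"#AI", "#Elections", "#Broadway"},
    "Mumbai":  {"#AI", "#Cricket"},
    "Tokyo":   {"#Robotics"},
}

# Edge weight = number of trending topics the two cities share.
edges = {}
for a, b in combinations(trends, 2):
    shared = len(trends[a] & trends[b])
    if shared:
        edges[(a, b)] = shared

# Weighted degree centrality: total weight of edges touching each node.
centrality = {city: 0 for city in trends}
for (a, b), w in edges.items():
    centrality[a] += w
    centrality[b] += w

print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

Isolated cities (no shared trends) end up with centrality 0, which is exactly the "importance" signal the article describes.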


5 Ws of a Winning Data Strategy

According to a study, 78% of enterprises agree that data strategy, collection, and analysis have the potential to fundamentally change the way their business operates. The sole aim of an effective data strategy is to realize this potential. The five questions one needs to answer before building a data strategy are: WHAT is a data strategy?: It is a strategy that gives you a comprehensive vision of data across the enterprise.

WHY do we need a data strategy?: You need a data strategy to find correlations across multiple disparate data sources, predict customer behavior, and forecast product or service sales.

WHEN should I start or have a data strategy?: The answer is now.

WHO in our organization should drive this data strategy?: The Chief Data Officer.

WHERE do we start with a data strategy?: It depends on how the organization is structured; it is recommended to start in a single business unit.


Three Stages of Big Data Collection Methodology

The term Big Data is associated with four Vs: velocity, volume, variety, and veracity, and each V plays a significant part in the big data world. Together, these components paint a clear picture of what big data actually means. The big data management methods adopted by many companies involve three stages: 1. Collecting data: accumulating data from various information sources. 2. Storing data: keeping data in an appropriate database framework and server. 3. Organizing information: arranging information on the basis of structured, unstructured, and semi-structured data. Read more at: http://www.bigdatanews.com/profiles/blogs/how-to-collect-big-data-big-data-a-new-digital-trend


Data Matching and Entity Identification at Scale

Data matching is the task of identifying, matching, and merging records that correspond to the same entities across several source systems. These entities may be people, places, publications or citations, consumer products, or businesses. The major hurdle encountered while solving this problem is the lack of common entity identifiers: easily available information such as names and addresses may change over time, is usually of low quality, and produces poor results with high error rates. Technological advancements in the last decade have made it possible to scale data matching to systems containing millions of records while improving accuracy. You can read more at: http://www.datasciencecentral.com/profiles/blogs/data-matching-entity-identification-resolution-linkage
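In the absence of a common identifier, matching typically falls back on comparing low-quality fields such as names and addresses. Here is a minimal fuzzy-matching sketch using Python's standard library; the records, field weights, and the 0.75 threshold are illustrative choices, not from the article:

```python
# Fuzzy record linkage: score candidate pairs on name + address similarity
# and accept pairs above a tunable decision threshold.
from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

crm     = [("Jon Smith", "12 High Street"), ("Acme Corp.", "1 Main Rd")]
billing = [("John Smith", "12 High St."), ("Beta LLC", "99 Elm Ave")]

matches = []
for name1, addr1 in crm:
    for name2, addr2 in billing:
        # Weight the name more heavily than the address.
        score = 0.6 * similarity(name1, name2) + 0.4 * similarity(addr1, addr2)
        if score >= 0.75:                 # decision threshold
            matches.append((name1, name2, round(score, 2)))

print(matches)  # the two "Smith" records link; the companies do not
```

Production systems add blocking (only comparing plausible candidate pairs) so the pairwise comparison scales to millions of records.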


From Big Data to Small Data 

Big data refers to the huge amounts of structured and unstructured data collected from multiple sources and devices; the explosion of the Internet of Things is expected to connect 26 billion devices by 2020. There have always been two challenges: organizing all the information in a warehouse so that it can be fetched and processed efficiently, and processing it in a way that provides meaningful results. It turns out that only 58% of companies understand the value of their big data solutions. In contrast, small data addresses specific problems in limited domains; it tends to focus on log analysis, such as user behavior on a website. A logging mechanism allows specialized data to be captured for business teams and engineers without the need to dig into the ocean of big data. You can read more at: http://www.datasciencecentral.com/profiles/blogs/how-big-data-is-becoming-smaller-than-small-data


Accuracy-Interpretability Tradeoff in Predictive Analytics

More accuracy is better, but it may not be a good idea to keep working on a model if you expect negligible improvement, or if the cost of added accuracy exceeds the financial gain. The purpose of a data science job is to create financial value and minimize loss by building more accurate models. Regulatory guidelines say that if your model has a negative impact on a customer, you must be able to explain why that individual was rated the way they were. This is a classic tradeoff between accuracy and interpretability. In a regulated industry, if someone suffers from your decision and you cannot explain why the prediction model behaved that way, your technique is not allowed. Good storytelling using data visualization might help you convince management. Techniques such as penalized regression, generalized additive models, and quantile regression can provide better accuracy while maintaining interpretability. Deep neural networks have also proven a successful approach to this problem.

You can read in more detail at : http://www.datasciencecentral.com/profiles/blogs/deep-learning-lets-regulated-industries-refocus-on-accuracy

 


Renovating Sales and Marketing Practices using B2B Ecosystem 

Managing customer relations and the increasing need for collaboration to build a profitable business have led to the development of the digital B2B ecosystem, a community of systems working together to serve the needs of customers. These systems allow audiences to be segmented and a customized experience delivered to each group. Some components of the B2B ecosystem are the Enterprise Resource Planning system, Customer Relationship Management system, Product Information Management system, Order Management system, and Marketing Automation system. A well-equipped system helps marketers use customer insights to cross-sell, optimize the order and reorder processes, better manage content, and facilitate lead nurturing. In a well-established B2B system, sales and marketing collaborate and have real-time access to the latest customer information. You can read in more detail at: http://www.datasciencecentral.com/profiles/blogs/how-b2b-ecosystems-big-data-can-transform-sales-and-marketing

  3960 Hits

The Art of Predictive Modelling 

Your perspective on data depends on the type of task you want to accomplish. These tasks can be broadly categorized as: Analytics: helps you explore what happened and why.

Monitoring: looking at things as they occur to find abnormalities.

Prediction: forecasting what might happen in the future.

Some of the most popular algorithms that can be applied to predict future trends are:

The ensemble model: it combines the outputs of multiple models to arrive at a decision; however, one has to understand how to pick the right models and what problem one wants to solve.
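The voting idea behind an ensemble can be shown with a minimal sketch; the three models and their labels below are hypothetical stand-ins for real classifiers:

```python
# Majority-vote ensemble: each position in the output is decided by
# the most common prediction among the member models.
from collections import Counter

def majority_vote(*prediction_lists):
    ensemble = []
    for votes in zip(*prediction_lists):
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble

model_a = ["buy", "hold", "sell", "buy"]
model_b = ["buy", "sell", "sell", "hold"]
model_c = ["hold", "hold", "sell", "buy"]

print(majority_vote(model_a, model_b, model_c))
# → ['buy', 'hold', 'sell', 'buy']
```

Voting only helps when the member models make reasonably independent errors, which is why picking the right models matters as much as combining them.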


Data Value Chain for GeoSpatial Data

The value of data has changed over time. Companies have realized that collecting, analyzing, sharing, and selling data, and extracting actionable insights from it, is critical to the development of their organization. Geospatial data is captured and analyzed by engineers and product managers to develop creative solutions and thus increase productivity. The flow of geospatial data, from the instant it is collected and throughout its lifecycle, can be viewed using a framework known as the 'Data Value Chain'. Data intersects with analytics, which can turn this information into decisions. A technological ecosystem built around a geospatial system provides new ways to work, reduces costs, accelerates schedules, and supplies high-value deliverables along the value chain. Read more at: http://dataconomy.com/2017/02/power-of-data-value-chain/


Data Science Challenges in Production Environment 

Very little time is spent thinking about how to deploy a data science model into production. As a result, many companies fail to earn the value that should come from their efforts and investments. In a production environment, data arrives continuously, results are computed, and models are frequently retrained. The challenges companies face fall into four categories: Small data teams: they mostly use small data, often don't retrain models, and the business team is involved in development projects.

Packagers: they often build their framework from scratch and practice informal A/B testing; they are generally not involved with the business team.

Industrialization maniacs: these teams are IT-led, with automated processes for deployment and maintenance; business teams are not involved in monitoring or development.

The big data lab: uses more complex technologies; business teams are involved before and after the deployment of the data product.

Companies should understand that working in production is different from working with SQL databases in development; moreover, real-time learning and multi-language environments make the process more complex. A strong collaboration between business and IT teams will also increase efficiency. Read more at: http://dataconomy.com/2017/02/value-from-data-science-production/


Effective Quality Management using Hypothesis Test

A business hypothesis is a foundational theoretical concept whose sound understanding helps you achieve business goals. For instance, it provides a mathematical way to answer questions such as whether you should spend on advertising, or whether increasing the price of a product will affect your customers. Data collection is one part of the game, but correct data processing and interpretation form the final stage of the decision-making process. Hypothesis testing is used to infer whether there is enough evidence in the data to support a claim. There are various test methods: parametric tests (z-test, t-test, F-test) and non-parametric tests (Wilcoxon rank-sum test, Kruskal-Wallis test, and permutation test).

Read more at : http://www.datasciencecentral.com/profiles/blogs/importance-of-hypothesis-testing-in-quality-management
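One of the non-parametric methods mentioned above, the permutation test, is easy to sketch in plain Python: shuffle the group labels many times and count how often the shuffled difference in means is at least as large as the observed one. The two samples below are made-up illustration data:

```python
# Permutation test for a difference in means between two groups.
import random

random.seed(0)
control = [12.1, 11.8, 12.4, 11.9, 12.0, 12.2]   # e.g. sales before a change
treated = [12.8, 13.1, 12.6, 13.0, 12.7, 12.9]   # e.g. sales after a change

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(treated) - mean(control)
pooled = control + treated

exceed = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)                         # relabel the groups at random
    diff = mean(pooled[len(control):]) - mean(pooled[:len(control)])
    if diff >= observed:
        exceed += 1

p_value = exceed / trials
print(f"observed difference = {observed:.2f}, p ≈ {p_value:.4f}")
```

A small p-value here means the observed gap is very unlikely under random labeling, so the groups genuinely differ; no distributional assumption (normality, equal variance) is needed.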


Good Statistical Practice

You can't be a good data scientist unless you have a good hold on statistics and know your way around data. Here are some simple tips for being an effective data scientist:
Statistical Methods Should Enable Data to Answer Scientific Questions - Inexperienced data scientists tend to take the link between data and scientific issues for granted and hence often jump directly to a technique based on the data's structure rather than the scientific goal.
Signals Always Come with Noise - Before working with data, it should be analysed and the actual usable data extracted from it.
Data Quality Matters - Many novice data scientists ignore this fact and tend to use whatever data is available to them; it is always good practice to set norms for data quality.
Check Your Assumptions - The assumptions you make affect your output as much as your data does, so take special care when making any assumption, as it will affect your whole model as well as your results.
These are some of the things to keep in mind when working with data. To know more, you can read the full article by Vincent Granville at http://www.datasciencecentral.com/profiles/blogs/ten-simple-rules-for-effective-statistical-practice

 


Recommenders: The Future of E-commerce

Recommender systems have become the backbone of the e-commerce sector. They have helped companies like Amazon and Netflix increase their revenue by as much as 10% to 25%.
Hence, the need of the hour is to optimize their performance.
So, what are recommenders? Recommenders are applications that personalize your customers' shopping experience by recommending the next best options in light of their recent buying or browsing activity. Recent developments in analytics and machine learning have led to many state-of-the-art recommender systems.
Types of Recommenders: There are broadly five types of recommender systems, as follows:
1. Most Popular Item
2. Association and Market Basket Models
3. Content Filtering
4. Collaborative Filtering
5. Hybrid Models
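Collaborative filtering (type 4 above) can be illustrated with a tiny user-based sketch: recommend items liked by the users most similar to the target user. The users, items, and ratings below are made up:

```python
# User-based collaborative filtering in miniature: score unseen items by
# other users' ratings, weighted by cosine similarity to the target user.
from math import sqrt

ratings = {                      # user -> {item: rating}
    "ann":  {"laptop": 5, "mouse": 4, "desk": 1},
    "ben":  {"laptop": 4, "mouse": 5, "lamp": 4},
    "cara": {"desk": 5, "lamp": 2, "mouse": 1},
}

def cosine(u, v):
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (sqrt(sum(x * x for x in u.values())) *
                  sqrt(sum(x * x for x in v.values())))

def recommend(user):
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:        # only unseen items
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ann"))  # the lamp surfaces via ann's most similar user, ben
```

At real e-commerce scale the same idea runs over sparse matrices of millions of users and items, but the similarity-weighted scoring is unchanged.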

In the coming years, recommender systems will be used by almost every organisation, big or small, and will become an inseparable part of the e-commerce world.


To know more, read the article by William Vorhies at: http://www.datasciencecentral.com/profiles/blogs/understanding-and-selecting-recommenders-1

 

 


2016: The Year of Deep Learning

2016 was the year of deep learning: some big breakthroughs were achieved by Google and DeepMind. Some of the most significant achievements are as follows:

AlphaGo triumphs in a Go showdown: AlphaGo, Google's AI for the game of Go, beat Go champion Lee Sedol, to everyone's surprise.

Bots kicking our butts in StarCraft: DeepMind AI bots were able to outperform some of the top-rated StarCraft II players.

DIY deep learning for Tic-Tac-Toe: AlphaToe, an AI bot, was able to outperform most of the people who played against it.

Google's Multilingual Neural Machine Translation: Google built a single model capable of translating text between languages, reaching a new milestone in linguistics and NLP.


A Guide to Choosing Machine Learning Algorithms

Machine learning is the backbone of today's insights on customers, products, costs, and revenues; it learns from the data provided to its algorithms. Hence, algorithms are the next most important thing in data science after data.
Hence, the question: which algorithm to use? Some of the most used algorithms and their use cases are as follows:

1) Decision Trees - Their output is easy to understand, and they can be used for investment decisions, customer churn, bank loan defaulters, etc.

2) Logistic Regression - A powerful way of modeling a binomial outcome with one or more explanatory variables; it can be used for predicting customer churn, credit scoring and fraud detection, measuring the effectiveness of marketing campaigns, etc.

3) Support Vector Machines - A supervised machine learning technique widely used in pattern recognition and classification problems; it can be used for detecting common diseases such as diabetes, hand-written character recognition, text categorization, etc.

4) Random Forest - An ensemble of decision trees that can solve both regression and classification problems with large data sets; it is used in applications such as predicting high-risk patients, predicting part failures in manufacturing, and predicting loan defaulters.
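Logistic regression, the second algorithm above, can be sketched from scratch in a few lines: gradient descent on the log-loss, then predictions via the sigmoid. The toy data, learning rate, and iteration count are made-up illustration values:

```python
# Logistic regression on a toy binary problem with one explanatory variable.
import math

X = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]   # explanatory variable
y = [0,   0,   0,   1,   1,   1]     # binomial outcome

w, b, lr = 0.0, 0.0, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(2000):                 # batch gradient descent on the log-loss
    grad_w = grad_b = 0.0
    for xi, yi in zip(X, y):
        err = sigmoid(w * xi + b) - yi
        grad_w += err * xi
        grad_b += err
    w -= lr * grad_w / len(X)
    b -= lr * grad_b / len(X)

predict = lambda x: int(sigmoid(w * x + b) >= 0.5)
print([predict(x) for x in X])        # → [0, 0, 0, 1, 1, 1]
```

The same fit in practice comes from a library (e.g. scikit-learn's `LogisticRegression`), but writing out the gradient makes the "modeling a binomial outcome" phrase concrete.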


Winning Data Strategy using Industrialized Machine Learning

The first step in building a winning data strategy is to create a map based on the business value of each question and an estimate of how long it would take to get a high-quality answer to it. The idea is to break the business questions into groups that correspond to real-time data systems. This allows you to focus on one specific system at a time to build a strong strategy, and to optimize the sequence in which each sub-question needs to be answered, depending on its current business value. A typical pattern of actions for a data strategy begins with a hypothesis and the collection of relevant data, followed by building models to explain the data and evaluating their credibility for future predictions. The entire process is carried out on enterprise-scale digital infrastructure using Industrialized Machine Learning (IML). This approach can also have a huge impact on the natural resources and healthcare industries.

Read more at : https://blogs.csc.com/2016/07/05/how-to-build-and-execute-a-real-data-strategy/

 


Automatic Debt Management System 

Big data analytics and business intelligence are changing the way businesses interact with customers. Modern big data solutions have enabled automated decision-making in debt management systems for client-handling processes. Correct implementation of these tools provides a more personalized experience for each customer and avoids infringements. Debt management automation has proven a successful solution for maintaining the balance between meticulous efficiency and customer satisfaction. Such a CRM automates many processes, so a small team needs only days to complete the debt collection process. Analytics has not just accelerated debt collection but also enhanced customer relations.

You can read more at: http://www.dataminingblog.com/what-could-big-data-mean-for-debt-management/

 

 

