All Posts By James Beresford

Azure ML PowerBI


Leveraging Azure ML Service Models with Microsoft PowerBI

Machine Learning (ML) is shaping and simplifying the way we live, work, travel and communicate. With the Azure Machine Learning (Azure ML) Service, data scientists can easily build and train highly accurate machine learning and deep learning models. PowerBI now makes it simple to incorporate the insights and predictions from models built by data scientists on the Azure ML Service into PowerBI reports, using simple point-and-click gestures. This gives business users better insights and predictions about their business.

This capability can be leveraged by any PowerBI user (with access granted through the Azure portal). Power Query automatically detects all ML models that the user has access to and exposes them as dynamic Power Query functions.

This functionality is supported for PowerBI dataflows, and for Power Query online in the PowerBI service.

Schema discovery for Machine Learning Service models

Unlike Machine Learning Studio (which helps automate the task of creating a schema file for the model), in the Azure Machine Learning Service data scientists primarily use Python to build and train machine learning models, so the schema describing the model’s expected inputs and outputs is defined in the scoring script when the model is deployed as a web service. It is this schema that Power Query uses to discover the model and its parameters.
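
For illustration, here is a minimal, hypothetical scoring script showing how such a schema might be declared using the inference-schema decorators when deploying an Azure ML Service model. The model name ("my-model") and the sample input and output values are placeholder assumptions, not taken from this article:

# score.py – illustrative scoring script for an Azure ML Service web service.
# The decorators publish the model's input/output schema so that clients such as
# Power Query can discover its parameters. All names and shapes are placeholders.
import joblib
import numpy as np
from azureml.core.model import Model
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType

def init():
    global model
    # Load the registered model ("my-model" is a placeholder name)
    model = joblib.load(Model.get_model_path("my-model"))

# Sample payloads drive the generated schema (values are illustrative only)
sample_input = np.array([[5000.0, 3.0, 1.0]])
sample_output = np.array([1])

@input_schema("data", NumpyParameterType(sample_input))
@output_schema(NumpyParameterType(sample_output))
def run(data):
    # Score the incoming rows and return a JSON-serialisable result
    return model.predict(data).tolist()

Once the deployed service exposes a schema like this, Power Query can map its "data" parameter to a column of your entity.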

Invoking the Azure ML model in PowerBI

  1. Grant the PowerBI user access to the Azure ML model: To access an Azure ML model from PowerBI, the user must have Read access to the Azure subscription. In addition:
  • For Machine Learning Studio models, Read access to Machine Learning Studio web service
  • For Machine Learning Service models, Read access to the Machine Learning service workspace
  2. From your dataflow, select the Edit button for the dataset that you want to get insights about, as shown in the following image:
Azure ML PowerBI Edit Dataset


  3. Selecting the Edit button opens the PowerQuery Editor for the entities in your dataflow:
Azure ML PowerBI PowerQuery


  4. Click the AI Insights button on the top ribbon, then select the “Azure Machine Learning Models” folder from the left navigation menu. All the Azure ML models you have access to appear as PowerQuery functions, and the input parameters of each Azure ML model are automatically mapped as parameters of the corresponding PowerQuery function.
Azure ML PowerBI AI Insights


  5. To invoke an Azure ML model, specify the column of your choice as an input to the corresponding PowerQuery function.

  6. To examine and preview the model’s output, select Invoke. The model’s output column is added to the query, and the model invocation appears as an applied step for the query.
Azure ML PowerBI Invoke


Summary

With this approach we can integrate ML models (built using either the Azure ML Service or Machine Learning Studio) with PowerBI reporting. Any user (typically a BI analyst) can apply the models built by data scientists to the relevant datasets, whether the problem calls for classification, regression or another form of prediction. These enhancements to Microsoft PowerBI give business users better insights, which in turn aids better decision making.

Let our Data Visualisation and Machine Learning experts help you explore the potential – contact us today!

Data & AI Strategy metrics


Why are Data & AI strategy metrics important? The beauty of “strategies” for some is that a strategy – unlike a tactic – often doesn’t come with any clear success/fail KPIs. It allows a lot of wriggle room for ambiguous assessments of whether it worked or not. However, any self-respecting Data & AI strategy should not allow this. After all, it is designed and executed in the name of improving the use of data and measurable outcomes within an organisation. A good Data & AI strategy should have measures to determine its success.

Data & AI Strategy metrics that matter

Commonly raised metrics are based around uptake and usage (software vendors are particularly fond of these). This seems to be based on the hope that the apparent usage of tools is inherently a good thing for a company and will somehow lead to – I don’t know – increased synergy?

Dilbert Utilising Synergy


Sometimes they are measured around data coverage by the EDW, or around project completion. However, if I were to put my CEO hat on, I would want to know the answer to the question “how are all these Data & AI users improving my bottom line?”. After all, if the Data & AI tools are being heavily used, but only to manage the footy tipping competition, then I’m not seeing a great deal of ROI.

The metrics that matter are the Corporate metrics.

A good Data & AI Strategy should be implemented with a core goal of supporting the Corporate strategy, which will have some quantifiable metrics to align to. If not, a good Data & AI strategy isn’t going to help you much as your organisation has other problems to solve first!

In a simple case, imagine a key part of the strategy is to expand into a new region. The Data & AI strategy needs to support that by providing data and tools that support that goal, enabling the team in the new region to expand – and it should be measured against its ability to support the success of the Corporate strategy.

This is why at FTS Data & AI, our first step in defining a Data & AI Strategy for an organisation is to understand the Corporate strategy – and its associated metrics – so we can align your Data & AI strategy to it and create a business case that justifies embarking on a Data & AI strategy in the first place. The metrics are the foundation that proves there is deliverable value to the business. This is why the Corporate Strategy sits at the top of our Strategy Framework:

Data & AI Strategy Framework


We have extensive experience designing strategies that support your business. Contact us today to speak with one of our experts.

Data Quality: Enter the 4th Dimension


Data quality is a uniform cause of deep pain in establishing a trusted data platform in Data & AI projects. The more systems that are involved, the harder it gets to clean up – before you even start accounting for how old those systems are, how up to speed the SMEs are, or how poor the front-end validation was, there’s a host of potential problems. However, something tells me that the number of projects where the customer has said it’s OK if the numbers are wrong is going to remain pretty small.

Scope, Cost, Time – Choose one. But not that one.

Project Management Triangle

Data Quality is a project constraint

Many of you will be familiar with the Project Management Triangle, which dictates that you can only fix one of Scope, Cost or Time by varying the other two – with the end result that, in the middle, Quality gets affected. For most Data & AI projects I have found that cost and time tend to be the least negotiable, so scope gets restricted. Yet somehow Time and Cost get blown out anyway.

Whilst Data & AI is hardly unique in terms of cost and schedule overruns, there is one key driver which is neglected by traditional methods. Leaning once again on Larissa Moss’s Extreme Scoping approach, she calls out the reason: in a Data & AI project, Quality – specifically Data Quality – is also fixed. The data must be complete and accurate for it to be usable, and there is no room for negotiation on this. Given that the data effort consumes around 80% of a Data & AI project’s budget, this becomes a significant concern.

How do we manage Data Quality as a constraint?

We have to get the business to accept that the traditional levers can’t be pulled in the way they are used to, and that requires end-user education. The business needs to be made aware that Data Quality is a fixed constraint – one that they are imposing, albeit implicitly. The business has to accept that if Quality is not a variable, then the traditional “pick two to play with” becomes “prepare to vary all of them”. Larissa Moss refers to this as an “Information Age Mental Model”, which prioritises quality of output above all else.

Here is where strong leadership and clear communication come into play. Ultimately, if one part of the business demands a certain piece of information, the Data & AI project team will have to be clear that to obtain that piece of data to the quality which is mandated, they must be prepared to bear the costs of doing so, including the cost of bringing it up to a standard that makes it enterprise grade and reusable, so that it integrates with the whole solution for both past and future components of the system. This of course does not mean that an infinite budget is opened up to deal with each data item. Some data may not be worth the cost of acquisition. What it does mean is that the discussion about the costs can be more honest, and the consumer can be more aware of the drivers for the issues that will arise from trying to obtain their data.

ELT Framework in Microsoft Azure

Azure ELT Framework


The framework shown above is becoming a common pattern for Extract, Load & Transform (ELT) solutions in Microsoft Azure. The key services used in this framework are Azure Data Factory v2 for orchestration, Azure Data Lake Gen2 for storage and Azure Databricks for data transformation. Here are the key benefits each component offers:

  1. Azure Data Factory v2 (ADF) – ADF v2 plays the role of an orchestrator, facilitating data ingestion & movement, while letting other services transform the data. This lets a service like Azure Databricks which is highly proficient at data manipulation own the transformation process while keeping the orchestration process independent. This also makes it easier to swap transformation-specific services in & out depending on requirements.
  2. Azure Data Lake Gen2 (ADLS) – ADLS Gen2 provides a highly-scalable and cost-effective storage platform. Built on blob storage, ADLS offers storage suitable for big data analytics while keeping costs low. ADLS also offers granular controls for enforcing security rules.
  3. Azure Databricks – Databricks is quickly becoming the de facto platform for data engineering & data science in Azure. Leveraging Apache Spark’s capabilities through Dataframe & Dataset APIs and Spark SQL for data interrogation, Spark Streaming for streaming analytics, Spark MLlib for machine learning & GraphX for graph processing, Databricks is truly living up to the promise of a Unified Analytics Platform.

The pattern makes use of Azure Data Lake Gen2 as the final landing layer; however, it can be extended with different serving layers, such as Azure SQL Data Warehouse if an MPP platform is needed, or Azure Cosmos DB if a high-throughput NoSQL database is needed.
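
To make the division of responsibilities concrete, here is a minimal PySpark sketch of what the Databricks transformation step might look like when triggered from an ADF pipeline. The storage account, container, folder and column names are placeholder assumptions for illustration only:

# Illustrative Databricks transform: read raw files landed in ADLS Gen2 by ADF,
# cleanse and aggregate them, and write a curated layer back to the lake.
# All paths and column names (datalakeacct, raw, curated, order_date, store_id,
# amount) are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

raw_path = "abfss://raw@datalakeacct.dfs.core.windows.net/sales/"
curated_path = "abfss://curated@datalakeacct.dfs.core.windows.net/sales_daily/"

# Extract/Load has already happened: ADF copied the source files into the raw zone
sales = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv(raw_path))

# Transform: type the date column and aggregate to daily store totals
daily_sales = (sales
               .withColumn("order_date", F.to_date("order_date"))
               .groupBy("order_date", "store_id")
               .agg(F.sum("amount").alias("total_amount")))

# Write the curated output back to the lake in an analytics-friendly format
daily_sales.write.mode("overwrite").partitionBy("order_date").parquet(curated_path)

In this arrangement ADF only needs a Databricks Notebook activity to trigger the transformation, which keeps orchestration and data manipulation cleanly separated, as described above.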

ADF, ADLS & Azure Databricks form the core set of services in this modern ELT framework, and investment in their individual capabilities and their integration with the rest of the Azure ecosystem continues to be made. Some examples of upcoming features include Mapping Data Flows in ADF (currently in private preview), which will let users develop ETL & ELT pipelines using a GUI-based approach, and MLflow in Azure Databricks (currently in public preview), which will provide capabilities for machine-learning experiment tracking, model management & operationalisation. This makes the ELT framework sustainable and future-proof for your data platform.
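
As a taste of the experiment-tracking capability mentioned above, here is a minimal, hypothetical MLflow sketch; the run, parameter and metric names are placeholders:

# Minimal MLflow experiment-tracking sketch (names and values are illustrative only)
import mlflow

with mlflow.start_run(run_name="elt-demo-model"):
    # Record the hyperparameters used for this training run
    mlflow.log_param("max_depth", 5)
    mlflow.log_param("learning_rate", 0.1)
    # Record how well the resulting model performed
    mlflow.log_metric("rmse", 0.78)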

Agile Zero Sprint for Data & AI projects


Agile methodologies have a patchy track record in Data & AI projects. A lot of this is to do with adopting the methodologies themselves – there are a heap of obstacles in the way that are cultural, process and ability based. I was discussing agile adoption with a client who readily admitted that their last attempt had failed completely. The conversation turned to the concept of the Agile Zero Sprint, and he admitted that part of the reason for the failure was that they had allowed zero time for their Agile Zero Sprint.

What is an Agile Zero Sprint?

The reality of any technical project is that there are always certain fundamental decisions and planning processes that need to be worked through before any meaningful work can be done. Data Warehouses are particularly vulnerable to this – you need servers, an agreed design approach and a set of ETL standards before any valuable work can be done, or at least done without incurring so much technical debt that your project gets sunk after the first iteration, cleaning up after itself.

So the Agile Zero Sprint is all the groundwork that needs to be done before you get started. It feels “un”-agile, as you can easily spend a couple of months producing nothing of any apparent direct value to the business/customer. The business will of course wonder where the productivity nirvana is – and, particularly galling, you need your brightest and best on it to make sure a solid foundation is put in place, so it’s not a particularly cheap phase either. You can take a purist view on the content from the Scrum Alliance or a more pragmatic one from Larissa Moss.

How to structure and sell the Zero sprint

The structure part is actually pretty easy. There’s a set of things you need to establish which will form a fairly stable product backlog. Working out how long they will take isn’t that hard either as experienced team members will be able to tell you how long it takes to do pieces like the conceptual architecture. It just needs to be run like a long sprint.

An Agile Zero Sprint prevents clogged pipes


Selling it as part of an Agile project is a bit harder. We try to build this part of the project structure into the roadmap we lay out in our Data & AI strategy. Because you end up not delivering any business-consumable value, you need to be very clear about what you will deliver, when you will deliver it and what value it adds to the project. It starts smelling a lot like Waterfall at this point, so if the business is skeptical that anything has changed, you have to manage their expectations well. Be clear that once the initial hump is passed, the value will flow – and that if you skip it, the value will flow earlier, in line with their expectations, but soon after the pipes will clog with technical debt (though you may want to use different terminology!).