Data Platform

UiPath & Azure: Data Driven Efficiency


At Talos our goal is simple: to deliver data-driven efficiency. In our experience, the best way to achieve this technically has been through the combination of UiPath & Azure, two powerful technology stacks that, when paired properly, deliver immense value. This blog post examines the relationship between UiPath & Azure, a typical design and use-case, and the data-driven benefits of leveraging both together.



UiPath & Azure

UiPath is the world’s leading RPA company delivering a suite of automation technologies targeted at the enterprise level. Azure is Microsoft’s cloud platform, offering a vast array of services to customers to meet their complex needs. Given the technical strengths of each, it is possible to craft a solution that utilizes both to achieve optimum benefits, and in our case, deliver our goal of data-driven efficiency.

In numerous projects we have worked on, there has been a typical requirement to deliver reporting based on the combination of data from a variety of sources. This simple use-case can actually be quite complex, especially when dealing with legacy data source systems, large volumes of data and a significant reporting consumer base. However, these requirements can be met with a very straightforward UiPath & Azure solution design:


This design is composed of the following individual elements:

  1. Azure VMs
    1a. UiPath Studio
    1b. UiPath Unattended Bot
  2. Azure Blob Storage
  3. Azure Data Factory
  4. Azure DevOps
  5. Azure SQL Database
  6. Power BI

Each of these elements has a specific role in the solution, and they fit together easily, making the solution extremely robust, easy to initiate and scalable.

  1. Azure VMs

Azure Virtual Machines (VMs) are virtualised compute instances that behave exactly like physical infrastructure, without the physical hardware. The benefits of Azure VMs are their on-demand nature and the lower costs associated with management. In our solution, two of these VMs are commissioned to provide separate environments for the UiPath tools:

1a. UiPath Studio is installed on the first VM (this is used as the dev/test environment). UiPath Studio is the development tool used to create and test automations. In our case, we use UiPath Studio to develop the automation steps required to retrieve data from the different legacy systems.

1b. UiPath Unattended Bot is installed on the second VM (this is used as the production environment). Once the automations are developed and tested, they are deployed to be executed by the Unattended Bot on this VM. By having a dedicated VM, this bot acts like a 24/7 worker and is available on the VM to perform any tasks it is instructed to do.

  2. Azure Blob Storage

Azure Blob storage is the scalable and secure object storage service used to store data in a variety of formats. Blob storage is very robust and scales easily to suit storage needs. In our case, we leverage Blob storage as a staging area for the bot to store the data it has retrieved; it also serves as the source for the Azure Data Factory pipeline later on. UiPath can communicate with Azure Blob storage natively, making this integration easy and reliable.
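As a small illustration, the bot can stage its extracts under a predictable, date-partitioned blob path so the downstream pipeline knows where to look. The layout below is our own convention for the sketch, not a UiPath or Azure requirement:

```python
from datetime import date

def staging_blob_path(source_system: str, dataset: str, run_date: date) -> str:
    """Build a date-partitioned blob path for a staged extract.

    Hypothetical layout: staging/<source>/<dataset>/YYYY/MM/DD/<dataset>.csv,
    so the Azure Data Factory pipeline can pick up each run's files predictably.
    """
    return (
        f"staging/{source_system.lower()}/{dataset.lower()}/"
        f"{run_date:%Y/%m/%d}/{dataset.lower()}.csv"
    )

# Example: the bot stages an extract from a legacy system for 1 March 2021.
print(staging_blob_path("LegacyERP", "Invoices", date(2021, 3, 1)))
# staging/legacyerp/invoices/2021/03/01/invoices.csv
```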

  3. Azure Data Factory

Azure Data Factory (ADF) is the data integration service built for complex ETL projects. The benefits of ADF are its ease of development and the powerful capabilities that allow it to handle complicated transformations over large volumes of data. In our case, we use ADF to create a pipeline that moves data from Blob Storage into the Azure SQL database whilst performing sophisticated transformations. Although UiPath possesses some ETL functionality, it lacks the advanced transformation capabilities and speed of ADF. For this reason, ADF is a must-have in our solution.
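To give a flavour of the kind of shaping such a pipeline performs between Blob storage and the database, here is a toy example in plain Python (standing in for an ADF data flow; the column names and rules are invented for illustration):

```python
import csv
import io

RAW = "invoice_id,amount,currency\n001, 10.50 ,aud\n002,,aud\n"

def clean_rows(raw_csv: str) -> list[dict]:
    """Trim whitespace, normalise currency codes and drop rows with no amount,
    roughly the sort of rule a pipeline applies before loading to SQL."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete row: exclude it from the load
        cleaned.append({
            "invoice_id": row["invoice_id"].strip(),
            "amount": float(amount),
            "currency": row["currency"].strip().upper(),
        })
    return cleaned

print(clean_rows(RAW))
# [{'invoice_id': '001', 'amount': 10.5, 'currency': 'AUD'}]
```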

  4. Azure DevOps

Azure DevOps (ADO) is the collaboration tool for software development, offering work tracking, source control and continuous integration/delivery. ADO allows projects to be managed effectively and collaboratively. In our case, ADO acts as the project management tool and code repository for the ADF pipelines. UiPath has a native integration with ADO, but it also ships with source control and deployment capabilities out of the box, so the automations themselves can be managed within the UiPath tooling.

  5. Azure SQL Database

Azure SQL Database (DB) is the cloud-based database service built on the SQL Server engine. It has a wide range of deployment options, making it very easy and effective to manage. In our case, this is where the transformed data is stored and made available for analysis. The scalability, availability and deployment ease of Azure SQL DB make this very attractive for enterprise-level analytics and reporting.

  6. Power BI

Finally, whilst not an Azure service itself, Power BI is used as the reporting tool to develop custom reporting on the data in the Azure SQL DB. Power BI is an excellent reporting platform for enterprise-level reporting due to its scalability and self-service capabilities. With this tool in place, the data is modelled and reported to the business.

Although every project is different, the above represents a good design for a typical solution and is used by us as a list of ‘minimums’. For our work, this design is very common, but it is also a great default because it can be restructured easily to suit additional needs (for example, Azure Machine Learning can be integrated into the above design as part of a CASSIE deployment). With the above solution, a business can very quickly leverage the benefits of UiPath & Azure working in unison to deliver data-driven efficiencies: businesses get the insights and reporting they need without having to expend staff time to deliver it.

If you want to know more about UiPath & Azure together, please contact us.


Profisee partners with Talos


Profisee + Talos – Partnership Announcement

Profisee has formally partnered with Talos to help deliver MDM (Master Data Management) solutions to Australian businesses and enhance your Modern Data Platform.

Profisee’s Master Data Management software makes it easy and affordable for companies of all sizes to build a trusted foundation of data across the enterprise. Talos’ expertise ensures a successful implementation that delivers value to the business.

Why is an MDM solution important?

The key benefit of an MDM solution is to improve the quality of your data when it is held in multiple systems. This can quickly yield benefits simply by reducing duplication of effort – for example, reducing the number of recipients on a corporate Christmas card list!
An MDM solution is agnostic to the source systems but can integrate back to them to keep data clean. In plain English, this means you can independently manage the data that resides in multiple systems, while allowing the MDM solution to send cleaned data back to them.
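The Christmas card example can be sketched in a few lines of Python. This is a toy matching rule to show the idea of collapsing duplicates into a single golden record; it is not how Profisee itself works:

```python
def dedupe_recipients(records: list[dict]) -> list[dict]:
    """Collapse duplicate contacts held in multiple source systems.

    Toy matching rule: the same lowercased email means the same person.
    A real MDM tool applies far richer matching and survivorship rules.
    """
    golden = {}
    for rec in records:
        key = rec["email"].strip().lower()
        # Keep the first record seen as the golden record.
        golden.setdefault(key, rec)
    return list(golden.values())

# The same person, as recorded in two different systems.
crm = {"name": "Jane Citizen", "email": "jane@example.com"}
erp = {"name": "J. Citizen", "email": " JANE@example.com"}
print(len(dedupe_recipients([crm, erp])))  # 1 card posted, not 2
```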

A good tool supports ongoing stewardship and governance through workflows, monitoring and corrective-action techniques that can all be managed by an end user.

Why Profisee?

Profisee is ideal for companies looking to get started with MDM and seeking an easy-to-use, rapidly deployed solution at an affordable price point. Price matters: many enterprise MDM tools come with significant price tags and services overhead, which may not suit many businesses.
According to Gartner, customers report that Profisee offers an economically priced product with favourable TCO. Combined with shorter implementation time frames (Profisee had more implementations completed in under three months than any other vendor in the 2020 Magic Quadrant), this makes it an ideal tool for initial MDM implementations.
Contact us today to speak with one of Talos's experts.

    Data Quality: Enter the 4th Dimension


    Data quality is a universal cause of deep pain when establishing a trusted data platform in Data & AI projects. The more systems involved, the harder it gets to clean up, before you even start accounting for how old those systems are, how up to speed the SMEs are, or how poor the front-end validation was. There is a host of potential problems. However, something tells me that the number of projects where the customer says it's OK if the numbers are wrong is going to remain pretty small.

    Scope, Cost, Time – Choose one. But not that one.

    Project Management Triangle

    Data Quality is a project constraint

    Many of you will be familiar with the Project Management Triangle, which dictates that to fix one of Scope, Cost or Time you must vary the other two. The end result is that Quality, sitting in the middle, gets affected. For most Data & AI projects I have found cost and time tend to be the least negotiable, so scope gets restricted. Yet somehow Time and Cost get blown out anyway.

    Whilst Data & AI is hardly unique in terms of cost and schedule overruns, there is one key driver which is neglected by traditional methods. Leaning once again on Larissa Moss's Extreme Scoping approach, she calls out the reason: in a Data & AI project, Quality – specifically Data Quality – is also fixed. The data must be complete and the data must be accurate for it to be usable, and there is no room for negotiation on this. Given that the data effort consumes around 80% of a Data & AI project's budget, this becomes a significant concern.

    How do we manage Data Quality as a constraint?

    We have to get the business to accept that the traditional levers can't be pulled in the way they are used to, and that requires end-user education. The business needs to be made aware that Data Quality is a fixed constraint, one that they are imposing, albeit implicitly. If Quality is not a variable, then the traditional "pick two to play with" becomes "prepare to vary all of them". Larissa Moss refers to this as an "Information Age Mental Model", which prioritises quality of output above all else.

    Here is where strong leadership and clear communication come into play. If one part of the business demands a certain piece of information, the Data & AI project team must be clear that, to obtain that data at the mandated quality, the business must be prepared to bear the costs of doing so. That includes the cost of bringing the data up to an enterprise-grade, reusable standard, so that it integrates with both past and future components of the solution. This does not, of course, mean an infinite budget is opened up for each data item; some data may not be worth the cost of acquisition. What it does mean is that the discussion about costs can be more honest, and the consumer can be more aware of the issues that will arise in trying to obtain their data.

    ELT Framework in Microsoft Azure



    The framework shown above is becoming a common pattern for Extract, Load & Transform (ELT) solutions in Microsoft Azure. The key services used in this framework are Azure Data Factory v2 for orchestration, Azure Data Lake Storage Gen2 for storage and Azure Databricks for data transformation. Here are the key benefits each component offers:

    1. Azure Data Factory v2 (ADF) – ADF v2 plays the role of an orchestrator, facilitating data ingestion and movement while letting other services transform the data. This lets a service like Azure Databricks, which is highly proficient at data manipulation, own the transformation process while keeping the orchestration process independent. This also makes it easier to swap transformation-specific services in and out depending on requirements.
    2. Azure Data Lake Storage Gen2 (ADLS) – ADLS Gen2 provides a highly scalable and cost-effective storage platform. Built on blob storage, ADLS offers storage suitable for big data analytics while keeping costs low. ADLS also offers granular controls for enforcing security rules.
    3. Azure Databricks – Databricks is quickly becoming the de facto platform for data engineering & data science in Azure. Leveraging Apache Spark’s capabilities through Dataframe & Dataset APIs and Spark SQL for data interrogation, Spark Streaming for streaming analytics, Spark MLlib for machine learning & GraphX for graph processing, Databricks is truly living up to the promise of a Unified Analytics Platform.
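    The separation described in point 1 can be sketched in plain Python, with simple functions standing in for ADF (orchestration) and Databricks (transformation). This is an analogy only; all names here are our own, not real service APIs.

```python
from typing import Callable, Iterable

def run_elt(extract: Callable[[], Iterable[dict]],
            transform: Callable[[Iterable[dict]], list],
            load: Callable[[list], int]) -> int:
    """Orchestrate extract -> transform -> load as independent steps.

    Like ADF, the orchestrator owns only the sequencing; the transform
    step (a plain function here, a Databricks job in Azure) is swappable.
    """
    raw = extract()
    shaped = transform(raw)
    return load(shaped)

# Toy run: extract two rows, keep only the valid one, "load" by counting.
rows = [{"amount": 10, "valid": True}, {"amount": -5, "valid": False}]
loaded = run_elt(
    extract=lambda: rows,
    transform=lambda rs: [r for r in rs if r["valid"]],
    load=lambda rs: len(rs),
)
print(loaded)  # 1
```

    Because the transform step is passed in, swapping Databricks for another transformation service changes only that argument, not the orchestration.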

    The pattern makes use of Azure Data Lake Storage Gen2 as the final landing layer; however, it can be extended with different serving layers, such as Azure SQL Data Warehouse if an MPP platform is needed or Azure Cosmos DB if a high-throughput NoSQL database is needed.

    ADF, ADLS & Azure Databricks form the core set of services in this modern ELT framework. Investment in their individual capabilities and their integration with the rest of the Azure ecosystem continues to be made. Some examples of new upcoming features include Mapping Data Flows in ADF (currently in private preview) which will let users develop ETL & ELT pipelines using a GUI-based approach and MLflow in Azure Databricks (currently in public preview) which will provide capabilities for machine-learning experiment tracking, model management & operationalisation. This makes the ELT framework sustainable and future-proof for your data platform.

    SSIS Integration Runtime Connectivity Testing


    SSIS Integration Runtime Connectivity Testing is hard because there is no physical Azure VM to log in to as part of Azure Data Factory (ADF). While behind the scenes there is effectively a VM spun up, there is no way to access it.

    The scenario our Data Platform team faced was reasonably simple – we needed to connect to a 4D database that sat behind the Storman application that our customer used so that we could extract data for their various workloads. Because 4D is not supported by the Generic ODBC source in Azure Data Factory, we needed to use the 4D ODBC driver. This meant using SSIS to leverage the driver.

    The client is well managed in terms of security, so the target system can only be accessed from within their network. Their Azure network was connected to their on-premises network and properly secured, so part of setting up the SSIS Integration Runtime in Azure Data Factory is ensuring it is joined to the virtual network.

    Houston, we have a problem

    SSIS Azure Data Factory


    However, despite all this, we couldn't get the ODBC connection to work when deployed. Due to stability issues, our first suspect was the driver – after all, it frequently crashed Visual Studio 2017 / SSDT and configuration was a pain. Also, initially we couldn't connect on our dev machines as we weren't on the client's VPN (easily fixed, fortunately). Then we had the wrong target server (again, easily fixed).

    Once we got onto ADF, of course, our debugging options became more limited, as we were now doing SSIS Integration Runtime Connectivity Testing without all the tools available on our desktops. Initially we struggled because the runtime was very slow at sharing its metadata (package paths, connection managers, etc.), so we weren't sure it could even work with the driver. Eventually we got enough metadata to start playing with the JSON of the task to configure it. However, we continued to get errors in ADF that weren't really helping.

    Our breakthrough came when we remembered we could just connect to the more familiar and mature environment of the SSIS catalog that is deployed alongside the runtime. We configured the package correctly, ran it and got a more manageable ODBC error – “Cannot reach Destination server”. A quick ping from our desktops proved the server could be pinged, so as a test we used a simple package with just a script task to ping the server. This worked just fine on our desktop, but when deployed the script task reported failure.
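    The script-task check boils down to a simple reachability test. A Python equivalent is sketched below, substituting a TCP connect for an ICMP ping (ICMP requires elevated privileges); the host name and port are hypothetical stand-ins for the target server.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Success proves routing and firewall rules allow the hop; failure points
    at network configuration rather than SSIS or the ODBC driver.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, connection refusal and timeout
        return False

# Hypothetical target server and port (not the client's real details).
print(can_reach("db.example.internal", 19812))  # typically False outside the VNet
```

    Run from the integration runtime (or a script task), a False result here points the investigation at network configuration straight away.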

    So a quick connectivity test helped pin it down to a probable network configuration issue. It is now in the infrastructure team's hands to ensure everything is configured correctly, but at least we have (for now, at least) got SSIS and the ODBC driver off the list of probable causes. It has also taught us a few things about SSIS Integration Runtime Connectivity Testing.