Today at the Build conference, Microsoft announced Microsoft Fabric, a unified analytics solution. What is it? It is a real game-changer, and in today’s article, I would like to share the general idea behind Fabric and how it can help with your growing analytical needs. Let’s start!
To begin, please take a look at the official graphic that provides a quick high-level overview of Fabric:
Before I delve into Fabric itself, let me briefly describe a few problems that this new platform aims to address. All of us who work with data can attest that there are a few core challenges when planning and implementing a new analytical platform or solution.
Centrally Governed Analytical Solutions vs. Self-Service Solutions
Centrally governed analytical solutions and self-service solutions are two distinct approaches within the realm of data analytics. Centrally governed solutions involve a centralized team responsible for overseeing data analytics initiatives, placing emphasis on stringent data governance, security, and compliance measures. These solutions prioritize data consistency, standardization, and the establishment of a unified information source.
On the other hand, self-service solutions empower individual users to independently perform data analysis, providing them with user-friendly tools and interfaces. These solutions offer agility, flexibility, and prompt generation of insights. However, if not effectively managed, they can present challenges in terms of data quality, consistency, and governance.
Achieving the optimal balance between these approaches is essential, as organizations must find a middle ground that aligns with their specific needs, available resources, and risk tolerance. A hybrid approach, which combines elements from both methodologies, often proves to be the most effective solution. This approach grants user empowerment while ensuring data accuracy, security, and compliance.
Thoroughly assessing organizational objectives, resources, and data culture enables businesses to determine the most suitable path for unlocking the full potential of data-driven decision-making.
Data silos
Data silos arise when data is stored and managed in isolation by different departments, teams, or systems within an organization, so that it is not easily accessible or shared across functions. Data silos pose significant challenges: they impede data integration, hinder collaboration, and limit the organization’s ability to derive valuable insights from its data. They can lead to duplicated efforts, inconsistent data, and an incomplete understanding of the overall picture.
Breaking down data silos is crucial for organizations to foster a data-driven culture, enable cross-functional analysis, and make well-informed decisions based on a holistic view of the data.
Limited scalability
One of the major challenges associated with limited scalability in analytical solutions is the inability to efficiently handle large volumes of data. As datasets continue to grow in size, traditional analytical tools and algorithms struggle to process and analyze them within a reasonable timeframe. This limitation hampers the ability to gain timely insights and make data-driven decisions.
Furthermore, limited scalability often results in performance bottlenecks and increased processing costs. Organizations may be required to invest in expensive hardware or infrastructure upgrades to accommodate the growing data requirements. Ultimately, the lack of scalability in analytical solutions can impede progress and hinder the full utilization of data resources.
Different data-oriented roles
Roles such as data analysts, data engineers, and data scientists have distinct needs that may not be fully addressed by a single solution. Data analysts typically require user-friendly interfaces and visualization tools to effectively explore and present data. Data engineers focus on data integration and infrastructure requirements, requiring solutions capable of handling large-scale data processing and storage. Data scientists need advanced algorithms and modeling capabilities to extract insights and build predictive models. Consequently, a one-size-fits-all approach often fails to meet the specific needs of each role, hindering collaboration and efficiency within the data-driven ecosystem.
While the aforementioned problems are not the only ones, in my opinion, they are the most significant. However, I have good news for you – Microsoft Fabric addresses all of these problems. How? Let’s delve into the details!
To begin, Fabric brings together existing services like Data Factory, Synapse, and Power BI into one unified product for all your data and analytics workloads. This means that all these services are blended together into one platform.
The picture below illustrates the seven Microsoft Fabric workloads built on top of a common storage foundation called OneLake:
From my perspective, the most important aspect is that the entire platform is lake-centric: it strongly emphasizes the data lake as the central component of its architecture, leveraging and integrating data from data lakes as the primary source for analysis, processing, and deriving insights. This is crucial because most of the operations performed in Fabric are connected to the data lake in some way.
Fabric also integrates natively with Microsoft Purview, leveraging its data cataloging capabilities to provide a comprehensive inventory of data assets, including their descriptions, metadata, and relationships. This empowers users to easily discover and understand the available data assets, enhancing collaboration and data exploration across different roles. Additionally, Purview’s data lineage capabilities give Fabric transparency and traceability of data, showcasing the origin, transformations, and dependencies of data assets. This helps ensure data quality and compliance and supports data-driven decision-making.
Sounds good? Indeed!
So what scenarios can we address with this new platform?
A lakehouse architecture combines the benefits of a data lake and a data warehouse, offering a unified platform for data storage and processing. It allows organizations to store raw, unprocessed data in a data lake while also enabling the application of schemas and structured queries similar to those used in traditional data warehouses. The lakehouse architecture has become one of the most popular approaches for big data workloads. The image below illustrates the typical flow connected to the lakehouse in Fabric:
So, we have structured or unstructured data from source systems that can be transformed and ingested using tools like Pipelines and Dataflows, which we are familiar with from Data Factory and Power BI. This data can then be stored in a new object called a Lakehouse. Once in the Lakehouse, it can undergo further transformations using Spark notebooks and dataflows. Ultimately, it can end up in a Power BI dataset or a data warehouse. Does this sound familiar? Yes, but the great thing is that all of these tools are covered under one service, so you don’t need to worry about setting up the underlying infrastructure – it’s a Software-as-a-Service (SaaS)-like offering! An interesting feature that comes with Fabric is Shortcuts, which lets you reference data in other locations without actually copying it. Pretty cool!
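To make the flow more tangible, here is a minimal sketch of what the Spark notebook step could look like, assuming a Fabric notebook attached to a Lakehouse. The file path, column names, and table name are hypothetical placeholders, not a definitive recipe:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Fabric notebook a SparkSession named `spark` is already provided;
# getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()

# Read raw data landed in the Lakehouse "Files" area (e.g. by a pipeline).
# "Files/raw/sales.csv" is a hypothetical path.
raw = spark.read.option("header", True).csv("Files/raw/sales.csv")

# A simple transformation step: type casting and filtering.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
)

# Persist the result as a managed Delta table in the Lakehouse "Tables" area,
# where it becomes queryable by SQL and Power BI.
clean.write.format("delta").mode("overwrite").saveAsTable("sales_clean")
```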
Now, let’s touch upon the data warehouse. Although it’s a well-known concept, I’ll provide a brief description. A data warehouse is still a very common pattern for storing data. It serves as a central repository that consolidates data from various sources. Its primary purpose is to support reporting and business intelligence activities.
The approach taken by Fabric in this area can be seen below:
As you can see from an architectural point of view, it appears similar to the previous example. However, in this case, I would like to emphasize the procedures used for data transformation. These are the well-known TSQL procedures that you know and appreciate. This is one of the key points here – you don’t need to learn new skills; you can reuse your existing TSQL skills! Personally, I find it highly beneficial to have the option to use TSQL. Based on my experience with various customers, I have observed that the skillset within an organization is one of the most important factors when choosing an analytical platform.
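For illustration, here is a hedged sketch of reusing plain TSQL against a warehouse from Python with pyodbc. The SQL endpoint address, database, and table names are hypothetical placeholders, and the authentication option may differ in your environment:

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-workspace.datawarehouse.fabric.microsoft.com;"  # hypothetical endpoint
    "Database=SalesWarehouse;"                                   # hypothetical database
    "Authentication=ActiveDirectoryInteractive;"                 # one possible auth option
)
cursor = conn.cursor()

# Familiar TSQL: aggregate staged rows into a reporting table.
# Both tables are hypothetical examples.
cursor.execute("""
    INSERT INTO dbo.DailySales (SaleDate, TotalAmount)
    SELECT CAST(OrderDate AS date), SUM(Amount)
    FROM dbo.StagedOrders
    GROUP BY CAST(OrderDate AS date);
""")
conn.commit()
conn.close()
```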
Moving on to the next scenario, we have data science, which combines scientific methods, algorithms, and tools to extract knowledge and insights from data. It involves the application of statistical analysis, machine learning, and computational techniques to understand patterns, make predictions, and solve complex problems. Data scientists utilize programming languages like Python or R, along with data manipulation and visualization tools, to collect, clean, and analyze data.
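As a generic illustration of the kind of workflow meant here, the sketch below splits a dataset, trains a model, and evaluates it with scikit-learn. The dataset and column names are hypothetical; in Fabric this would typically run in a notebook:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical dataset with hypothetical feature and label columns.
df = pd.read_csv("customers.csv")
X = df[["tenure_months", "monthly_spend"]]
y = df["churned"]

# Hold out a test set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a simple predictive model and check its accuracy.
model = LogisticRegression()
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```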
The last scenario I would like to mention is Real-Time Analytics, which refers to the process of analyzing data and generating insights immediately as data is generated or received, without any significant delay. It involves the continuous monitoring and analysis of streaming or rapidly changing data to gain up-to-the-minute insights and make timely decisions.
Streaming can be supported in many different scenarios within Fabric. Take a look at Kusto DB! You may be familiar with it from Log Analytics or Azure Data Explorer, as it has been available for quite some time. This technology offers excellent scalability and integration capabilities, allowing it to process streaming data and serve it to consumers easily and efficiently.
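To give a flavor of querying Kusto, here is a sketch using the azure-kusto-data Python package. The cluster URI, database, and table are hypothetical placeholders; in Fabric you would point this at your KQL Database’s query endpoint:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Hypothetical cluster URI; authentication via Azure CLI login.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://your-cluster.kusto.windows.net"
)
client = KustoClient(kcsb)

# A simple KQL query over streaming telemetry from the last 5 minutes.
# The Telemetry table and its columns are hypothetical.
query = """
Telemetry
| where Timestamp > ago(5m)
| summarize AvgValue = avg(Value) by DeviceId
"""

response = client.execute("TelemetryDb", query)  # hypothetical database name
for row in response.primary_results[0]:
    print(row["DeviceId"], row["AvgValue"])
```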
Now that we know the scenarios we can address and the capabilities that Fabric offers, let’s take a look at how it looks. When we open the familiar Power BI portal, we will notice the Fabric homepage:
From my previous description, you should recognize all of these symbols, but just to be clear:
- Power BI: This includes all well-known artifacts related to Power BI, such as reports and datasets.
- Data Factory: This encompasses integration objects like pipelines and Power Query.
- Synapse Data Engineering: Here, you’ll find notebooks and Apache Spark content.
- Synapse Data Science: This section is dedicated to machine learning components.
- Synapse Data Warehouse: It contains warehouse objects and artifacts.
- Synapse Real-Time Analytics: This includes Data Explorer and EventStream services.
- Data Activator: A new service that is coming soon.
The Power BI service provides the option to switch views, displaying a slightly different interface depending on the work you need to perform. In this article, I will share a few screenshots from the portal to give you an idea of the new Fabric experience. Below, you can see the OneLake view with all the artifacts:
As you can see, this view provides a general overview of various artifacts that have been created. Take a look at the “Type” column, where you will notice several new object types, including KQL Database, Lakehouse, and Warehouse. How cool is that?
As someone who is a huge fan of SQL technologies, I would like to highlight the data warehousing aspect. Imagine being able to write TSQL queries directly on the portal, even on a lake-centric database. Well, guess what? You will have the opportunity to do just that and so much more. If you’re a fan of Synapse SQL Pools, this is the place you should explore first.
In today’s world, SQL alone is not always sufficient. That’s why we also have the Spark engine, which allows us to define data engineering and data science tasks. And the best part? We can leverage the familiar notebook experience we had in Synapse Analytics. However, this time, it’s even easier to work with because it is fully integrated with other services within Fabric. This seamless integration enhances collaboration and enables a more streamlined and efficient workflow across different analytical tasks.
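As a small illustration of that notebook experience, the sketch below mixes Spark SQL and DataFrame code in one place. The table name reuses the hypothetical Lakehouse table from the earlier sketch:

```python
from pyspark.sql import SparkSession

# Already available as `spark` in a Fabric notebook.
spark = SparkSession.builder.getOrCreate()

# Query the Lakehouse table with SQL directly from the notebook...
# (table and column names are hypothetical)
top_days = spark.sql("""
    SELECT order_date, SUM(amount) AS total
    FROM sales_clean
    GROUP BY order_date
    ORDER BY total DESC
    LIMIT 10
""")

# ...and continue with regular DataFrame operations on the result.
top_days.show()
```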
The last few screenshots came from the Data Factory section, where we can utilize Power Query and standard Data Factory pipelines.
Look carefully at the taskbar and you will notice some new features 😊
Awesome! Personally, the crucial thing for me is that all of the presented features are already integrated because Fabric is a pure SaaS-like service. What a relief after spending hours configuring networking, connectivity, and other infrastructure-related tasks. I’m sure you have many questions about the new platform, such as details of each service within Fabric, pricing, monitoring, deployments, and more. I encourage you to carefully review the official communications from the product team because there’s more to come, and I’m confident that you won’t be disappointed. I’m genuinely excited, and I believe you are too!
If you are interested in the topic, please follow the links below:
- Arun Ulagaratchagan’s announcement about Fabric: https://aka.ms/build2023-fabricblog
- Digital Event (Build video on-demand about Fabric): https://aka.ms/build-with-analytics
- Trial: https://aka.ms/try-fabric
- Fabric community: https://aka.ms/fabric-community