Automating Google Spreadsheets Data to ADLS Gen2 Using ADF & Logic Apps
This article walks through a practical implementation of an integration between Azure and Google Sheets. The main objective is to pull data from a Google Spreadsheet and stage it in ADLS Gen2 storage, from where it can be pushed to other resources and consumed in whatever format the downstream usage requires.
Azure Data Factory has no native connector for Google Sheets to perform the required ETL processes, so we combine ADF with Logic Apps to make data consumption from Google Sheets possible.
NOTE : This solution is fully dynamic and can be used with multiple spreadsheets, each of which may contain multiple sheets.
Proposed Solution :
Azure Logic Apps is a cloud-based platform for creating and running automated workflows that integrate apps, data, services, and systems. In our case the Logic App uses the Google Sheets connector to retrieve the data and the ADLS Gen2 connector to write it out; that is the core of the procedure.
The dynamic part (i.e. multiple spreadsheets, each containing multiple sheets) is handled in the ADF pipelines. The pipelines dynamically pass the different spreadsheets and the respective sheet names they contain by making a GET API call to the designed Logic App, passing a different sheet name as the GET parameter on every call.
Let’s begin the practical implementation now.
Following are the resources required for this implementation :
1.) Google Spreadsheet : One Google spreadsheet with one or more data sheets inside it.
2.) Azure Logic App : Azure Logic Apps is a cloud service that helps you automate and orchestrate tasks, business processes, and workflows when you need to integrate apps, data, systems, and services across enterprises or organizations.
3.) Azure Data Factory (ADF) : Azure Data Factory (ADF) is a fully managed, serverless data integration solution for ingesting, preparing, and transforming all your data at scale. With its rich set of enterprise connectors, ADF can ingest data from a wide variety of data stores.
4.) Azure Data Lake Storage (ADLS) Gen2 : Data Lake Storage Gen2 makes Azure Storage the foundation for building enterprise data lakes on Azure. Built on Azure Blob Storage, it enhances performance, management, and security, primarily providing large-scale cloud storage to the user organization along with the low-cost capabilities of Blob Storage.
Moving on to the technical part and the implementation :
Configuring Logic Apps :
Step 1 : I have already created a Logic App named “Testing Sheets” inside the resource group “RG-Test”, and now we’ll design the workflow. I start by adding an HTTP Request trigger, selecting the GET method and adding a relative-path parameter named “sheetname” (for the dynamic behaviour). This lets us call the Logic App like a web/API source from other services (ADF in our case).
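In code view, an HTTP Request trigger with a relative-path parameter looks roughly like the following sketch (the portal generates the full definition and callback URL; the `/sheets/` path segment here is an assumption):

```json
{
  "triggers": {
    "manual": {
      "type": "Request",
      "kind": "Http",
      "inputs": {
        "method": "GET",
        "relativePath": "/sheets/{sheetname}"
      }
    }
  }
}
```

In later actions the parameter value is then available through the expression `triggerOutputs()['relativePathParameters']['sheetname']`.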
Step 2 : The next activity we add is the Google Sheets connector (specifically the Get rows action, to fetch the complete data). To use it we need to authenticate with the respective Google account (one that has all the necessary permissions) by signing in with the correct credentials in the connector (as shown below).
Here we first select the spreadsheet (“Testing Sheets” in our case) and then the worksheet inside “Testing Sheets” from which we are going to read the data.
Note : The worksheet name will change at run time, as it is dynamically assigned from the “sheetname” parameter created in Step 1.
Step 3 : The output of the Get rows action configured in Step 2 is JSON, and I keep the same format at the sink. If you want a different format, such as delimited text, you can configure that as well by adding a Create CSV table action. For this implementation we use JSON only.
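For a sheet with two columns, the JSON body returned by Get rows looks roughly like this (the column names are purely illustrative, and the connector may include extra bookkeeping fields alongside your columns):

```json
{
  "value": [
    { "Id": "1", "Name": "Alice" },
    { "Id": "2", "Name": "Bob" }
  ]
}
```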
Step 4 : The last activity in the flow is our sink location, i.e. ADLS Gen2. To land the data I add the storage account connector and configure it as required. In my case I already have an existing ADLS Gen2 account that the storage connector in this Logic App can reference.
As shown above, I configured the activity by specifying the connection string, the container name, the file name (which changes dynamically with the worksheet name), and the data we receive as output from the Google Sheets Get rows action.
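Wired up this way, the blob-creation action looks roughly like the following fragment in code view (a sketch: the action and connection names, the `/sheets-data` folder, and the `Get_rows` reference are assumptions based on the steps above):

```json
{
  "Create_blob": {
    "type": "ApiConnection",
    "inputs": {
      "host": {
        "connection": { "name": "@parameters('$connections')['azureblob']['connectionId']" }
      },
      "method": "post",
      "path": "/datasets/default/files",
      "queries": {
        "folderPath": "/sheets-data",
        "name": "@{triggerOutputs()['relativePathParameters']['sheetname']}.json"
      },
      "body": "@body('Get_rows')"
    }
  }
}
```

The `name` query builds the file name from the same “sheetname” parameter, so each worksheet lands in its own JSON file.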
Configuring Azure Data Factory :
With the Logic App side successfully designed, we now move to ADF and configure it to call the Logic App as an API source, which will initiate the Logic App workflow. When triggering the Logic App from ADF we need to pass the “sheetname” parameter created in the previous section.
Step 1 : To pass the worksheet names dynamically, I use a config file that lists all the sheet names whose data we want to process, as shown :
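A minimal config file for this setup could look like the following (the sheet names and the `sheetname` field are illustrative); each object in the array becomes one element of the Lookup activity’s output:

```json
[
  { "sheetname": "Sheet1" },
  { "sheetname": "Sheet2" },
  { "sheetname": "Sheet3" }
]
```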
Here we use a Lookup activity to read that config file (also stored in the ADLS Gen2 account), which returns a JSON “value” array as shown above.
Step 2 : The “value” array we receive in Step 1 as the Lookup output is consumed by a ForEach loop connected to the Lookup, which iterates over the sheet names one by one and passes each of them as the URL parameter when calling the Logic App (as an API) from ADF.
Step 3 : Inside the ForEach loop we use a Web activity to call our Logic App for every sheet name in parallel. The URL field in the Web activity is configured with a dynamic expression consisting of the Logic App URL and the sheet name for that particular iteration, as shown below :
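The Web activity with its dynamic URL can be sketched like this (the callback URL, region, and placeholders are illustrative, and `item().sheetname` assumes the config-file field is named `sheetname`):

```json
{
  "name": "Call_LogicApp",
  "type": "WebActivity",
  "typeProperties": {
    "url": {
      "value": "@concat('https://prod-00.eastus.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke/sheets/', item().sheetname, '?api-version=2016-10-01&sig=<signature>')",
      "type": "Expression"
    },
    "method": "GET"
  }
}
```

The ForEach loop itself would take `@activity('Lookup1').output.value` as its items expression (assuming the Lookup activity is named `Lookup1`).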
Step 4 : Now we are good to go and simply need to trigger the pipeline. After triggering it, we can follow the whole process in the Logic App’s run history, which shows the Logic App firing on the ADF call for every worksheet, processing the data, and moving it into ADLS Gen2 as configured earlier.
We have successfully integrated Google Sheets data into an ADLS Gen2 account using Logic Apps. If we need to process spreadsheets in bulk, we just wrap this whole pipeline in another ForEach activity (invoking it via Execute Pipeline) and pass all the spreadsheet names in the same way. The solution shows the power of combining ADF and Azure Logic Apps, especially where ADF lacks a connector.
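The bulk variant described above can be sketched as an outer pipeline containing a ForEach over spreadsheet names, each iteration invoking the pipeline built in this article (the pipeline and parameter names here are assumptions):

```json
{
  "name": "ForEach_Spreadsheet",
  "type": "ForEach",
  "typeProperties": {
    "items": { "value": "@pipeline().parameters.spreadsheetNames", "type": "Expression" },
    "activities": [
      {
        "name": "Run_Sheet_Pipeline",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": { "referenceName": "CopySheetsToADLS", "type": "PipelineReference" },
          "parameters": {
            "spreadsheetName": { "value": "@item()", "type": "Expression" }
          }
        }
      }
    ]
  }
}
```

The inner pipeline would then use its `spreadsheetName` parameter to pick the right config file (or Logic App) for that spreadsheet.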
I hope you enjoyed this scenario. I’ll come up with more Azure Data Factory scenarios in future posts, and for any further queries and suggestions you can reach me by mail at : email@example.com