Convert Microsoft Office Documents into PDF using Microsoft Graph & Azure Functions

One of my recent tasks was to convert a Word document into a PDF from a web application hosted in an Azure Web App, so the conversion had to run on the Azure side.

Initially, I thought a traditional approach such as Interop, or one of the free APIs available on the net, would work. To my great surprise, the code threw the following exception: A generic error occurred in GDI+.

What I learned from this is that all Azure Web Apps (as well as Mobile App/Services, WebJobs, and Functions) run in a secure environment called a sandbox. Each app runs inside its own sandbox, isolating its execution from other instances on the same machine as well as providing an additional degree of security and privacy that would otherwise not be available. The sandbox mechanism aims to ensure that each app running on a machine will have a minimum guaranteed level of service; furthermore, the runtime limits enforced by the sandbox protect apps from being adversely affected by other resource-intensive apps which may be running on the same machine.

The sandbox generally aims to restrict access to shared components of Windows. Unfortunately, many core components of Windows have been designed as shared components: the registry, cryptography, and graphics subsystems, among others. For the sake of radical attack surface area reduction, the sandbox prevents almost all of the Win32k.sys APIs from being called, which practically means that most of User32/GDI32 system calls are blocked. For most applications, this is not an issue since most Azure Web Apps do not require access to Windows UI functionality (they are web applications after all). Since all the major libraries use a lot of GDI calls during the PDF conversion, the default rendering engine does not work on Azure Web Apps. You can find more information about those sandbox restrictions on https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#win32ksys-user32gdi32-restrictions.

So the solution is to find a way to convert documents to PDF within Azure. Luckily, I came across a blog post by Philipp Bauknecht that leverages Microsoft Graph to convert a document to PDF. Let us see how.

There are several steps, which you have to perform in the correct order:

  1. Create an App registration in Azure AD and assign the required permissions
  2. Create a new Azure Functions app using Visual Studio 2019
  3. Create an OAuth2 authentication service to request an access token to call the Microsoft Graph
  4. Create a File Service to upload, convert and delete files using the Microsoft Graph
  5. Setup Dependency Injection
  6. Create a new function as the Main entry point
  7. Create a Function App in Azure to host the code and make it available globally
  8. Import the publish profile & deploy using Visual Studio 2019
  9. Test using a C# console application
  10. Test using Postman

Step 1: Create an App registration in Azure AD and assign the required permissions

1.1 Go to https://portal.azure.com, open Azure Active Directory and select App registrations. Click on New registration, provide a name, then click on Register.

1.2 Once the app is provisioned, click on Certificates & secrets in the left navigation blade. Click on New client secret to create one, then save the value of the secret for later use.

1.3 Go to API permissions, then click on Add a permission then Microsoft Graph, then choose Application permissions to add the following permissions (Admin consent is a must):

1.4 Go to Overview and save the values of Application (client) Id and Directory (tenant) Id for later use.

Step 2: Create a new Azure Functions app using Visual Studio 2019

Open Visual Studio 2019, create a new project, and choose the Azure Functions template.

Step 3: Create an OAuth2 authentication service to request an access token to call the Microsoft Graph

This class is responsible for requesting the access token.

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

namespace PdfConversionFunctionApp
{
    public class AuthenticationService
    {
        // Reuse one HttpClient for all token requests instead of creating one per call.
        private static readonly HttpClient Client = new HttpClient();

        public static async Task<string> GetAccessTokenAsync(ApiConfig apiConfig)
        {
            // Form fields of the OAuth2 client credentials request.
            // The v1.0 token endpoint identifies the target API through "resource".
            var values = new List<KeyValuePair<string, string>>
            {
                new KeyValuePair<string, string>("client_id", apiConfig.ClientId),
                new KeyValuePair<string, string>("client_secret", apiConfig.ClientSecret),
                new KeyValuePair<string, string>("scope", apiConfig.Scope),
                new KeyValuePair<string, string>("grant_type", apiConfig.GrantType),
                new KeyValuePair<string, string>("resource", apiConfig.Resource)
            };

            // Token endpoint of the tenant, e.g. https://login.microsoftonline.com/{tenantId}/oauth2/token
            var requestUrl = $"{apiConfig.Endpoint}{apiConfig.TenantId}/oauth2/token";
            var requestContent = new FormUrlEncodedContent(values);
            var response = await Client.PostAsync(requestUrl, requestContent);
            var responseBody = await response.Content.ReadAsStringAsync();

            // Fail early with the server's message instead of returning a null token.
            if (!response.IsSuccessStatusCode)
            {
                throw new Exception($"Token request failed with status {response.StatusCode} and message {responseBody}");
            }

            dynamic tokenResponse = JsonConvert.DeserializeObject(responseBody);
            return tokenResponse?.access_token;
        }
    }
}
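
The listings in this post pass around an ApiConfig object, but the class itself is never shown. A minimal sketch of what it could look like follows, assuming it is a plain settings class whose property names match the keys of the ApiConfig section in local.settings.json (Step 5.2) and the members used by AuthenticationService and ConvertToPdf. The shape is inferred from that usage, not copied from the original project.

```csharp
namespace PdfConversionFunctionApp
{
    // Assumed shape: a plain settings class. The property names are inferred
    // from how the other listings use ApiConfig, not taken from the original code.
    public class ApiConfig
    {
        // Azure AD login endpoint, e.g. https://login.microsoftonline.com/
        public string Endpoint { get; set; }
        public string GrantType { get; set; }
        public string Scope { get; set; }
        public string Resource { get; set; }
        public string TenantId { get; set; }
        public string ClientId { get; set; }
        public string ClientSecret { get; set; }

        // Microsoft Graph base URL, e.g. https://graph.microsoft.com/v1.0/
        public string GraphEndpoint { get; set; }
        public string SiteId { get; set; }
    }
}
```

Keeping the class free of logic lets the configuration binder fill it by matching key names to property names.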

Step 4: Create a File Service to upload, convert and delete files using the Microsoft Graph

This class is responsible for uploading, converting and deleting the file.

using Newtonsoft.Json;
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

namespace PdfConversionFunctionApp
{
    public class FileService
    {
        private readonly ApiConfig _apiConfig;
        private HttpClient _httpClient;

        public FileService(ApiConfig apiConfig)
        {
            _apiConfig = apiConfig;
        }

        private async Task<HttpClient> CreateAuthorizedHttpClient()
        {
            if (_httpClient != null)
            {
                return _httpClient;
            }

            var token = await AuthenticationService.GetAccessTokenAsync(_apiConfig); 
            _httpClient = new HttpClient();
            _httpClient.DefaultRequestHeaders.Add("Authorization", $"Bearer {token}");

            return _httpClient;
        }

        public async Task<string> UploadStreamAsync(string path, Stream content, string contentType)
        {
            var httpClient = await CreateAuthorizedHttpClient();

            string tmpFileName = $"{Guid.NewGuid().ToString()}{MimeTypes.MimeTypeMap.GetExtension(contentType)}";
            string requestUrl = $"{path}root:/{tmpFileName}:/content";
            var requestContent = new StreamContent(content);
            requestContent.Headers.ContentType = new MediaTypeHeaderValue(contentType);
            var response = await httpClient.PutAsync(requestUrl, requestContent);
            if (response.IsSuccessStatusCode)
            {
                dynamic file = JsonConvert.DeserializeObject(await response.Content.ReadAsStringAsync());
                return file?.id;
            }
            else
            {
                var message = await response.Content.ReadAsStringAsync();
                throw new Exception($"Upload file failed with status {response.StatusCode} and message {message}");
            }
        }

        public async Task<byte[]> DownloadConvertedFileAsync(string path, string fileId, string targetFormat)
        {
            var httpClient = await CreateAuthorizedHttpClient();

            var requestUrl = $"{path}{fileId}/content?format={targetFormat}";
            var response = await httpClient.GetAsync(requestUrl);
            if (response.IsSuccessStatusCode)
            {
                var fileContent = await response.Content.ReadAsByteArrayAsync();
                return fileContent;
            }
            else
            {
                var message = await response.Content.ReadAsStringAsync();
                throw new Exception($"Download of converted file failed with status {response.StatusCode} and message {message}");
            }
        }

        public async Task DeleteFileAsync(string path, string fileId)
        {
            var httpClient = await CreateAuthorizedHttpClient();

            var requestUrl = $"{path}{fileId}";
            var response = await httpClient.DeleteAsync(requestUrl);
            if (!response.IsSuccessStatusCode)
            {
                var message = await response.Content.ReadAsStringAsync();
                throw new Exception($"Delete file failed with status {response.StatusCode} and message {message}");
            }
        }
    }
}
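
One thing to keep in mind with the FileService above: it keeps the first authorized HttpClient for as long as the instance lives, while Azure AD access tokens expire (typically after about an hour), so a long-running instance would likely start getting 401 responses. The sketch below shows one possible way to refresh a token shortly before it expires. CachedToken, the fetch delegate and the one-minute safety margin are hypothetical and not part of the original code; the lifetime would come from the expires_in field of the token response, which GetAccessTokenAsync currently does not return. The sketch is not thread-safe.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original post.
public class CachedToken
{
    private readonly Func<Task<(string Token, TimeSpan Lifetime)>> _fetch;
    private readonly Func<DateTimeOffset> _now;
    private string _token;
    private DateTimeOffset _expiresAt;

    // "now" can be replaced in tests; it defaults to the system clock.
    public CachedToken(Func<Task<(string Token, TimeSpan Lifetime)>> fetch,
                       Func<DateTimeOffset> now = null)
    {
        _fetch = fetch;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    public async Task<string> GetAsync()
    {
        // Refresh when there is no token yet, or when it expires within a minute.
        if (_token == null || _now() + TimeSpan.FromMinutes(1) >= _expiresAt)
        {
            var (token, lifetime) = await _fetch();
            _token = token;
            _expiresAt = _now() + lifetime;
        }

        return _token;
    }
}
```

With something like this in place, the Authorization header would be set per request from GetAsync() rather than once in DefaultRequestHeaders.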

Step 5: Setup Dependency Injection

5.1 In order to use the FileService and the configuration properties (locally and in Azure), we need to set up dependency injection. To use dependency injection in an Azure Function app, we need to add the package Microsoft.Azure.Functions.Extensions to our app using NuGet.

using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using System;
using System.IO;
using System.Reflection;

[assembly: FunctionsStartup(typeof(PdfConversionFunctionApp.Startup))]
namespace PdfConversionFunctionApp
{
    class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            var fileInfo = new FileInfo(Assembly.GetExecutingAssembly().Location);
            string path = fileInfo.Directory.Parent.FullName;
            var config = new ConfigurationBuilder()
                // Base path: the function app root, one level above the folder that holds the assembly
                .SetBasePath(path)
                .AddJsonFile("local.settings.json", optional: true, reloadOnChange: true)
                .AddEnvironmentVariables()
                .Build();

            var apiConfig = new ApiConfig();
            config.Bind(nameof(ApiConfig), apiConfig);

            builder.Services.AddSingleton<FileService>();
            builder.Services.AddSingleton(apiConfig);
        }
    }
}

The above code, from line 15 to 25, takes care of getting the configuration values: if the app runs locally, it loads local.settings.json; otherwise, it takes the values from the Azure Function App application settings (see Step 7.2).
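
To see how the Bind call maps configuration keys onto ApiConfig, here is a minimal sketch, assuming the same Microsoft.Extensions.Configuration packages that the Startup class already uses. An in-memory collection stands in for the real settings; the ConfigBindingDemo class and the sample values are made up for illustration, and ApiConfig is trimmed to two properties.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

// Trimmed to two properties for the demo.
public class ApiConfig
{
    public string TenantId { get; set; }
    public string SiteId { get; set; }
}

// Hypothetical demo class, not part of the original post.
public static class ConfigBindingDemo
{
    public static ApiConfig Load(IDictionary<string, string> settings)
    {
        // The in-memory collection plays the role of local.settings.json
        // or the environment variables provider.
        var config = new ConfigurationBuilder()
            .AddInMemoryCollection(settings)
            .Build();

        // Same call as in Startup: bind the "ApiConfig" section to the object.
        var apiConfig = new ApiConfig();
        config.Bind(nameof(ApiConfig), apiConfig);
        return apiConfig;
    }
}
```

Calling ConfigBindingDemo.Load with the keys ApiConfig:TenantId and ApiConfig:SiteId fills both properties. The environment variables provider exposes application settings under the same key names, so in Azure a setting named ApiConfig:TenantId (or ApiConfig__TenantId on Linux plans, where the double underscore stands in for the colon) should bind the same way.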

5.2 Now set the values of TenantId, ClientId and ClientSecret from Step 1. The SiteId identifies the SharePoint site whose document library will temporarily hold the uploaded file; we have to GET it using Microsoft Graph Explorer with the following request pattern:

https://graph.microsoft.com/v1.0/sites/{hostname}:/sites/{path}?$select=id
GET => https://graph.microsoft.com/v1.0/sites/myorganization.sharepoint.com?$select=id
GET => https://graph.microsoft.com/v1.0/sites/myorganization.sharepoint.com:/sites/Contoso/Operations/Manufacturing?$select=id
Response =>
{
    "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#sites(id)/$entity",
    "id": "myorganization.sharepoint.com,74796aa9-17f6-4c09-9b20-1d78bfdcbac4,98f692fe-ea45-423b-8001-0b9c6bb2b50f"
}
What you get back in the id is in this format: {hostname},{spsite.id},{spweb.id}.
What we need is the {spsite.id}, which here is 74796aa9-17f6-4c09-9b20-1d78bfdcbac4.

This is how the local.settings.json looks:
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "graph:Endpoint": "https://login.microsoftonline.com/",
    "graph:GrantType": "client_credentials",
    "graph:Scope": "Files.ReadWrite.All",
    "graph:Resource": "https://graph.microsoft.com",
    "graph:TenantId": "",
    "graph:ClientId": "",
    "graph:ClientSecret": "",
    "pdf:GraphEndpoint": "https://graph.microsoft.com/v1.0/",
    "pdf:SiteId": ""
  },
  "ApiConfig": {
    "Endpoint": "https://login.microsoftonline.com/",
    "GrantType": "client_credentials",
    "Scope": "Files.ReadWrite.All",
    "Resource": "https://graph.microsoft.com",
    "TenantId": "",
    "ClientId": "",
    "ClientSecret": "",
    "GraphEndpoint": "https://graph.microsoft.com/v1.0/",
    "SiteId": ""
  }
}
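
The SiteId value in the settings above is the middle part of the composite id that Graph returns, in the format {hostname},{spsite.id},{spweb.id}. If you prefer to extract it in code rather than by hand, a small sketch could look like this; SiteIdParser is a hypothetical helper name, not something from the original project.

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class SiteIdParser
{
    // Expects the composite id in the form {hostname},{spsite.id},{spweb.id}
    // and returns the {spsite.id} part.
    public static string ExtractSiteCollectionId(string compositeId)
    {
        if (compositeId == null)
        {
            throw new ArgumentNullException(nameof(compositeId));
        }

        var parts = compositeId.Split(',');
        if (parts.Length != 3)
        {
            throw new FormatException(
                "Expected an id in the form {hostname},{spsite.id},{spweb.id}.");
        }

        return parts[1].Trim();
    }
}
```

For the sample response above, ExtractSiteCollectionId returns 74796aa9-17f6-4c09-9b20-1d78bfdcbac4.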

Step 6: Create a new function as the Main entry point

Add a new function to your project and name it ConvertToPdf. Select the Http trigger so our function can be called via an HTTP request, and pick the authorization level Anonymous so we don’t need to provide any credentials when calling this function. Then replace the generated code with the code below.

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using System.Threading.Tasks;

namespace PdfConversionFunctionApp
{
    public class ConvertToPdf
    {
        private readonly FileService _fileService;
        private readonly ApiConfig _apiConfig;

        public ConvertToPdf(FileService fileService, ApiConfig apiConfig)
        {
            _fileService = fileService;
            _apiConfig = apiConfig;
        }

        [FunctionName("ConvertToPdf")]
        public async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)] HttpRequest req, ILogger log)
        {
            if (req.Headers.ContentLength == 0)
            {
                log.LogInformation("Please provide a file.");
                return new BadRequestObjectResult("Please provide a file.");
            }

            var path = $"{_apiConfig.GraphEndpoint}sites/{_apiConfig.SiteId}/drive/items/";

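            // Upload the request body as a temporary file in the document library
            var fileId = await _fileService.UploadStreamAsync(path, req.Body, req.ContentType);
            try
            {
                // Let Microsoft Graph convert the file and return the PDF bytes
                var pdf = await _fileService.DownloadConvertedFileAsync(path, fileId, "pdf");
                return new FileContentResult(pdf, "application/pdf");
            }
            finally
            {
                // Always remove the temporary file, even if the conversion fails
                await _fileService.DeleteFileAsync(path, fileId);
            }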
        }
    }
}
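
To make the three Microsoft Graph calls behind this function easier to follow, here is a sketch of the URLs that the function and FileService end up requesting: a PUT to upload, a GET with format=pdf to convert, and a DELETE to clean up. The GraphUrls class and its method names are hypothetical and only mirror the string interpolations in the listings above.

```csharp
// Hypothetical helper that mirrors the URL building in ConvertToPdf and FileService.
public static class GraphUrls
{
    // Base path of the drive items of the site's default document library.
    public static string DriveItemsPath(string graphEndpoint, string siteId)
        => $"{graphEndpoint}sites/{siteId}/drive/items/";

    // PUT target: uploads the content as a new file in the drive root.
    public static string Upload(string path, string fileName)
        => $"{path}root:/{fileName}:/content";

    // GET target: downloads the file converted to the requested format.
    public static string Convert(string path, string fileId, string format)
        => $"{path}{fileId}/content?format={format}";

    // DELETE target: removes the temporary file.
    public static string Delete(string path, string fileId)
        => $"{path}{fileId}";
}
```

With GraphEndpoint set to https://graph.microsoft.com/v1.0/, the conversion request therefore has the form https://graph.microsoft.com/v1.0/sites/{SiteId}/drive/items/{fileId}/content?format=pdf.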

Step 7: Create a Function App in Azure to host the code and make it available globally

7.1 Go to https://portal.azure.com, then click on Create Function App

7.2 Once the app is provisioned, click on Configuration in the left navigation blade, then New application setting. We have to add the application settings below, which are needed when the app runs in Azure (the values are the same as in Step 5.2).

7.3 In the Overview section, download the publish profile by clicking on Get publish profile.

Step 8: Import the publish profile & deploy using Visual Studio 2019

8.1 Right-click the project in Visual Studio, choose Publish, and import the publish profile downloaded in the previous step; then deploy.

8.2 If debugging is needed, we can use the Log stream monitoring feature of the Azure Function App.

Step 9: Test using a C# console application

9.1 Create a console application and replace the generated code with the following.

using System;
using System.IO;
using System.Net;

namespace PdfConversionConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            string filePathWord = @"C:\Temp\TestDocument.docx";
            string filePathOutWord = @"C:\Temp\TestDocument.pdf";

            string filePathExcel = @"C:\Temp\TestExcel.xlsx";
            string filePathOutExcel = @"C:\Temp\TestExcel.pdf";

            bool isSuccessWord = ConvertToPdf(filePathWord, filePathOutWord);
            bool isSuccessExcel = ConvertToPdf(filePathExcel, filePathOutExcel);
            Console.WriteLine($"Word: {isSuccessWord}, Excel: {isSuccessExcel}");
        }

        private static bool ConvertToPdf(string filePath, string filePathOut)
        {
            try
            {
                //string urlLocal = "http://localhost:7071/api/ConvertToPdf";
                string urlAzure = "https://graphpdfconverter.azurewebsites.net/api/ConvertToPdf";

                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(urlAzure);
                req.Method = "POST";

                // The content type tells the function which kind of document is being sent
                string fileExtension = Path.GetExtension(filePath).ToLowerInvariant();
                switch (fileExtension)
                {
                    case ".doc":
                        req.ContentType = "application/msword";
                        break;
                    case ".docx":
                        req.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
                        break;
                    case ".xls":
                        req.ContentType = "application/vnd.ms-excel";
                        break;
                    case ".xlsx":
                        req.ContentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
                        break;
                    default:
                        throw new Exception("Only Word & Excel documents are supported by the Converter");
                }

                // Copy the input file into the request body
                using (Stream fileStream = File.OpenRead(filePath))
                using (Stream requestStream = req.GetRequestStream())
                {
                    fileStream.CopyTo(requestStream);
                }

                // Send the request and copy the response stream into the output PDF file
                using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
                using (Stream responseStream = res.GetResponseStream())
                using (FileStream outStream = File.Create(filePathOut))
                {
                    responseStream.CopyTo(outStream);
                }

                return true;
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                return false;
            }
        }
    }
}

9.2 To test and debug locally, press F5 on the Function App project; Visual Studio will provide a POST URL which you can use in the console application to run and debug the code.

9.3 To run it from Azure, go to Azure Portal, then open your Azure Function App, on the left navigation blade click on Functions, click on the function name then Get Function Url. Use this URL in the console to convert the document to pdf.

It is important to mention that the Content-Type header defines the type of document to be converted; see the complete list of common MIME types.
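
Since the content type drives the conversion, it can be convenient to look it up from the file extension in one place. The sketch below is one way to do that; OfficeContentTypes is a hypothetical helper. The Word and Excel entries are the ones used in the console application above, while the PowerPoint entries are my addition, on the assumption that the Graph conversion endpoint accepts these formats as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original post.
public static class OfficeContentTypes
{
    // Extension lookups ignore case, so ".DOCX" and ".docx" behave the same.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            [".doc"] = "application/msword",
            [".docx"] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            [".xls"] = "application/vnd.ms-excel",
            [".xlsx"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            // Assumption: PowerPoint files can be converted the same way.
            [".ppt"] = "application/vnd.ms-powerpoint",
            [".pptx"] = "application/vnd.openxmlformats-officedocument.presentationml.presentation"
        };

    public static string FromPath(string filePath)
    {
        var extension = Path.GetExtension(filePath);
        if (!string.IsNullOrEmpty(extension) && Map.TryGetValue(extension, out var contentType))
        {
            return contentType;
        }

        throw new NotSupportedException($"No content type registered for '{extension}'.");
    }
}
```

A dictionary keeps the mapping in one place, so supporting another format means adding one entry instead of another switch case.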

Step 10: Test using Postman

10.1 In Postman, add the Azure Function App Url (see step 9.3).

10.2 In the Headers section, add a Content-Type header with the appropriate MIME type.

10.3 In the Body section, select binary, upload a file, then click the Send button.

10.4 On a successful request, we can save the converted PDF file.
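
The request that Postman and the console application send (a POST with the raw file as body and the MIME type as Content-Type) can also be sketched with HttpClient, the more current HTTP API in .NET. This is only a sketch: PdfConverterClient and its method names are hypothetical, and the function URL is whatever you obtained in Step 9.2 or 9.3.

```csharp
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical client, not part of the original post.
public static class PdfConverterClient
{
    private static readonly HttpClient Client = new HttpClient();

    // Builds the POST request: raw document bytes as body, MIME type as Content-Type.
    public static HttpRequestMessage BuildRequest(string functionUrl, Stream document, string contentType)
    {
        var content = new StreamContent(document);
        content.Headers.ContentType = new MediaTypeHeaderValue(contentType);
        return new HttpRequestMessage(HttpMethod.Post, functionUrl) { Content = content };
    }

    // Sends the document to the function and writes the returned PDF to outputPath.
    public static async Task ConvertAsync(string functionUrl, string inputPath, string contentType, string outputPath)
    {
        using (var input = File.OpenRead(inputPath))
        using (var request = BuildRequest(functionUrl, input, contentType))
        using (var response = await Client.SendAsync(request))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            using (var output = File.Create(outputPath))
            {
                await response.Content.CopyToAsync(output);
            }
        }
    }
}
```

A call could then look like await PdfConverterClient.ConvertAsync(url, @"C:\Temp\TestDocument.docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document", @"C:\Temp\TestDocument.pdf").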

Summary

As we can see, Microsoft Graph makes it easy to convert documents to PDF, with up to 1 million free calls. Combined with Azure Functions, it gives you the flexibility to offer this feature wherever and whenever your users need it.

Download the code from GitHub