Merge PDF files in SharePoint using an Azure Function

In this article, I will show you how to create an Azure Function that merges PDF files stored in SharePoint. The Function is a generic service that receives a list of file paths to merge, so you can trigger it from SPFx, Power Automate, Logic Apps, or anything else that can send it a request. We are going to use the PDFsharp library, so our code will be super simple!

The inconvenience of Power Automate

If you are familiar with Power Automate, you may already know that you can use third-party actions to merge PDF files. However, they come with some significant drawbacks that may prevent you from using them:

  • License costs – third-party providers will typically charge you per user or for the number of executions
  • Data transfer – when using a remote service to merge your files, you are sending your data to that service. While this may not be a problem for files without commercial value, the same is not true for confidential information. The service provider may have strict security arrangements in place, but sometimes, the risk may be just too high.

Function advantages

Using an Azure Function ultimately means that you are in full control. Compared with the drawbacks of third-party actions in Power Automate:

  • Cost – the merging process is super fast, which means that you can use a consumption plan for the function. Yes, it’s almost FREE!
  • Data transfer – the data only flows between your Office 365 tenant and your Azure subscription. Everything can be done with memory streams, so there are no temporary files to delete at the end.

Merge PDF files

NuGet packages

  • PDFsharp
  • SharePointPnPCoreOnline

Code

The code to merge files is actually very simple. First, we create a class that represents a request to the function. In my case, I used an Azure storage queue as the entry point for my function, so the queue messages have to match this class.

    internal class QueueItem
    {
        public string SiteUrl { get; set; }
        public string FolderPath { get; set; }
        public string FileName { get; set; }
        public string[] FilesPathArray { get; set; }
    }
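
Since anything can drop a message in the queue, it may be worth rejecting incomplete requests before touching SharePoint. Here is a minimal sketch of that idea. The QueueItemParser class and its error messages are my own illustration (they are not part of the original function), and it assumes the Newtonsoft.Json package is available:

```csharp
using System;
using Newtonsoft.Json;

internal class QueueItem
{
    public string SiteUrl { get; set; }
    public string FolderPath { get; set; }
    public string FileName { get; set; }
    public string[] FilesPathArray { get; set; }
}

// Hypothetical helper: turns a raw queue message into a QueueItem
// and fails early when a required property is missing.
internal static class QueueItemParser
{
    public static QueueItem Parse(string message)
    {
        if (string.IsNullOrWhiteSpace(message))
            throw new ArgumentException("Queue message is empty.", nameof(message));

        QueueItem item = JsonConvert.DeserializeObject<QueueItem>(message);
        if (item == null)
            throw new ArgumentException("Queue message is not a valid request.", nameof(message));

        if (string.IsNullOrWhiteSpace(item.SiteUrl))
            throw new ArgumentException("SiteUrl is required.", nameof(message));
        if (string.IsNullOrWhiteSpace(item.FolderPath))
            throw new ArgumentException("FolderPath is required.", nameof(message));
        if (string.IsNullOrWhiteSpace(item.FileName))
            throw new ArgumentException("FileName is required.", nameof(message));
        if (item.FilesPathArray == null || item.FilesPathArray.Length == 0)
            throw new ArgumentException("FilesPathArray must contain at least one file.", nameof(message));

        return item;
    }
}
```

With a queue trigger, an exception like this makes the message fail and, after the configured number of retries, land in the poison queue, so bad requests do not silently disappear.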

And I have created a method that does all the work:

    // namespaces used by this method:
    // System.IO, System.Threading.Tasks, Microsoft.Azure.WebJobs.Host,
    // Microsoft.SharePoint.Client, PdfSharp.Pdf, PdfSharp.Pdf.IO
    internal static async Task MergePDFs(ClientContext ctx, QueueItem queueItem, TraceWriter log)
    {
        log.Info("Creating blank PDF file...");
        // the target document collects the pages of every source file
        using (PdfDocument targetDoc = new PdfDocument())
        {
            Microsoft.SharePoint.Client.File file = null;
            ClientResult<Stream> fileStream = null;
            // parse all files in array
            log.Info($"Parsing {queueItem.FilesPathArray.Length} PDF files");
            foreach (string filePath in queueItem.FilesPathArray)
            {
                log.Info($"Parsing PDF file: {filePath}");
                // get file from SharePoint
                file = ctx.Web.GetFileByUrl(filePath);
                fileStream = file.OpenBinaryStream();
                ctx.Load(file);
                await ctx.ExecuteQueryRetryAsync();

                // open file in import mode and copy its pages to the target
                using (PdfDocument pdfDoc = PdfReader.Open(fileStream.Value, PdfDocumentOpenMode.Import))
                {
                    for (int i = 0; i < pdfDoc.PageCount; i++)
                    {
                        targetDoc.AddPage(pdfDoc.Pages[i]);
                    }
                }
            }
            log.Info("PDF files parsed successfully");

            // create result file in memory
            using (Stream newFileStream = new MemoryStream())
            {
                // keep the stream open after saving and rewind it before the upload
                targetDoc.Save(newFileStream, false);
                newFileStream.Position = 0;

                // upload to SharePoint
                var destinationFolder = ctx.Web.GetFolderByServerRelativeUrl(queueItem.FolderPath);
                ctx.Load(destinationFolder);
                await ctx.ExecuteQueryRetryAsync();

                destinationFolder.UploadFile(queueItem.FileName, newFileStream, true);
                await ctx.ExecuteQueryRetryAsync();
                log.Info($"Final PDF file added to SharePoint: {queueItem.FolderPath}/{queueItem.FileName}");
            }
        }
    }

Note that the method returns a Task instead of void. With async void, the caller cannot wait for the merge to finish, and the SharePoint context could be disposed while requests are still running.

Now, in your main Function file, simply:
– deserialize the queue message,
– instantiate the SharePoint context (the PnP Core package makes authentication really simple),
– call the MergePDFs function and await it (so the Function's Run method has to be async as well).

QueueItem queueItem = JsonConvert.DeserializeObject<QueueItem>(myQueueItem);

using (ClientContext ctx = new AuthenticationManager().GetAppOnlyAuthenticatedContext(siteUrl, authId, authSecret))
{
    // await the merge so the context is not disposed before the upload completes
    await MergePDFs(ctx, queueItem, log);
}
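
If you want to try the merging part without a SharePoint site, the PDFsharp calls can be isolated in a helper that only works with streams. The sketch below is one way to do it, not the code running in my function: the PdfMergeHelper class and its Merge method are names I made up for illustration, and it assumes every input stream is seekable and contains a valid PDF.

```csharp
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper: merges PDF streams entirely in memory.
internal static class PdfMergeHelper
{
    public static MemoryStream Merge(IEnumerable<Stream> sources)
    {
        var output = new MemoryStream();
        using (var target = new PdfDocument())
        {
            foreach (Stream source in sources)
            {
                // Import mode is required to copy pages into another document
                using (PdfDocument input = PdfReader.Open(source, PdfDocumentOpenMode.Import))
                {
                    for (int i = 0; i < input.PageCount; i++)
                    {
                        target.AddPage(input.Pages[i]);
                    }
                }
            }

            // second argument: do not close the stream after saving
            target.Save(output, false);
        }

        // rewind so the caller can read or upload the result from the start
        output.Position = 0;
        return output;
    }
}
```

The returned stream can be passed straight to an upload call, which keeps the "no temporary files" promise. Keep in mind that PDFsharp refuses to save a document without pages, so at least one source needs to contain a page.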

It’s that simple! Now, to use the service, just send it a message that matches the following format:

{
  "SiteUrl": "https://contoso.sharepoint.com/sites/testsite",
  "FolderPath": "/sites/testsite/Shared Documents/Test",
  "FileName": "MergeResult.pdf",
  "FilesPathArray": [
    "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file1.pdf",
    "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file2.pdf"
  ]
}
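
If the caller is a .NET application, queuing a request could look like the sketch below. It is only an illustration under a few assumptions: it uses the Azure.Storage.Queues package, the MergeRequest and MergeRequestSender classes are hypothetical names that mirror the QueueItem properties, and the queue name has to match the one configured in your Function's trigger. The message is Base64-encoded because queue-triggered functions expect that by default in many host versions, so check what your own setup expects.

```csharp
using System.Threading.Tasks;
using Azure.Storage.Queues;
using Newtonsoft.Json;

// Hypothetical client-side copy of the request format
public class MergeRequest
{
    public string SiteUrl { get; set; }
    public string FolderPath { get; set; }
    public string FileName { get; set; }
    public string[] FilesPathArray { get; set; }
}

public static class MergeRequestSender
{
    // Serializes the request to the JSON format the function expects
    public static string BuildMessage(MergeRequest request)
    {
        return JsonConvert.SerializeObject(request);
    }

    // Adds the request to the storage queue that triggers the function
    public static async Task SendAsync(string connectionString, string queueName, MergeRequest request)
    {
        var options = new QueueClientOptions
        {
            // assumption: the function host decodes Base64 messages
            MessageEncoding = QueueMessageEncoding.Base64
        };
        var queue = new QueueClient(connectionString, queueName, options);

        await queue.CreateIfNotExistsAsync();
        await queue.SendMessageAsync(BuildMessage(request));
    }
}
```

From Power Automate or Logic Apps, the equivalent is the "Put a message on a queue" action of the Azure Queues connector with the same JSON as the message body.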
