M365 Dev Blog

Merge PDF files in SharePoint using an Azure Function

Need to merge PDF files stored in SharePoint? Look no further!
In this article, I will show you how to create an Azure Function to merge PDF files stored in SharePoint. The Function will be a generic service, which receives a list of file paths to merge. This means that you can trigger a request from SPFx, Power Automate, Logic Apps, or anything else really. We are going to use the PDFsharp library, so our code will be super simple!

Update 01-02-2022

As I was getting some requests to publish the full solution, I asked my client and I am happy to announce that they have agreed to let me publish the source code!
Any feedback is welcome, and please let me know if you find issues.

Link to the full solution

The inconvenience of Power Automate

If you are familiar with Power Automate, you may already know that you can use third-party actions to merge PDF files. But they may impose some significant disadvantages that can prevent you from using them:

Function advantages

Using an Azure Function ultimately means that you are in full control. When compared with the inconveniences of third-party solutions in Power Automate:

Merge PDF files

NuGet packages

Code

The code to merge files is actually very simple. First, we create a class that represents a request to the function. In my case, I used an Azure Storage queue as the entry point for my function, so every message on the queue has to match the shape of this class.

    internal class QueueItem
    {
        public string SiteUrl { get; set; }
        public string FolderPath { get; set; }
        public string FileName { get; set; }
        public string[] FilesPathArray { get; set; }
    }
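
Because the function trusts whatever arrives on the queue, it can be worth checking a message before any SharePoint call is made. The following is only a minimal sketch of that idea: `QueueItemValidator` is a hypothetical helper that is not part of the original solution, and the rules it applies (all fields required, every path ending in `.pdf`) are my assumptions about what a usable request looks like.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the QueueItem class above, repeated so the sketch compiles on its own.
public class QueueItem
{
    public string SiteUrl { get; set; }
    public string FolderPath { get; set; }
    public string FileName { get; set; }
    public string[] FilesPathArray { get; set; }
}

// Hypothetical helper (an assumption, not part of the published solution).
public static class QueueItemValidator
{
    // Returns a list of problems; an empty list means the message looks usable.
    public static List<string> Validate(QueueItem item)
    {
        var errors = new List<string>();
        if (item == null)
        {
            errors.Add("Message could not be deserialized.");
            return errors;
        }

        if (string.IsNullOrWhiteSpace(item.SiteUrl)) errors.Add("SiteUrl is required.");
        if (string.IsNullOrWhiteSpace(item.FolderPath)) errors.Add("FolderPath is required.");
        if (string.IsNullOrWhiteSpace(item.FileName)) errors.Add("FileName is required.");

        if (item.FilesPathArray == null || item.FilesPathArray.Length == 0)
        {
            errors.Add("FilesPathArray must contain at least one file.");
        }
        else
        {
            // Assumed rule: every entry must point to a .pdf file.
            foreach (string path in item.FilesPathArray)
            {
                if (string.IsNullOrWhiteSpace(path) ||
                    !path.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
                {
                    errors.Add($"Not a PDF path: '{path}'");
                }
            }
        }
        return errors;
    }
}
```

In the function, you could call `QueueItemValidator.Validate(queueItem)` right after deserializing, then log the errors and return early if the list is not empty, instead of letting the merge fail halfway through.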

And I have created a method that does all the work:

    // Namespaces used: System.IO, System.Threading.Tasks, Microsoft.SharePoint.Client,
    // Microsoft.Azure.WebJobs.Host (TraceWriter), PdfSharp.Pdf, PdfSharp.Pdf.IO
    // Returns Task rather than void so that the caller can await the merge and observe exceptions.
    internal static async Task MergePDFs(ClientContext ctx, QueueItem queueItem, TraceWriter log)
    {
        log.Info("Creating blank PDF file...");
        // instantiate the target document
        using (PdfDocument targetDoc = new PdfDocument())
        {
            // loop over all files in the array
            log.Info($"Parsing {queueItem.FilesPathArray.Length} PDF files");
            foreach (string filePath in queueItem.FilesPathArray)
            {
                log.Info($"Parsing PDF file: {filePath}");
                // get file from SharePoint
                Microsoft.SharePoint.Client.File file = ctx.Web.GetFileByUrl(filePath);
                ClientResult<Stream> fileStream = file.OpenBinaryStream();
                ctx.Load(file);
                await ctx.ExecuteQueryRetryAsync();

                // open the file in import mode and copy its pages into the target document
                using (PdfDocument pdfDoc = PdfReader.Open(fileStream.Value, PdfDocumentOpenMode.Import))
                {
                    for (int i = 0; i < pdfDoc.PageCount; i++)
                    {
                        targetDoc.AddPage(pdfDoc.Pages[i]);
                    }
                }
            }
            log.Info("PDF files parsed successfully");

            // create result file
            using (Stream newFileStream = new MemoryStream())
            {
                targetDoc.Save(newFileStream);

                // upload to SharePoint
                var destinationFolder = ctx.Web.GetFolderByServerRelativeUrl(queueItem.FolderPath);
                ctx.Load(destinationFolder);
                await ctx.ExecuteQueryRetryAsync();

                destinationFolder.UploadFile(queueItem.FileName, newFileStream, true);
                await ctx.ExecuteQueryRetryAsync();
                log.Info($"Final PDF file added to SharePoint: {queueItem.FolderPath}/{queueItem.FileName}");
            }
        }
    }

Now, in your main Function file, simply:
– deserialize the queue message,
– instantiate the SharePoint context (the PnP Core package makes authentication really simple),
– await the MergePDFs method, so that the context is not disposed before the merge has finished.

    // inside an async Function method (one that returns Task)
    QueueItem queueItem = JsonConvert.DeserializeObject<QueueItem>(myQueueItem);

    using (ClientContext ctx = new AuthenticationManager().GetAppOnlyAuthenticatedContext(siteUrl, authId, authSecret))
    {
        await MergePDFs(ctx, queueItem, log);
    }

It’s this simple! Now, to use the service, just send it a message that matches the following format:

{
  "SiteUrl": "https://contoso.sharepoint.com/sites/testsite",
  "FolderPath": "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test",
  "FileName": "MergeResult.pdf",
  "FilesPathArray": [
    "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file1.pdf",
    "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file2.pdf"
  ]
}
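
To make the message format concrete, here is a minimal sketch of how a .NET sender could build that payload. Several things here are assumptions on my side: `MergeRequestBuilder` is a hypothetical helper name, and it uses System.Text.Json rather than the Json.NET used in the function. The Base64 step is included because the Azure Functions queue trigger typically expects Base64-encoded message text; some queue client libraries already encode for you, so check yours before applying it.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper (an assumption, not part of the published solution).
public static class MergeRequestBuilder
{
    // Serializes a merge request using the property names of the QueueItem class.
    public static string ToJson(string siteUrl, string folderPath, string fileName, string[] files)
    {
        var payload = new
        {
            SiteUrl = siteUrl,
            FolderPath = folderPath,
            FileName = fileName,
            FilesPathArray = files
        };
        // By default System.Text.Json keeps property names exactly as declared.
        return JsonSerializer.Serialize(payload);
    }

    // Encodes the JSON as Base64 (UTF-8), for senders whose queue client does not do it.
    public static string ToBase64(string json)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(json));
    }
}
```

The string returned by `ToJson` (or `ToBase64`, if needed) is what you would put on the queue as the message text. From Power Automate or Logic Apps, the equivalent is simply composing the same JSON object in the action that adds the message to the queue.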
