Merge PDF files in SharePoint using an Azure Function

Need to merge PDF files stored in SharePoint? Look no further!
In this article, I will show you how to create an Azure Function that merges PDF files stored in SharePoint. The Function is a generic service that receives a list of file paths to merge, so you can trigger it from SPFx, Power Automate, Logic Apps, or anything else really. We are going to use the PDFsharp library, so our code will be super simple!

Update 01-02-2022

As I was getting some requests to publish the full solution, I asked my client and I am happy to announce that they have agreed to let me publish the source code!
Any feedback is welcome, and please let me know if you find issues.

Link to the full solution

The inconvenience of Power Automate

If you are familiar with Power Automate, you may already know that you can use third-party actions to merge PDF files. But they come with some significant disadvantages that may prevent you from using them:

  • License costs – third-party providers will typically charge you per user or for the number of executions
  • Data transfer – when using a remote service to merge your files, you are sending your data to that service. While this may not be a problem for files without commercial value, the same is not true for confidential information. The service provider may have strict security arrangements in place, but sometimes, the risk may be just too high.

Function advantages

Using an Azure Function ultimately means that you are in full control. Compared with the inconveniences of third-party solutions in Power Automate:

  • Cost – the merging process is super fast, which means that you can use a consumption plan for the function if you really want to. Yes, it’s almost FREE!
  • Data transfer – the information only flows between your Office 365 tenant and your Azure subscription. Everything can be done using memory streams, so there are no temporary files to delete at the end.

Merge PDF files

NuGet packages

  • PDFsharp
  • SharePointPnPCoreOnline

Code

The code to merge files is actually very simple. First, we create a class that represents a request to the function. In my case, I used an Azure storage queue as the entry point for my function, and the queue messages had to match the shape of this class.

    internal class QueueItem
    {
        public string SiteUrl { get; set; }
        public string FolderPath { get; set; }
        public string FileName { get; set; }
        public string[] FilesPathArray { get; set; }
    }
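
Because the function trusts whatever arrives on the queue, it can help to check a request before touching SharePoint. The following is only a minimal sketch: QueueItemValidator and its rules are hypothetical and not part of the original solution, and the QueueItem class is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Repeated from the article so this sketch is self-contained.
internal class QueueItem
{
    public string SiteUrl { get; set; }
    public string FolderPath { get; set; }
    public string FileName { get; set; }
    public string[] FilesPathArray { get; set; }
}

// Hypothetical helper (an assumption, not part of the original solution).
internal static class QueueItemValidator
{
    // Returns a list of problems; an empty list means the request looks usable.
    public static List<string> Validate(QueueItem item)
    {
        var errors = new List<string>();
        if (item == null)
        {
            errors.Add("Request is null.");
            return errors;
        }

        // IsWellFormedUriString returns false for null or malformed input.
        if (!Uri.IsWellFormedUriString(item.SiteUrl, UriKind.Absolute))
            errors.Add("SiteUrl must be an absolute URL.");

        if (string.IsNullOrWhiteSpace(item.FolderPath))
            errors.Add("FolderPath is required.");

        if (string.IsNullOrWhiteSpace(item.FileName) ||
            !item.FileName.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            errors.Add("FileName must end with .pdf.");

        if (item.FilesPathArray == null || item.FilesPathArray.Length == 0)
            errors.Add("FilesPathArray must contain at least one file.");

        return errors;
    }
}
```

If the returned list is not empty, the function could log the problems and skip the message instead of failing halfway through the merge.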

And I have created a method that does all the work:

    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs.Host;   // TraceWriter
    using Microsoft.SharePoint.Client;    // CSOM types and the PnP extension methods
    using PdfSharp.Pdf;
    using PdfSharp.Pdf.IO;

    // Returns Task rather than void so the caller can await it and see any exception
    internal static async Task MergePDFs(ClientContext ctx, QueueItem queueItem, TraceWriter log)
    {
        log.Info("Creating blank PDF file...");
        // instantiate the target document
        using (PdfDocument targetDoc = new PdfDocument())
        {
            // loop through all files in the array
            log.Info($"Parsing {queueItem.FilesPathArray.Length} PDF files");
            foreach (string filePath in queueItem.FilesPathArray)
            {
                log.Info($"Parsing PDF file: {filePath}");
                // get file from SharePoint
                Microsoft.SharePoint.Client.File file = ctx.Web.GetFileByUrl(filePath);
                ClientResult<Stream> fileStream = file.OpenBinaryStream();
                ctx.Load(file);
                await ctx.ExecuteQueryRetryAsync();

                // open the file in import mode and copy its pages
                using (PdfDocument pdfDoc = PdfReader.Open(fileStream.Value, PdfDocumentOpenMode.Import))
                {
                    for (int i = 0; i < pdfDoc.PageCount; i++)
                    {
                        targetDoc.AddPage(pdfDoc.Pages[i]);
                    }
                }
            }
            log.Info("PDF files parsed successfully");

            // create result file in memory
            using (Stream newFileStream = new MemoryStream())
            {
                targetDoc.Save(newFileStream);
                // rewind in case Save left the position at the end of the stream
                newFileStream.Position = 0;

                // upload to SharePoint
                var destinationFolder = ctx.Web.GetFolderByServerRelativeUrl(queueItem.FolderPath);
                ctx.Load(destinationFolder);
                await ctx.ExecuteQueryRetryAsync();

                destinationFolder.UploadFile(queueItem.FileName, newFileStream, true);
                await ctx.ExecuteQueryRetryAsync();
                log.Info($"Final PDF file added to SharePoint: {queueItem.FolderPath}/{queueItem.FileName}");
            }
        }
    }

Now, in your main Function file, simply:
– deserialize the queue message,
– instantiate the SharePoint context (the PnP Core package makes authentication really simple),
– call the MergePDFs function.

    QueueItem queueItem = JsonConvert.DeserializeObject<QueueItem>(myQueueItem);

    using (ClientContext ctx = new AuthenticationManager().GetAppOnlyAuthenticatedContext(siteUrl, authId, authSecret))
    {
        // await so the context is not disposed before the merge has finished
        // (the enclosing Run method must be declared async)
        await MergePDFs(ctx, queueItem, log);
    }

It’s this simple! To use the service, just send it a message that matches the following format:

{
  "SiteUrl": "https://contoso.sharepoint.com/sites/testsite",
  "FolderPath": "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test",
  "FileName": "MergeResult.pdf",
  "FilesPathArray": [
    "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file1.pdf",
    "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file2.pdf"
  ]
}
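
One thing to watch when you put the message on the queue yourself: depending on the version of the Storage queue extension, the queue trigger may expect the message body to be Base64-encoded, and a plain JSON string can then fail with a "not a valid Base-64 string" error like the one reported in the replies below. I am assuming that is the cause; check how your own trigger is configured. A minimal sketch of the encoding step, using a hypothetical helper that is not part of the original solution:

```csharp
using System;
using System.Text;

// Hypothetical helper (an assumption, not part of the original solution).
public static class QueueMessageEncoding
{
    // Encodes the JSON request as Base64 (UTF-8) before it is placed on the queue.
    public static string ToBase64(string json) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(json));

    // Reverses ToBase64; handy for inspecting messages while debugging.
    public static string FromBase64(string base64) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(base64));
}
```

Pass the JSON above through ToBase64 and use the result as the queue message body. Some tools and SDK settings can do this encoding for you, in which case the helper is not needed.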

14 Replies to “Merge PDF files in SharePoint using an Azure Function”

  1. Do you have a downloadable sample project for this? I am trying to create an Azure Function to test this and the new AuthenticationManager().GetAppOnlyAuthenticatedContext(siteUrl, authId, authSecret) keeps failing for me.

  2. Hi, this solution is very nice, but we cannot solve this error: 2020-11-19T15:13:59.432 [Error] Executed ‘Function1’ (Failed, Id=c7df2c89-44a2-43f0-a677-0f989f57ab20, Duration=766ms)Could not load type ‘System.Web.Configuration.WebConfigurationManager’ from assembly ‘System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’. Any help? Thanks.

  3. Hi, I deployed everything without any issues.
    But when I try to test the function using the sample you provided,
    I get the following error message:

    [Error] Executed ‘MergePDF’ (Failed, Id=XXXXXXXXXXXXXXXXX, Duration=3ms)The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.

    Sample used for testing.

    {
    "SiteUrl": "https://contoso.sharepoint.com/sites/contoso",
    "FolderPath": "https://contoso.sharepoint.com/sites/contoso/Shared%20Documents/Test",
    "FileName": "result.pdf",
    "FilesPathArray": [
    "https://contoso.sharepoint.com/sites/contoso/Shared%20Documents/Test/file1.pdf",
    "https://contoso.sharepoint.com/sites/contoso/Shared%20Documents/Test/file2.pdf"
    ]
    }

    1. Were you able to resolve this? All I can think is that it could be related to the details used for authentication

    1. Hi, sorry for the late reply. These were running on consumption plans, what issues did you experience?

  4. I see listed under Function Advantages that you listed the code as nearly free.

    I assume that’s because of the subscriptions through Microsoft to have the infrastructure to setup this Azure Function?

    Like, we don’t need to pay anything additional to run this Solution?

    1. Hi Tim, sorry for the late reply. Your assumption was correct: I was referring to the very small cost of running Azure Functions, as this even works on a consumption plan.
