How To Implement Batch Upload And Processing Of PDF Files With ASP.NET Core

Contents

1. Receive the Upload File
2. Extract the ZIP File
3. Read Each PDF File in ZIP
4. Read the PDF File Field’s Value
5. Save the Value into MSSQL Database

In this tutorial, we will walk through the steps to implement batch upload and processing of PDF files with ASP.NET Core. Clients upload a ZIP file containing multiple PDF files; the server extracts the ZIP file, reads the form field values of each PDF file, and finally saves those values into an MSSQL database.

1. Receive the Upload File

The first step is to create an API endpoint that allows clients to upload a ZIP file containing multiple PDF files. We can use the IFormFile interface provided by ASP.NET Core to handle file uploads. Here is an example code snippet:

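[HttpPost("upload")]
public async Task<IActionResult> Upload(IFormFile file)
{
    // validate file extension; the content type is client-supplied, so treat it as a hint only
    if (file is null || file.Length == 0 ||
        !string.Equals(Path.GetExtension(file.FileName), ".zip", StringComparison.OrdinalIgnoreCase))
    {
        return BadRequest("Please upload a non-empty .zip file.");
    }
    // save file to disk under a server-generated name, never the client-supplied one
    var zipFilePath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");
    await using (var stream = System.IO.File.Create(zipFilePath))
    {
        await file.CopyToAsync(stream);
    }

    return Ok();
}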
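
The file extension is supplied by the client, so it can be worth checking the content as well. As a minimal sketch, the helper below compares the first four bytes of a stream with the ZIP local file header signature (the bytes "PK" followed by 0x03 0x04). LooksLikeZip is a hypothetical name, not part of ASP.NET Core, and the sketch assumes a seekable stream such as a MemoryStream or FileStream.

```csharp
using System;
using System.IO;

// Hypothetical helper: returns true when the stream starts with the
// ZIP local file header signature 0x50 0x4B 0x03 0x04 ("PK\x03\x04").
static bool LooksLikeZip(Stream stream)
{
    // Assumption: we only handle readable, seekable streams.
    if (stream is null || !stream.CanRead || !stream.CanSeek)
    {
        return false;
    }

    long start = stream.Position;
    try
    {
        Span<byte> header = stackalloc byte[4];
        int read = 0;
        // Read may return fewer bytes than requested, so loop until 4 bytes or EOF.
        while (read < header.Length)
        {
            int n = stream.Read(header.Slice(read));
            if (n == 0)
            {
                break;
            }
            read += n;
        }

        return read == 4
            && header[0] == 0x50 && header[1] == 0x4B
            && header[2] == 0x03 && header[3] == 0x04;
    }
    finally
    {
        // Rewind so the caller can still copy the full upload afterwards.
        stream.Position = start;
    }
}
```

Note that an empty ZIP archive starts with a different signature (the end-of-central-directory record), so this check rejects it. For a batch upload that is expected to contain PDF files, that is probably acceptable.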

2. Extract the ZIP File

Once we have received the ZIP file, we need to extract it to a temporary directory. We can use the System.IO.Compression namespace to extract the ZIP file. Here is an example code snippet:

using System.IO.Compression;

...

var tempDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
ZipFile.ExtractToDirectory(zipFilePath, tempDir);
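
An uploaded archive may also contain files that are not PDFs, or PDFs nested in subfolders. As a hedged alternative to extracting everything, the sketch below opens the archive with ZipArchive and writes only the entries whose names end in .pdf. ExtractPdfEntries is a hypothetical helper name, and flattening the folder structure with an index prefix is an assumption made here to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: extracts only the *.pdf entries of a ZIP archive into
// targetDir and returns the paths of the files it wrote.
static List<string> ExtractPdfEntries(string zipFilePath, string targetDir)
{
    var extracted = new List<string>();
    Directory.CreateDirectory(targetDir);

    using ZipArchive archive = ZipFile.OpenRead(zipFilePath);
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Directory entries have an empty Name; skip them and any non-PDF file.
        string fileName = Path.GetFileName(entry.Name);
        if (fileName.Length == 0 ||
            !fileName.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
        {
            continue;
        }

        // Only the bare file name is used, so every file lands directly in
        // targetDir. The index prefix keeps equal names from different
        // folders of the archive from colliding.
        string destination = Path.Combine(targetDir, $"{extracted.Count}_{fileName}");
        entry.ExtractToFile(destination, overwrite: false);
        extracted.Add(destination);
    }

    return extracted;
}
```

Whichever approach is used, it is a good idea to delete the temporary directory once processing is done, for example with Directory.Delete(tempDir, recursive: true) in a finally block.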

3. Read Each PDF File in ZIP

Now that we have extracted the ZIP file, we can loop through each PDF file in the directory and read its field values. We can use the iTextSharp library to read the field values. Here is an example code snippet:

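using iTextSharp.text.pdf;

...

// GetFiles with this pattern only scans the top level of tempDir
foreach (var pdfFile in Directory.GetFiles(tempDir, "*.pdf"))
{
    var pdfReader = new PdfReader(pdfFile);
    try
    {
        // Fields is keyed by the name of each form (AcroForm) field
        var pdfFormFields = pdfReader.AcroFields.Fields;

        foreach (var pdfFormField in pdfFormFields)
        {
            var fieldValue = pdfReader.AcroFields.GetField(pdfFormField.Key);
            // save fieldValue to database
        }
    }
    finally
    {
        pdfReader.Close(); // release the reader's resources
    }
}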

4. Read the PDF File Field’s Value

As shown in the previous step, the iTextSharp library exposes a PDF file's form fields through the AcroFields class. Its Fields collection is keyed by field name, so we loop through it and pass each key to the GetField method, which returns that field's value as a string.

5. Save the Value into MSSQL Database

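Finally, we can save the field values into an MSSQL database. We can use the System.Data.SqlClient namespace to connect to the database and execute SQL commands. The loop below belongs inside the per-file loop from step 3, where pdfReader and pdfFormFields are in scope. Here is an example code snippet: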

using System;
using System.Data.SqlClient;

...

// Placeholder values; in a real application, load the connection string from configuration
var connectionString = "Data Source=MyServer;Initial Catalog=MyDatabase;User ID=MyUsername;Password=MyPassword;";
using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

foreach (var pdfFormField in pdfFormFields)
{
    var fieldValue = pdfReader.AcroFields.GetField(pdfFormField.Key);
    // parameterized query: values are never concatenated into the SQL text
    using var sqlCommand = new SqlCommand("INSERT INTO MyTable (FieldName, FieldValue) VALUES (@FieldName, @FieldValue)", connection);
    sqlCommand.Parameters.AddWithValue("@FieldName", pdfFormField.Key);
    // a null value would be sent as a missing parameter, so map it to DBNull
    sqlCommand.Parameters.AddWithValue("@FieldValue", (object)fieldValue ?? DBNull.Value);
    await sqlCommand.ExecuteNonQueryAsync();
}
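
In the loop above, every INSERT is committed on its own, so a failure halfway through a PDF leaves some of its fields in the table and others missing. One possible refinement, sketched below, is to wrap all inserts for one PDF in a single transaction. SaveFieldsAsync and AddParameter are hypothetical helper names. The sketch is written against the provider-neutral System.Data.Common base classes (which SqlConnection derives from), assumes the connection is already open, and assumes .NET Core 3.0 or later for the async transaction methods. MyTable and its columns are the placeholders used earlier in this tutorial.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Threading.Tasks;

// Hypothetical helper: writes all fields of one PDF inside a single transaction,
// so a failure part-way through leaves no partial rows behind.
static async Task SaveFieldsAsync(DbConnection connection, IReadOnlyDictionary<string, string> fields)
{
    if (connection is null) throw new ArgumentNullException(nameof(connection));
    if (fields is null) throw new ArgumentNullException(nameof(fields));

    // Assumption: the caller has already opened the connection.
    await using DbTransaction transaction = await connection.BeginTransactionAsync();
    try
    {
        foreach (var (name, value) in fields)
        {
            await using DbCommand command = connection.CreateCommand();
            command.Transaction = transaction;
            command.CommandText =
                "INSERT INTO MyTable (FieldName, FieldValue) VALUES (@FieldName, @FieldValue)";
            AddParameter(command, "@FieldName", name);
            AddParameter(command, "@FieldValue", (object)value ?? DBNull.Value);
            await command.ExecuteNonQueryAsync();
        }

        await transaction.CommitAsync();
    }
    catch
    {
        // Undo every insert made for this PDF, then let the caller see the error.
        await transaction.RollbackAsync();
        throw;
    }
}

// Hypothetical helper: adds a named parameter in a provider-neutral way.
static void AddParameter(DbCommand command, string name, object value)
{
    DbParameter parameter = command.CreateParameter();
    parameter.ParameterName = name;
    parameter.Value = value;
    command.Parameters.Add(parameter);
}
```

With the SqlConnection from the snippet above, the call for one PDF might look like await SaveFieldsAsync(connection, fieldValues), where fieldValues is a dictionary of field names and values collected in step 3.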

And that’s it! We have implemented a batch upload and processing of PDF files with ASP.NET Core.