Generating PDFs in JavaScript for fun and profit!

  • 2019-04-05 01:46 AM

Until recently, creating complex or elegant PDFs in JavaScript has been challenging. Here I’m going to show you, step by step, the path of least resistance to beautiful PDFs. Spoiler: it was recently made possible by docx to PDF conversion in JavaScript :-)

What follows is some of what I will cover in my upcoming talk at the PDF Association conference in Seattle in June.

From 1000 feet, here are your three main alternatives:

  • The first is to create the PDF directly, using pdfkit, jsPDF, or the higher level pdfmake. Pdfkit is like iText in the Java world. Pdfmake, based on pdfkit, has its own format for representing rich text; it converts this to PDF.
  • The second is to create HTML, then convert that to PDF. These days probably using puppeteer.
  • The third is to create a docx, then convert that to PDF.

Put another way, you can either create the PDF directly, or use HTML or docx as an intermediate format.

Since it’s now easy to convert docx to PDF in JavaScript, the docx approach is the path of least resistance, particularly for business documents (proposals, invoices, contracts etc).

For one thing, often the content will already be in Word document format, making your job easy.

More importantly, it’s worth thinking up front about ongoing maintenance (changes to content and formatting). Is that something that you as a developer want to be doing, or is it better to enable the business to do this themselves? If it’s a Word document, then business users can update the document without troubling you.

Creating a docx in JavaScript has been easy for some years, but until recently, converting it to PDF from JavaScript has been the sticking point. Happily, this is now doable, without invoking some SaaS API, using LibreOffice, or anything like that.

With docx.js you can programmatically build up your Word document (much like pdfkit and jsPDF allow you to build up a PDF). But this probably isn’t a great idea, because for the final PDF to come out looking right, any feature you care to use has to be supported in both the create-docx and docx-to-pdf steps. For example, merged cells in a table, or adding a watermark.

What we want is an easy way to create a docx, and then the confidence that our docx will be converted cleanly to PDF.

For this, a “templating” approach is the answer: basically, you create a docx template with your wanted layout - in Microsoft Word, LibreOffice, Google Docs, Native Documents or whatever - then use the template engine to replace “variables”.
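To make the idea concrete, here is a toy sketch of what a template engine does, working on a plain string rather than a real docx (which is zipped XML). The function name renderTemplate is made up for illustration and this is not how docxtemplater is implemented; the only thing borrowed from docxtemplater is its tag syntax, {Variable} for a value and {#Items} ... {/Items} for a repeated section.

```javascript
// Toy illustration only: real docx templates are zipped XML, not plain strings.
function renderTemplate(template, data) {
  // Expand loop sections first: {#Items} ... {/Items} repeats once per array element.
  const withLoops = template.replace(
    /\{#(\w+)\}([\s\S]*?)\{\/\1\}/g,
    (match, name, body) => {
      const rows = Array.isArray(data[name]) ? data[name] : [];
      // Each row sees its own fields plus the outer data.
      return rows.map((row) => renderTemplate(body, { ...data, ...row })).join("");
    }
  );
  // Then replace simple {Variable} tags; unknown tags are left untouched.
  return withLoops.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(data, name) ? String(data[name]) : match
  );
}

const template =
  "Invoice {InvoiceNumber} for {CustomerName}\n" +
  "{#Items}- {Item_Description}: {Item_Qty} x {Item_Price}\n{/Items}" +
  "Total: {TotalPrice}";

const rendered = renderTemplate(template, {
  InvoiceNumber: "INV123",
  CustomerName: "Microsoft Corp",
  Items: [
    { Item_Description: "Bananas", Item_Price: "5", Item_Qty: "10" },
    { Item_Description: "Mangoes", Item_Price: "10", Item_Qty: "4" },
  ],
  TotalPrice: "$109",
});

console.log(rendered);
```

The real engine does the same kind of substitution inside the Word XML, which is why the layout you designed in Word survives untouched.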

Step 1: populate docx template

Here we’ll use docxtemplater, in node.js.

Say you want a PDF invoice. Since part of the point of using a Word template is that it is easy for business users to make it pretty, let’s start with one of the invoice templates designed by Microsoft and included in Word.

[Image: the invoice template open in Word, with docxtemplater variables in curly braces]

You can see I’ve added some variables (represented with curly braces, as required by docxtemplater).

You can click the image to see the docx in our Word File Editor. Click invoice-template.docx to download/use it with the code which follows.

Being a JavaScript library, docxtemplater ingests data in JSON format:

{
  CustomerName: "Microsoft Corp",
  AddressLine1: "One Microsoft Way",
  City: "Redmond", State: "WA", Zip: "98052",
  Country: "USA",
  InvoiceDate: "14 March 2019",
  InvoiceNumber: "INV123",
  Items: [
    {
      Item_Description: "Bananas",
      Item_Price: "5",
      Item_Qty: "10",
      Item_SubTotal: "50"
    },
    {
      Item_Description: "Mangoes",
      Item_Price: "10",
      Item_Qty: "4",
      Item_SubTotal: "40"
    }
  ],
  TotalEx: "90",
  SalesTax: "9",
  Shipping: "10",
  TotalPrice: "$109",
  DueDate: "28 March 2019"
}

Notice the Items array. The table row repeats for each of the Items. You can see docxtemplater’s markup for a repeat/loop at the start and end of that table row.
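Every value in that data is a string, so any arithmetic (row subtotals, tax, the grand total) has to happen before the data reaches the template. Here is a minimal sketch of one way to do that. The helper name buildInvoiceData and its parameters are hypothetical, and the 10% tax rate is only inferred from the sample numbers (9 on 90); neither comes from docxtemplater.

```javascript
// Hypothetical helper (not part of docxtemplater): turns numeric line items
// into the all-strings data object that the invoice template expects.
function buildInvoiceData(header, lineItems, taxPercent, shipping) {
  const Items = lineItems.map((item) => ({
    Item_Description: item.description,
    Item_Price: String(item.price),
    Item_Qty: String(item.qty),
    Item_SubTotal: String(item.price * item.qty),
  }));

  const totalEx = lineItems.reduce((sum, item) => sum + item.price * item.qty, 0);
  // Tax rounded to two decimal places; taxPercent is a percentage, e.g. 10 for 10%.
  const salesTax = Math.round(totalEx * taxPercent) / 100;
  const total = totalEx + salesTax + shipping;

  return {
    ...header,
    Items,
    TotalEx: String(totalEx),
    SalesTax: String(salesTax),
    Shipping: String(shipping),
    TotalPrice: "$" + String(total),
  };
}

const invoiceData = buildInvoiceData(
  { CustomerName: "Microsoft Corp", InvoiceNumber: "INV123" },
  [
    { description: "Bananas", price: 5, qty: 10 },
    { description: "Mangoes", price: 10, qty: 4 },
  ],
  10, // assumed tax rate in percent
  10  // shipping
);

console.log(invoiceData.TotalPrice);
```

The resulting object can be handed to docxtemplater in place of a hand-written literal.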

For demo purposes, we’ll provide that data inline in our JavaScript:

var JSZip = require('jszip');
var Docxtemplater = require('docxtemplater');

var fs = require('fs');
var path = require('path');

//Load the docx file as a binary
var content = fs
    .readFileSync(path.resolve(__dirname, 'invoice-template.docx'), 'binary');

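// Passing data to the constructor is the JSZip 2.x API; JSZip 3 loads data via JSZip.loadAsync instead
var zip = new JSZip(content);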

var doc = new Docxtemplater();
doc.loadZip(zip);

//set the templateVariables
doc.setData({CustomerName : "Microsoft Corp",
AddressLine1: "One Microsoft Way",
City: "Redmond",  State: "WA",  Zip: "98052",
Country: "USA",

InvoiceDate : "14 March 2019",
InvoiceNumber: "INV123",
Items: [
{
	Item_Description: "Bananas",
	Item_Price: "5",
	Item_Qty: "10",
	Item_SubTotal: "50"
	},
{
	Item_Description: "Mangoes",
	Item_Price: "10",
	Item_Qty: "4",
	Item_SubTotal: "40"
	}
],
TotalEx:"90", 
SalesTax:"9", 
Shipping:"10",     
TotalPrice:"$109", 
DueDate : "28 March 2019"
});

try {
    // render the document ie replace the variables
    doc.render()
}
catch (error) {
    var e = {
        message: error.message,
        name: error.name,
        stack: error.stack,
        properties: error.properties,
    }
    console.log(JSON.stringify({error: e}));
    throw error;
}

var buf = doc.getZip()
             .generate({type: 'nodebuffer'});

// buf is a nodejs buffer, you can either write it to a file or do anything else with it.
fs.writeFileSync(path.resolve(__dirname, 'invoice-instance.docx'), buf);
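The catch block above dumps the whole error as JSON, which is hard to read when a template has a typo in a tag. Below is a small sketch of a helper that pulls out just the human-readable parts. The name describeTemplateError is made up, and the shape it relies on (a list of nested errors under error.properties.errors, each with a properties.explanation string) is my recollection of docxtemplater’s error objects, so treat it as an assumption and check it against the version you install.

```javascript
// Hypothetical helper: summarises an error thrown by doc.render().
// Assumes nested errors live in error.properties.errors and carry
// properties.explanation; falls back to error.message otherwise.
function describeTemplateError(error) {
  const props = error.properties || {};
  if (Array.isArray(props.errors) && props.errors.length > 0) {
    return props.errors
      .map((e) => (e.properties && e.properties.explanation) || e.message)
      .join("\n");
  }
  return props.explanation || error.message;
}

// Stand-in for a real docxtemplater error; the messages are invented.
const sampleError = new Error("Multi error");
sampleError.properties = {
  errors: [
    { message: "first problem", properties: { explanation: "explanation of the first problem" } },
    { message: "second problem" },
  ],
};

console.log(describeTemplateError(sampleError));
```

In the script above you would call it inside the catch block, for example console.error(describeTemplateError(error)), before rethrowing.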

To try it, install docxtemplater as per its instructions:

npm install docxtemplater
npm install jszip@2

Then it’s just:

node invoice-template-docx.js

And you get a populated invoice instance:

[Image: the populated invoice instance]

Notice the table row has been repeated, and all variables replaced.

If you run the code yourself, you can verify the results by opening invoice-instance.docx in your favourite docx editor, or by dragging and dropping it into our Word File Editor.

Step 2: convert the docx to PDF

So far so good. Now we just need to convert the populated invoice instance to PDF.

For that, we’ll use docx-wasm, a node module we at Native Documents released earlier this year. Our bread and butter at Native Documents is the web-based document editing/viewing component we used above to display invoice-template.docx, and this node module generates PDF output using the same Word-compatible page layout code. Put another way, the page layout reproduces what Word does closely enough that it can also be used for high quality PDF output.

First, install it:

npm install @nativedocuments/docx-wasm

Converting the docx in the node.js buffer object to PDF is then just:

const docx = require("@nativedocuments/docx-wasm");

// init docx engine
docx.init({
    // ND_DEV_ID: "XXXXXXXXXXXXXXXXXXXXXXXXXX",    // go to https://developers.nativedocuments.com/ to get a dev-id/dev-secret
    // ND_DEV_SECRET: "YYYYYYYYYYYYYYYYYYYYYYYYYY", // you can also set the credentials in environment variables
    ENVIRONMENT: "NODE", // required
    LAZY_INIT: true      // if set to false, the WASM engine is initialized right away, which is useful for pre-caching (e.g. on AWS Lambda)
}).catch( function(e) {
    console.error(e);
});
 
async function convertHelper(document, exportFct) {
    const api = await docx.engine();
    await api.load(document);
    const arrayBuffer = await api[exportFct]();
    await api.close();
    return arrayBuffer;
}
 
convertHelper(buf, "exportPDF").then((arrayBuffer) => {
    fs.writeFileSync("sample.pdf", new Uint8Array(arrayBuffer));
}).catch((e) => {
    console.error(e);
});

You’ll need an ND_DEV_ID and ND_DEV_SECRET pair to use this module. You can get free-tier keys at https://developers.nativedocuments.com/

Copy these into the docx.init call (or alternatively, you can set these as environment vars).
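If you go the environment variable route, it can be handy to fail fast with a clear message when the keys are missing, rather than finding out at conversion time. A minimal sketch follows; the helper name initOptionsFromEnv is hypothetical. The option names (ND_DEV_ID, ND_DEV_SECRET, ENVIRONMENT, LAZY_INIT) are the ones used in the docx.init call above.

```javascript
// Hypothetical helper: builds the options object for docx.init from an
// environment map, throwing early if either credential is missing.
function initOptionsFromEnv(env) {
  const id = env.ND_DEV_ID;
  const secret = env.ND_DEV_SECRET;
  if (!id || !secret) {
    throw new Error("Set ND_DEV_ID and ND_DEV_SECRET before starting the converter");
  }
  return {
    ND_DEV_ID: id,
    ND_DEV_SECRET: secret,
    ENVIRONMENT: "NODE",
    LAZY_INIT: true,
  };
}

// Intended usage (not executed here, since it needs the module and real keys):
//   docx.init(initOptionsFromEnv(process.env)).catch(console.error);
```

Keeping the keys out of the source file also means they don’t end up in version control or in a blog post.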

I haven’t posted the PDF here, since it just looks the same as the invoice-instance docx.
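One thing to be aware of in convertHelper: if load or the export call throws, api.close() is never reached. Here is a sketch of a variant that always closes the engine, using try/finally. The name convertWithCleanup is made up, and the engine is passed in as a function so the sketch can run against a stub without the real module or credentials; with the real module you would pass () => docx.engine().

```javascript
// Variant of convertHelper that closes the engine even when conversion fails.
// getEngine is any function returning (a promise of) an object with
// load(), close() and the named export method, as docx.engine() does above.
async function convertWithCleanup(getEngine, document, exportFct) {
  const api = await getEngine();
  try {
    await api.load(document);
    return await api[exportFct]();
  } finally {
    // Runs whether or not load/export threw.
    await api.close();
  }
}

// Stub engine so the sketch runs on its own; it fails on load on purpose.
const closed = [];
const stubEngine = {
  load: async () => { throw new Error("not a docx"); },
  exportPDF: async () => new ArrayBuffer(0),
  close: async () => { closed.push("closed"); },
};

convertWithCleanup(async () => stubEngine, Buffer.from("oops"), "exportPDF")
  .catch((e) => {
    console.error("conversion failed:", e.message, "| engine closed:", closed.length === 1);
  });

// With the real module this would be:
//   convertWithCleanup(() => docx.engine(), buf, "exportPDF")
```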

Putting it all together

Here is JavaScript which combines step 1 and step 2.

// Step 1: generate docx

var JSZip = require('jszip');
var Docxtemplater = require('docxtemplater');

var fs = require('fs');
var path = require('path');

//Load the docx file as a binary
var content = fs
    .readFileSync(path.resolve(__dirname, 'invoice-template.docx'), 'binary');

var zip = new JSZip(content);

var doc = new Docxtemplater();
doc.loadZip(zip);

//set the templateVariables
doc.setData({CustomerName : "Microsoft Corp",
AddressLine1: "One Microsoft Way",
City: "Redmond",  State: "WA",  Zip: "98052",
Country: "USA",

InvoiceDate : "14 March 2019",
InvoiceNumber: "INV123",
Items: [
{
	Item_Description: "Bananas",
	Item_Price: "5",
	Item_Qty: "10",
	Item_SubTotal: "50"
	},
{
	Item_Description: "Mangoes",
	Item_Price: "10",
	Item_Qty: "4",
	Item_SubTotal: "40"
	}
],
TotalEx:"90", 
SalesTax:"9", 
Shipping:"10",     
TotalPrice:"$109", 
DueDate : "28 March 2019"
});

try {
    // render the document ie replace the variables
    doc.render()
}
catch (error) {
    var e = {
        message: error.message,
        name: error.name,
        stack: error.stack,
        properties: error.properties,
    }
    console.log(JSON.stringify({error: e}));
    throw error;
}

var buf = doc.getZip()
             .generate({type: 'nodebuffer'});

// buf is a nodejs buffer, you can either write it to a file or do anything else with it.
//fs.writeFileSync(path.resolve(__dirname, 'output.docx'), buf);

// Step 2: convert docx to pdf

const docx = require("@nativedocuments/docx-wasm");

// init docx engine
docx.init({
    // ND_DEV_ID: "XXXXXXXXXXXXXXXXXXXXXXXXXX",    // go to https://developers.nativedocuments.com/ to get a dev-id/dev-secret
    // ND_DEV_SECRET: "YYYYYYYYYYYYYYYYYYYYYYYYYY", // you can also set the credentials in environment variables
    ENVIRONMENT: "NODE", // required
    LAZY_INIT: true      // if set to false, the WASM engine is initialized right away, which is useful for pre-caching (e.g. on AWS Lambda)
}).catch( function(e) {
    console.error(e);
});
 
async function convertHelper(document, exportFct) {
    const api = await docx.engine();
    await api.load(document);
    const arrayBuffer = await api[exportFct]();
    await api.close();
    return arrayBuffer;
}
 
convertHelper(buf, "exportPDF").then((arrayBuffer) => {
    fs.writeFileSync("output.pdf", new Uint8Array(arrayBuffer));
}).catch((e) => {
    console.error(e);
});

To try it, download invoice-template.docx then:

node docx-template-to-pdf.js

Deployment Options

A nice way to run this is on AWS Lambda. With Lambda, you get easy scalability, and you aren’t paying for servers when you aren’t using them. More on this in my upcoming talk at the PDF Association conference in Seattle in June! In the meantime, docx-to-pdf-on-AWS-Lambda shows you how to do the docx to PDF part on Lambda. Adding the docx templating piece is straightforward.
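To give a feel for the shape of such a function, here is a rough sketch of a Lambda handler. It is not taken from docx-to-pdf-on-AWS-Lambda. It assumes the function sits behind an API Gateway proxy integration, so the docx arrives base64-encoded in event.body and the PDF goes back base64-encoded. The names makeHandler and convertToPdf are hypothetical; convertToPdf stands for whatever function turns a docx buffer into PDF bytes, such as a wrapper around the step 2 code.

```javascript
// Sketch of a Lambda handler for docx to PDF behind an API Gateway proxy integration.
// convertToPdf is injected: (docxBuffer) => Promise of ArrayBuffer, Uint8Array or Buffer.
function makeHandler(convertToPdf) {
  return async function handler(event) {
    if (!event || !event.body || !event.isBase64Encoded) {
      return {
        statusCode: 400,
        headers: { "Content-Type": "text/plain" },
        body: "Expected a base64-encoded docx in the request body",
      };
    }
    try {
      const docxBuffer = Buffer.from(event.body, "base64");
      const pdf = await convertToPdf(docxBuffer);
      return {
        statusCode: 200,
        headers: { "Content-Type": "application/pdf" },
        isBase64Encoded: true,
        body: Buffer.from(pdf).toString("base64"),
      };
    } catch (e) {
      console.error(e);
      return {
        statusCode: 500,
        headers: { "Content-Type": "text/plain" },
        body: "Conversion failed",
      };
    }
  };
}

// In a real deployment (assumption): exports.handler = makeHandler(yourDocxToPdfFunction);
```

Injecting the converter keeps the handler testable on your own machine with a stub, before you package the WASM module for Lambda.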

It’s also now possible to convert docx to PDF client-side, in-browser, reducing server loads, and opening the way to offline operation. docx-wasm-client-side shows you how to do the docx to PDF part client-side.

Originally published by Jason Harrop at https://hackernoon.com
