Deep learning to optimise the recovery of your document flows in Harmonie Communication Suite

One of Sefas’s historic businesses is the automated processing of large volumes of documents. Originally, Sefas software was designed to compose documents (invoices, pay slips, etc.) and also to reengineer pre-composed documents from other sources so that they could be converted and then transformed. These workflows were formatted for printing in the appropriate Page Description Language (PDL), such as PDF, AFP, PCL or PostScript.

Sefas embraced early concepts of multi-channel management by introducing capabilities to manage fax output, email attachments and simple text messaging applications.

Today market needs have evolved, embracing Customer Communications Management (CCM) and boosting Customer Experience Processes (CXP). Digital transformation and the need for a harmonised, omnichannel approach have led Sefas to extend its support for the digital and mobile world. Our technology, proven in the desktop publishing industry for nearly 30 years, is still used to design and produce paper and PDF documents, as well as responsive emails, web documents and SMS messages.

Digital transition involves not only the dematerialisation of physical applications, but also the ability to transform paper-oriented content (pages, mail or PDF) into email, web, SMS and other digital communications.

New omnichannel features integrated into Harmonie Communication Suite have led Sefas customers to want to centralise all CCM capabilities within our software solution. This implies absorbing all types of content coming from a variety of sources (office automation, DTP tools, marketing, etc.). These sources are less and less “standardised”: they either do not conform to market standards, or contain variable elements that traditional tools have difficulty interpreting.

To meet this need, Sefas is exploring new methods for the mass processing of multi-source, multi-format, multi-channel documentary flows.

Two approaches are currently being studied by our R&D team:

  • A rapid-thinking approach, which aims to identify any non-compliance with market standards in content imported into our software, and then to implement increasingly thorough (self-learning) self-correction mechanisms. For example, for PDF, which accounts for a significant majority of the workflows processed by our customers, our software includes a module to “clean” any non-compliant elements. This function makes it possible to identify and correct on the fly any anomalies encountered (missing fonts, incorrect encodings, poorly referenced resources, etc.). The workflows thus become eligible for rapid deployment and easy scaling, while maintaining a high level of performance.
  • An “intelligent” approach, which uses machine learning mechanisms (deep learning). Sefas is studying the integration of artificial intelligence functions based on neural networks, whose way of learning is inspired by the functioning of the human brain. This technology will allow our document reengineering engines to learn to recognise the structures that our software analyses, to further optimise processing speed and accuracy. This involves, for example, identifying a given document element (image, address block, table, etc.) wherever it appears across all types of document in the system. Through machine learning, functionally similar (but not technically identical) resources can be reliably identified and then modified, deleted or moved in a single process step across all applications if required.
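
The first approach, detecting anomalies and correcting them on the fly, can be pictured with a minimal sketch. It is not the Sefas cleaning module: the `Resource` and `Document` classes, the `FALLBACK_FONT` name and the two correction rules are all hypothetical, chosen only to show the "detect, repair, report" pattern on a toy in-memory model of a document.

```python
from dataclasses import dataclass, field
from typing import Dict, List

FALLBACK_FONT = "FallbackSans"  # hypothetical substitute font name


@dataclass
class Resource:
    name: str
    kind: str              # "font" or "image"
    embedded: bool = True  # False means the data is missing from the file


@dataclass
class Document:
    # Toy model (assumption): declared resources plus the names the pages use.
    resources: Dict[str, Resource] = field(default_factory=dict)
    references: List[str] = field(default_factory=list)


def clean(doc: Document) -> List[str]:
    """Repair the document in place and return a report of what was fixed."""
    report = []
    # Rule 1: a font that is declared but not embedded is substituted.
    for key, res in list(doc.resources.items()):
        if res.kind == "font" and not res.embedded:
            report.append(f"missing font: {key}")
            doc.resources[key] = Resource(FALLBACK_FONT, "font", True)
    # Rule 2: a reference to an undeclared resource is dropped.
    kept = []
    for ref in doc.references:
        if ref in doc.resources:
            kept.append(ref)
        else:
            report.append(f"dangling reference: {ref}")
    doc.references = kept
    return report


doc = Document(
    resources={
        "F1": Resource("CorporateSerif", "font", embedded=False),
        "Im1": Resource("logo", "image"),
    },
    references=["F1", "Im1", "Im9"],
)
report = clean(doc)
```

Returning a report alongside the repaired document keeps every automatic correction traceable, which matters when the same rules are applied unattended to very large volumes.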

There are many applications where Artificial Intelligence greatly improves the delivery of Sefas’s CCM solutions. Examples include:

  • The most obvious case is the factorisation (pooling) of resources that are present individually in each document, so as to reduce the size of submitted workflows and improve production performance. For example, documents sent by different users will all contain the same company logo, but it can vary in size, position and compression (JPEG, for example). This makes every occurrence of the logo technically different. With deep learning, it becomes possible to identify this image functionally and pool all occurrences.
  • Another use case is the ability to “recompose” the content of the various ingested streams. Identifying “rich” resources is usually not possible in a format such as PDF, because it is intended for printing. Technically speaking, a PDF only contains text, images and vector graphics. With Artificial Intelligence, we can detect paragraphs, sections and tables, just as a human would. This improves the functional capture of content and makes it possible to propagate this functional, semantic content to digital formats through a “recomposition” mechanism. A “legacy” format can thus be transformed en masse into a digital format, without user interaction. Sefas has drastically improved the rate at which PDF flows are successfully taken over, using various mechanisms. In addition to these autocorrection functions, our software includes intelligent OCR and rasterisation functions, which allow us to approach a 100% conversion rate.
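
The logo-pooling case above can be illustrated with a small sketch. Note the substitution: instead of a neural network, it uses a much simpler perceptual "average hash" as the similarity measure, purely to show how occurrences that differ in size or compression can still be grouped. The function names, the `max_distance` threshold and the sample images (plain grids of grayscale values) are assumptions made for the example, not part of Harmonie Communication Suite.

```python
def average_hash(pixels, size=8):
    """Reduce a grayscale grid to size*size bits: 1 where brighter than the mean."""
    h, w = len(pixels), len(pixels[0])
    # Nearest-neighbour sampling makes the hash independent of image size.
    small = [pixels[(y * h) // size][(x * w) // size]
             for y in range(size) for x in range(size)]
    mean = sum(small) / len(small)
    return [1 if v > mean else 0 for v in small]


def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def pool(images, max_distance=5):
    """Group image names whose hashes are within max_distance of a group's first member."""
    groups = []  # list of (reference_hash, [names])
    for name, pixels in images.items():
        h = average_hash(pixels)
        for ref, members in groups:
            if hamming(ref, h) <= max_distance:
                members.append(name)
                break
        else:
            groups.append((h, [name]))
    return [members for _, members in groups]


# The same two-tone "logo" at two sizes, the larger one with slightly shifted
# values (as lossy compression might produce), plus an unrelated image.
invoice_logo = [[255 if x >= 8 else 0 for x in range(16)] for _ in range(16)]
letter_logo = [[245 if x >= 16 else 10 for x in range(32)] for _ in range(32)]
chart = [[255 if y >= 8 else 0 for _ in range(16)] for y in range(16)]

images = {"invoice_logo": invoice_logo, "letter_logo": letter_logo, "chart": chart}
groups = pool(images)
# groups == [["invoice_logo", "letter_logo"], ["chart"]]
```

Once occurrences are grouped, a single shared copy can be stored and referenced from every document. A learned model would replace `average_hash` with a more robust embedding, while the grouping step stays the same in spirit.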

There are many other ways that AI and machine learning can, and will, improve the performance and accuracy of processing large, complex data-rich documents to support our customers’ ever-changing needs.

By Jean-Marie Bonnefont, CTO Sefas Innovation Ltd