Kerstin Enflo

Professor

Joint Handwritten Text Recognition and Word Classification for Tabular Information Extraction

Authors

  • Christopher Blomqvist
  • Kerstin Enflo
  • Andreas Jakobsson
  • Kalle Åström

Summary, in English

In this paper, we present a system for extracting tabular information from loosely structured handwritten documents. The system consists of three parts: (i) a U-Net-like CNN-based method for text detection and segmentation, (ii) a new attention-based method for simultaneous text recognition and classification of word parts, and (iii) a method for matching the word parts into a tabular structure for each entry. A key contribution is the observation that the new attention-based recognition and classification module enables improved spatial analysis of the tabular information. The method is evaluated on a unique historical document: the Swedish Wealth Tax of 1571, consisting of 11,453 pages of handwritten tax records. The evaluation shows that the system provides a significant improvement over the state of the art for the problem of tabular extraction from loosely structured historical documents.

Department(s)

  • Department of Economic History
  • Growth, technological change, and inequality
  • LTH Profile Area: AI and Digitalization
  • eSSENCE: The e-Science Collaboration
  • Mathematical Statistics
  • Biomedical Modelling and Computation
  • Statistical Signal Processing Group
  • Stroke Imaging Research group
  • Mathematics, LTH
  • ELLIIT: the Linköping-Lund initiative on IT and mobile communication
  • Mathematical Imaging Group

Publication year

2022-11-29

Language

English

Pages

1564-1570

Publication/Journal/Series

2022 26th International Conference on Pattern Recognition (ICPR)

Document type

Conference paper

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

Subject

  • Computer Vision and Robotics (Autonomous Systems)
  • Economic History

Keywords

  • Histograms
  • Image segmentation
  • Text recognition
  • Finance
  • Writing
  • Information retrieval
  • Decoding

Conference name

26th International Conference on Pattern Recognition, 2022

Conference date

2022-08-21 - 2022-08-25

Conference place

Montreal, Canada

Status

Published

Project

  • Praise the people or praise the place: How culture and specialization drive long-term regional growth

Research group

  • Biomedical Modelling and Computation
  • Statistical Signal Processing Group
  • Stroke Imaging Research group
  • Mathematical Imaging Group

ISBN/ISSN/Other

  • ISBN: 978-1-6654-9062-7
  • ISBN: 978-1-6654-9063-4