Local Matching Networks for Engineering Diagram Search
Zhuyun Dai, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, zhuyund@cs.cmu.edu
Zhen Fan, Dept. of Computer Science and Technology, Tsinghua University, fanz15@mails.tsinghua.edu.cn
Hafeezul Rahman, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, hmohamma@andrew.cmu.edu
Jamie Callan, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, callan@cs.cmu.edu
Finding diagrams that contain a specific part or a similar part is important in many engineering tasks. In this search task, the query part is expected to match only a small region in a complex image. This paper investigates several local matching networks that explicitly model local region-to-region similarities. Deep convolutional neural networks extract local features and model local matching patterns. Spatial convolution is employed to cross-match local regions at different scale levels, addressing cases where the target part appears at a different scale, position, and/or angle. A gating network automatically learns region importance, removing noise from sparse areas and visual metadata in engineering diagrams.
Experimental results show that local matching approaches are more effective for engineering diagram search than global matching approaches. Suppressing unimportant regions via the gating network enhances accuracy. Matching across different scales via spatial convolution substantially improves robustness to scale and rotation changes. A pipelined architecture efficiently searches a large collection of diagrams by using a simple local matching network to identify a small set of candidate images and a more sophisticated network with convolutional cross-scale matching to re-rank candidates.
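The pipelined search described above can be illustrated with a minimal sketch. It assumes two scoring functions supplied by the caller: a cheap first-stage scorer and a more expensive re-ranking scorer. The names `two_stage_search`, `cheap_score`, `rerank_score`, `num_candidates`, and `top_k` are hypothetical, and the toy set-overlap scorers below are stand-ins for illustration only, not the networks used in the paper.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple


def two_stage_search(
    query: Set[int],
    corpus: Dict[str, Set[int]],
    cheap_score: Callable[[Set[int], Set[int]], float],
    rerank_score: Callable[[Set[int], Set[int]], float],
    num_candidates: int,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Select candidates with a cheap scorer, then re-rank them with a costly one."""
    # Stage 1: score every document with the cheap scorer, keep the best few.
    candidates = heapq.nlargest(
        num_candidates,
        corpus,
        key=lambda doc_id: cheap_score(query, corpus[doc_id]),
    )
    # Stage 2: apply the expensive scorer only to the surviving candidates.
    reranked = [(doc_id, rerank_score(query, corpus[doc_id])) for doc_id in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:top_k]


# Toy stand-in scorers (assumptions for illustration, not the paper's models).
def overlap(q: Set[int], d: Set[int]) -> float:
    return float(len(q & d))


def jaccard(q: Set[int], d: Set[int]) -> float:
    union = q | d
    return len(q & d) / len(union) if union else 0.0


toy_corpus = {"a": {1, 2, 3, 4, 5, 6}, "b": {1, 2}, "c": {7, 8}, "d": {1, 9}}
results = two_stage_search({1, 2}, toy_corpus, overlap, jaccard, num_candidates=2, top_k=2)
```

In this toy run the first stage keeps documents "a" and "b", and the second stage orders "b" ahead of "a". The design point is that the expensive scorer runs on `num_candidates` documents rather than on the whole collection.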
We downloaded product assembly manuals from Ikea. We are unable to redistribute the dataset, but we list the source of each document and query diagram in the following files. A substantially similar dataset can be built from these descriptions.
The corpus description file lists the source of the 15,450 Ikea diagrams used in our work. Each line has four tab-separated fields: "document_id \t ikea_product_name \t ikea_manual_pdf_name \t pdf_page". For example, "0_0 \t ABSORB LEATHER CARE CLEANER \t A20137945.pdf \t 0" means that the diagram with id "0_0" belongs to the product "ABSORB LEATHER CARE CLEANER", the original manual PDF file is "A20137945.pdf", and the diagram is the first page of that PDF (page numbers start at 0).
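As a minimal sketch of reading this format, the snippet below parses one corpus description line into a record. It assumes the fields are separated by literal tab characters, as the format string indicates; the names `DiagramSource` and `parse_corpus_line` are hypothetical helpers, not part of any released code.

```python
from typing import NamedTuple


class DiagramSource(NamedTuple):
    document_id: str   # e.g. "0_0"
    product_name: str  # Ikea product name; may contain spaces
    manual_pdf: str    # file name of the original manual PDF
    pdf_page: int      # page index within the PDF; 0 is the first page


def parse_corpus_line(line: str) -> DiagramSource:
    """Parse one tab-separated line of the corpus description file."""
    fields = [field.strip() for field in line.rstrip("\r\n").split("\t")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}: {line!r}")
    document_id, product_name, manual_pdf, pdf_page = fields
    return DiagramSource(document_id, product_name, manual_pdf, int(pdf_page))


# The example line from the description above, written with explicit tabs.
record = parse_corpus_line("0_0\tABSORB LEATHER CARE CLEANER\tA20137945.pdf\t0")
```

Splitting on tabs rather than on whitespace matters here because product names such as "ABSORB LEATHER CARE CLEANER" contain spaces.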
Queries are auto-generated by cropping regions from the original diagrams. The query files list the source diagram of each query, e.g., query "1547_8" comes from the diagram "1547_8". We also list the position, scale, and rotation changes applied to each query.