Abstract:
This paper presents a method for automating order processing in Chinese grocery stores, using Tesseract OCR combined with bilingual Chinese-Serbian translation. The system is implemented as a Telegram chatbot: users capture images of orders, which are then automatically processed into CSV delivery lists. To maintain good performance on low-power devices, we use the Tesseract LSTM engine. To improve accuracy, all images are pre-processed before being passed to the OCR engine, and regular expressions (regex) are used to extract and normalize units and quantities from the recognized text. A domain-specific dictionary of approximately 3,000 translations serves as an error-handling mechanism that mitigates OCR misclassifications. This hybrid approach, which combines neural recognition with expert linguistic rules, keeps the system reliable even when processing low-quality images from mobile chat applications. The final output is structured data, including company names, item lists, and units, which reduces manual CSV data entry for drivers and logistics staff.
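The post-OCR stage described above (regex extraction of quantities and units, followed by dictionary-based correction and CSV export) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary entries, the unit map, the line format "item quantity unit", the similarity cutoff, and the names `DICTIONARY`, `UNITS`, `parse_line`, and `to_csv` are all assumptions made for illustration. Fuzzy matching with the standard-library `difflib` stands in for whatever correction rule the system actually applies.

```python
import csv
import difflib
import io
import re

# Hypothetical excerpt of a Chinese -> Serbian dictionary
# (the paper reports roughly 3,000 entries; these three are invented).
DICTIONARY = {"西兰花": "brokoli", "豆腐": "tofu", "酱油": "soja sos"}

# Hypothetical map from recognized unit tokens to normalized output units.
UNITS = {"箱": "kutija", "包": "pakovanje", "瓶": "flaša", "公斤": "kg", "kg": "kg"}

# Assumed line format: <Chinese item name> <quantity> <unit>.
LINE_RE = re.compile(
    r"^\s*(?P<item>[\u4e00-\u9fff]+)\s*"
    r"(?P<qty>\d+(?:[.,]\d+)?)\s*"
    r"(?P<unit>公斤|kg|箱|包|瓶)\s*$",
    re.IGNORECASE,
)


def parse_line(text):
    """Parse one OCR line into a structured row, or return None if it fails."""
    match = LINE_RE.match(text)
    if match is None:
        return None
    item = match.group("item")
    if item not in DICTIONARY:
        # Treat an unknown name as a possible OCR misread and fall back
        # to the closest dictionary key (cutoff chosen arbitrarily).
        close = difflib.get_close_matches(item, list(DICTIONARY), n=1, cutoff=0.6)
        if not close:
            return None
        item = close[0]
    quantity = float(match.group("qty").replace(",", "."))
    unit = UNITS[match.group("unit").lower()]
    return {"item_zh": item, "item_sr": DICTIONARY[item],
            "quantity": quantity, "unit": unit}


def to_csv(rows):
    """Serialize parsed rows into CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["item_zh", "item_sr", "quantity", "unit"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In this sketch, a line in which one character of the item name was misrecognized can still resolve to the intended dictionary entry, while a line that matches neither the pattern nor any dictionary key is rejected instead of being guessed.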
CITATION:
IEEE format
M. Ercegovac, Y. Zheng, and A. Njeguš, “A Tesseract-Based OCR System for Bilingual Chinese-Serbian Grocery Order Processing,” in Sinteza 2026 - International Scientific Conference on Information Technology, Computer Science, and Data Science, Belgrade, Singidunum University, Serbia, 2026, pp. 220-225. doi:10.15308/Sinteza-2026-220-225
APA format
Ercegovac, M., Zheng, Y., & Njeguš, A. (2026). A Tesseract-Based OCR System for Bilingual Chinese-Serbian Grocery Order Processing. Paper presented at Sinteza 2026 - International Scientific Conference on Information Technology, Computer Science, and Data Science. doi:10.15308/Sinteza-2026-220-225