Under INT8 Quantization of Encoder Transformers

Tomić, Igor

doi:10.15308/Sinteza-2026-191-197

Sinteza 2026 - International Scientific Conference on Information Technology, Computer Science, and Data Science » Computer Science and Artificial Intelligence

Keywords:
Quantization, Transformers, Silent Attribution Drift, Logit Shift Analysis

Abstract:
Deep learning models based on transformer encoder architectures require substantial computational resources when deployed in production environments. To address this, various compression methods have been developed, including quantization. Quantized models are often evaluated only on accuracy, while preservation of explainability is implicitly assumed; whether explainability is maintained at the token-attribution level, however, has not yet been systematically assessed. In this paper, we introduce the silent attribution drift effect, a phenomenon in which the quantized model faithfully reproduces the predictions of the original model but generates fundamentally different explanations in terms of input-token importance. The results reveal a pronounced discrepancy between the stability of predictions and the stability of explanations. While prediction agreement between the quantized and original models reaches 96-100%, the Spearman rank correlation of attributions drops, indicating a change in the distribution of token importance. We establish that the robustness of attributions depends on the architecture and on the complexity of the task: four-class classification shows greater deviation than binary classification. We propose logit shift analysis as a diagnostic tool that mechanistically explains the sources of deviation, revealing that quantization changes the model’s sensitivity to individual tokens. Our findings indicate that evaluating compressed models solely on accuracy is not sufficient in domains where explainability is a regulatory or ethical requirement.
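
The two quantities contrasted in the abstract, prediction agreement and the Spearman rank correlation of token attributions, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`average_ranks`, `spearman`, `prediction_agreement`) and the toy predictions and attribution scores are hypothetical, chosen only to show how identical predictions can coexist with a reordered token-importance ranking. Logit shift analysis is not sketched here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / sqrt(var_a * var_b)


def prediction_agreement(preds_a, preds_b):
    """Fraction of inputs on which the two models predict the same class."""
    return sum(p == q for p, q in zip(preds_a, preds_b)) / len(preds_a)


# Hypothetical toy data: class predictions for four inputs, and
# per-token attribution scores for one four-token input.
preds_fp32 = [1, 0, 1, 1]
preds_int8 = [1, 0, 1, 1]
attr_fp32 = [0.50, 0.10, 0.30, 0.05]
attr_int8 = [0.10, 0.45, 0.05, 0.30]

agreement = prediction_agreement(preds_fp32, preds_int8)
rho = spearman(attr_fp32, attr_int8)
print(f"prediction agreement: {agreement:.2f}")
print(f"Spearman rho of attributions: {rho:.2f}")
```

In this toy case the predictions agree on every input while the attribution rankings are negatively correlated, which is the kind of gap an accuracy-only evaluation would not surface.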

CITATION:

IEEE format

I. Tomić, “Under INT8 Quantization of Encoder Transformers,” in Sinteza 2026 - International Scientific Conference on Information Technology, Computer Science, and Data Science, Belgrade, Singidunum University, Serbia, 2026, pp. 191-197. doi:10.15308/Sinteza-2026-191-197

APA format

Tomić, I. (2026). Under INT8 Quantization of Encoder Transformers. Paper presented at Sinteza 2026 - International Scientific Conference on Information Technology, Computer Science, and Data Science. doi:10.15308/Sinteza-2026-191-197
