Utilizing Natural Language Processing and Logistic Regression Model for Automated Detection and Classification of Tax Objects in Tax Audit Processes

Authors

  • Wishnu Kusumo Agung Erlangga, Directorate General of Taxes
  • Bagas Dwi Suryo Wibowo, Directorate General of Taxes

DOI:

https://doi.org/10.52869/t7ygg336

Keywords:

natural language processing, logistic regression, tax income, tax audit, machine learning, web application

Abstract

The Directorate General of Taxes (DGT) plays a key role in collecting state revenue in Indonesia, and tax audits are one of its core functions. However, the number of taxpayers far exceeds the number of tax auditors. Although financial statements are prepared under recognized accounting standards, the account names taxpayers choose in their financial statements often pose a particular challenge for tax auditors: each company may follow its own naming conventions, even for the same type of transaction or object. Natural Language Processing (NLP) can help here by detecting and classifying tax objects from the general ledger. In this research, we developed and compared machine learning models that automatically classify tax objects and fiscal corrections. The study used real data comprising 461,776 general ledger entries and followed a quantitative approach. Because it is trained on real data, the resulting model has an advantage over models trained on artificial data. We compared Logistic Regression, K-Nearest Neighbors, and Naïve Bayes, and found that Logistic Regression achieved the best metrics. The Logistic Regression model reached a precision of 99% in detecting both tax objects and fiscal corrections in financial statements. These findings are expected to help the tax authority detect the presence or absence of tax objects and fiscal corrections in financial statements, enabling various functions within the DGT to operate more efficiently and effectively.
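To make the general idea concrete, here is a minimal, standard-library-only sketch of text classification of ledger account names with logistic regression, evaluated by precision. The abstract does not describe the authors' preprocessing, features, or hyperparameters, so everything below is an assumption for illustration: character trigram features, plain stochastic gradient descent, the learning rate and epoch count, and the toy account names and labels (which are invented and are not statements about Indonesian tax rules). This is not the authors' pipeline; in practice a library such as scikit-learn (`TfidfVectorizer`, `LogisticRegression`) would typically be used instead.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Lower-case, pad with spaces, and return overlapping character n-grams.
    Character n-grams tolerate naming variation (e.g. "entertain" vs "entertainment")."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_vocab(texts, n=3):
    """Map every n-gram seen in the training texts to a feature index."""
    vocab = {}
    for text in texts:
        for gram in char_ngrams(text, n):
            vocab.setdefault(gram, len(vocab))
    return vocab

def vectorize(text, vocab, n=3):
    """Return an L2-normalised sparse count vector {index: value}; unseen n-grams are dropped."""
    counts = Counter(g for g in char_ngrams(text, n) if g in vocab)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {vocab[g]: c / norm for g, c in counts.items()}

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, dim, lr=0.5, epochs=300, l2=1e-3):
    """Fit binary logistic regression with plain SGD on the log-loss.
    The L2 penalty is applied only to features active in the current example,
    a common simplification for sparse inputs."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = sigmoid(b + sum(w[j] * v for j, v in x.items())) - target
            for j, v in x.items():
                w[j] -= lr * (err * v + l2 * w[j])
            b -= lr * err
    return w, b

def predict(x, w, b, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(b + sum(w[j] * v for j, v in x.items())) >= threshold else 0

def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the positive class (0.0 if nothing is predicted positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0

# Hypothetical toy account names; label 1 = "flag for review", 0 = "no flag".
TRAIN = [
    ("entertainment expense", 1),
    ("client entertainment", 1),
    ("entertainment and gifts", 1),
    ("donation expense", 1),
    ("charitable donations", 1),
    ("office rent expense", 0),
    ("salary expense", 0),
    ("electricity expense", 0),
    ("office supplies", 0),
    ("depreciation of vehicles", 0),
]

texts, labels = zip(*TRAIN)
vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
w, b = train_logreg(X, list(labels), len(vocab))

if __name__ == "__main__":
    train_pred = [predict(x, w, b) for x in X]
    print("training precision:", precision(list(labels), train_pred))
    print("'entertainment costs' ->", predict(vectorize("entertainment costs", vocab), w, b))
```

Character n-grams are used here because they let an unseen variant such as "entertainment costs" share features with "entertainment expense", which mirrors the account-naming inconsistency the abstract describes. Precision is computed because it is the metric the abstract reports; a K-Nearest Neighbors or Naïve Bayes baseline could be compared by swapping out `train_logreg` and `predict` while keeping the same features and metric.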

Published

30-04-2026

How to Cite

Utilizing Natural Language Processing and Logistic Regression Model for Automated Detection and Classification of Tax Objects in Tax Audit Processes. (2026). Scientax: Jurnal Kajian Ilmiah Perpajakan Indonesia, 7(2), 188-207. https://doi.org/10.52869/t7ygg336