Utilizing Natural Language Processing and Logistic Regression Model for Automated Detection and Classification of Tax Objects in Tax Audit Processes
DOI:
https://doi.org/10.52869/t7ygg336
Keywords:
natural language processing, logistic regression, tax income, tax audit, machine learning, web application
Abstract
The Directorate General of Taxes (DGT) plays a key role in collecting state revenue in Indonesia, and tax audits are one of its core functions. However, the number of taxpayers far exceeds the number of tax auditors. Although financial statements are prepared under recognized accounting standards, the account names that taxpayers choose often pose a challenge for tax auditors: each company may follow its own naming conventions, even when referring to the same type of transaction or object. Natural Language Processing (NLP) can help here by detecting and classifying tax objects from the general ledger. In this research, we developed and compared machine learning models that automatically classify tax objects and fiscal corrections. The research used real data consisting of 461,776 rows of general ledger entries, processed using quantitative methods. Because it was trained on real data, the developed model has an advantage over models trained on artificial data. We compared results from the Logistic Regression, K-Nearest Neighbors, and Naïve Bayes algorithms and found that Logistic Regression achieved the best metrics. The Logistic Regression model achieved a precision of 99% in detecting both tax objects and fiscal corrections from financial statements. The findings are expected to help tax authorities detect the presence or absence of tax objects and fiscal corrections in financial statements, thereby enabling various functions within DGT to operate more efficiently and effectively.
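As a rough illustration of the kind of classifier the abstract describes, the sketch below trains a binary logistic regression on bag-of-words features built from ledger account names. It is not the authors' pipeline: the abstract does not state how the text was vectorized or how the model was fitted, so the tokenizer, the word-count features, the stochastic gradient descent loop, and every account name and label in `TRAIN` are assumptions invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy data, invented for illustration (not from the study).
# Label 1 = account name suggests a tax object, 0 = it does not.
TRAIN = [
    ("office rent expense", 1),
    ("building rent payment", 1),
    ("consultant service fee", 1),
    ("legal service fee", 1),
    ("cash in bank", 0),
    ("petty cash", 0),
    ("accumulated depreciation", 0),
    ("retained earnings", 0),
]

def tokenize(text):
    """Lowercase an account name and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(samples):
    """Map every token seen in the training texts to a feature index."""
    tokens = sorted({tok for text, _ in samples for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(text, vocab):
    """Sparse bag-of-words vector: {feature index: count}; unknown tokens are dropped."""
    counts = Counter(tokenize(text))
    return {vocab[t]: float(c) for t, c in counts.items() if t in vocab}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(text, vocab, weights, bias):
    """Estimated probability that the account name belongs to class 1."""
    x = vectorize(text, vocab)
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))

def train(samples, epochs=200, lr=0.5):
    """Fit logistic regression by stochastic gradient descent on the log loss."""
    vocab = build_vocab(samples)
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = vectorize(text, vocab)
            p = sigmoid(bias + sum(weights[i] * v for i, v in x.items()))
            error = p - label  # gradient of the log loss w.r.t. the logit
            for i, v in x.items():
                weights[i] -= lr * error * v
            bias -= lr * error
    return vocab, weights, bias

vocab, weights, bias = train(TRAIN)
for name in ["warehouse rent", "cash on hand"]:
    print(name, round(predict_proba(name, vocab, weights, bias), 3))
```

Because the model scores individual words, an unseen account name such as "warehouse rent" can still be scored through the token it shares with the training data, which is the property that makes this approach attractive when companies name the same kind of account differently. A production system would more likely rely on an established library and a held-out test set to report precision.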
License
Copyright (c) 2026 Scientax: Jurnal Kajian Ilmiah Perpajakan Indonesia

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.







