Application of Natural Language Processing Models (LDA-BOW & Word2Vec) and Sankey Diagrams in Supply Chain Analysis in the Field of Taxation

Authors

  • Ryan Agatha Nanda Widiiswa, Direktorat Jenderal Pajak
  • Sigit Hariyanto, Direktorat Jenderal Pajak
  • Muhammad Rifqi Aziz, Direktorat Jenderal Pajak
  • Ajar Parama Adhi, Direktorat Jenderal Pajak

DOI:

https://doi.org/10.52869/ad83d368

Keywords:

natural language processing, supply chain analysis, machine learning, Sankey diagram, tax analysis

Abstract

Digital innovation and big data have created a growing need to apply machine learning across many fields. This study integrates existing tax analysis practices with machine learning techniques. It aims to improve the efficiency and accuracy of supply chain analysis in the tax domain by applying Natural Language Processing (NLP) models alongside Sankey diagram visualizations. The NLP models are Latent Dirichlet Allocation with a bag-of-words representation (LDA-BOW) and the Word2Vec algorithm, which identify and extract transactions through topic modeling and semantic similarity. Both are implemented within the CRISP-DM methodological framework. Applied to 6.8 million transactions by PKP (Pengusaha Kena Pajak, VAT-registered taxpayers) in the pharmaceutical sector for 2022, the models classified 73.7% of transactions, and integrating Word2Vec improved accuracy by 19%. Sankey diagrams visualize the flow of transactions, enabling users to pinpoint the points in the supply chain where tax-related risks or discrepancies are higher. For the supply chain analysis, the authors adopt the Supply Chain Operations Reference (SCOR) model, focusing on the reliability and cost aspects that align most closely with tax compliance evaluation. The findings are expected to yield a prototype application that streamlines the audit process for tax authorities and contributes to the text-mining literature in the field of taxation.
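
To make the Sankey step concrete, the following is a minimal sketch of how classified transactions could be aggregated into the node and link lists that a Sankey diagram is drawn from. It is not the authors' implementation: the record layout, the category names, the amounts, and the function name `build_sankey_links` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (seller_category, buyer_category, amount).
# Categories and amounts are illustrative only, not data from the study.
transactions = [
    ("manufacturer", "distributor", 120.0),
    ("manufacturer", "distributor", 80.0),
    ("distributor", "pharmacy", 150.0),
    ("distributor", "hospital", 40.0),
    ("manufacturer", "hospital", 10.0),
]


def build_sankey_links(records):
    """Aggregate records into Sankey nodes and index-based links."""
    # Sum the amounts for every distinct (source, target) pair.
    totals = defaultdict(float)
    for source, target, amount in records:
        totals[(source, target)] += amount

    # Give each category an integer index in order of first appearance.
    nodes = []
    index = {}
    for source, target in totals:
        for name in (source, target):
            if name not in index:
                index[name] = len(nodes)
                nodes.append(name)

    # Each link refers to its endpoints by node index.
    links = [
        {"source": index[s], "target": index[t], "value": v}
        for (s, t), v in totals.items()
    ]
    return nodes, links


nodes, links = build_sankey_links(transactions)
for link in links:
    print(nodes[link["source"]], "->", nodes[link["target"]], link["value"])
```

The index-based source, target, and value layout is the shape that common plotting libraries accept for Sankey traces, so the output of a step like this can be handed to a visualization layer. A link whose value is unusually large or small relative to its neighbors is the kind of point in the flow an analyst would then inspect.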

Published

30-04-2026

How to Cite

Penerapan Model Natural Language Processing (LDA-BOW & Word2Vec) & Diagram Sankey dalam Analisis Rantai Pasokan pada Bidang Perpajakan. (2026). Scientax: Jurnal Kajian Ilmiah Perpajakan Indonesia, 7(2), 248-270. https://doi.org/10.52869/ad83d368