The Impact of Similarity Functions on Divisive Analysis Clustering of Tuberculosis Disease
Yulita Molliq Rangkuti (a*), Mansur AS (b), Agus Junaidi (c), Nurul Maulida Surbakti (a), Rizky Gunawan

a) Mathematics Department, Faculty of Mathematics and Natural Science, Universitas Negeri Medan (Unimed), Medan, 20221, North Sumatra, Indonesia
*yulitamolliq[at]unimed.ac.id
b) Computer Science Study Program, Faculty of Mathematics and Natural Science, Universitas Negeri Medan (Unimed), Medan, 20221, North Sumatra, Indonesia
c) Electrical Engineering Education Department, Faculty of Engineering, Universitas Negeri Medan (Unimed), Medan, 20221, North Sumatra, Indonesia


Abstract

Tuberculosis is a bacterial infection that usually affects the lungs but can also affect other parts of the body. It is contagious and spreads through the air when an infected person coughs, sneezes, or sings. North Sumatra is among the three Indonesian provinces with the highest tuberculosis incidence and mortality. Tracking tuberculosis cases is crucial for controlling and preventing the spread of the disease. The Divisive Analysis (DIANA) algorithm is frequently used to group tuberculosis cases. DIANA is a divisive hierarchical clustering algorithm: it starts with all items in a single cluster and splits them successively into groups based on their similarities. This study evaluates how the choice of similarity function affects the quality of the resulting clusters. The dataset comprises variables such as mortality rates, infection rates, and recovery rates, sourced from the North Sumatra Provincial Health Office and the Central Statistics Agency (BPS). The results identified four clusters within North Sumatra Province. Clustering quality was then assessed with the Davies-Bouldin Index (DBI), for which lower values indicate more compact and better separated clusters. Among the distance metrics compared (Bray-Curtis, Chebyshev, and Canberra), the Chebyshev distance achieved the lowest DBI, 0.5121, which reflects a satisfactory level of cluster quality. The study thus helps visualize the distribution of tuberculosis cases in North Sumatra Province and provides a basis for data-driven decision-making in addressing the disease.
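
To make the three compared distance metrics concrete, the following is a minimal sketch of their standard definitions in plain Python. It is not the authors' implementation; the function names and the two example vectors (`region_a`, `region_b`) are hypothetical and the values are made up for illustration, not taken from the study's dataset.

```python
# Illustrative sketch only: standard definitions of the three distances
# named in the abstract. The example vectors are invented, not study data.

def bray_curtis(u, v):
    """Sum of |u_i - v_i| divided by sum of |u_i + v_i|."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0

def chebyshev(u, v):
    """Largest absolute difference over all coordinates."""
    return max(abs(a - b) for a, b in zip(u, v))

def canberra(u, v):
    """Sum of |u_i - v_i| / (|u_i| + |v_i|); terms with 0/0 count as 0."""
    total = 0.0
    for a, b in zip(u, v):
        den = abs(a) + abs(b)
        if den:
            total += abs(a - b) / den
    return total

# Hypothetical (mortality, infection, recovery) rates for two regions.
region_a = [0.25, 0.5, 0.75]
region_b = [0.75, 0.5, 0.25]

for name, fn in [("Bray-Curtis", bray_curtis),
                 ("Chebyshev", chebyshev),
                 ("Canberra", canberra)]:
    print(f"{name}: {fn(region_a, region_b):.4f}")
```

Chebyshev distance depends only on the single variable with the largest gap, whereas Bray-Curtis and Canberra normalize differences by magnitude, so the same pair of regions can look close under one metric and far apart under another.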

Keywords: Tuberculosis; DIANA; Davies-Bouldin Index; Similarity Function

Topic: Mathematics and Computational System

ICIESC 2025 Conference | Conference Management System