Automatic text classification

Дубовик, Андрій; Волинець, Євгеній

dc.contributor.author	Дубовик, Андрій	uk_UA
dc.contributor.author	Волинець, Євгеній	uk_UA
dc.date.accessioned	2026-02-03T11:22:48Z
dc.date.available	2026-02-03T11:22:48Z
dc.date.issued	2025
dc.description	This study explores modern methodologies in the field of automatic text classification, a critical task in natural language processing (NLP) that enables the categorization of unstructured textual data into predefined groups without manual intervention. The rapid growth of digital text across domains such as business, media, science, and social networks has created a pressing need for scalable and accurate classification systems. The research provides an analytical overview of three primary approaches: rule-based systems, machine learning methods, and hybrid models. Particular attention is paid to evaluating the strengths and limitations of several popular machine learning algorithms, including Naive Bayes, Support Vector Machines (SVM), and Recurrent Neural Networks (RNN). While advanced techniques such as BERT and Large Language Models (LLMs) demonstrate high performance, they are not considered optimal for lightweight, user-trainable applications due to their high computational costs. To support practical implementation, the study proposes a system architecture based on the Python programming language and a suite of supporting libraries (e.g., TensorFlow, scikit-learn, NLTK, NumPy, Pandas, Matplotlib, and Seaborn). The AG News Classification Dataset is recommended as the initial training corpus, providing a robust foundation for multi-class categorization tasks. The final system design emphasizes modularity and user configurability. It allows end users to preprocess their own text data, train classification models on domain-specific content, and utilize combinations of models to improve performance. The research recommends a model ensemble consisting of Naive Bayes, SVM, and RNN due to their balance between effectiveness and computational efficiency. 
This study not only highlights the technical viability of automated text classification systems but also presents a practical, extensible framework suitable for real-world applications, especially for underrepresented languages such as Ukrainian. The resulting system aims to bridge the gap between academic research and deployable technology, offering a customizable platform for tasks ranging from document organization and content filtering to sentiment analysis and market research.	en_US
dc.description.abstract	This study analyzes current approaches to the classification of textual information. Particular attention is paid to automatic text classification, which assigns texts to predefined categories without manual analysis. The effectiveness of different classification methods is reviewed and compared, with an emphasis on hybrid systems, which can combine the advantages of individual approaches and provide higher model accuracy and performance. The choice of tools for the subsequent software implementation of a system for automated classification of texts by category is also justified. The AG News Classification Dataset from the kaggle.com platform is proposed for training the models. It is considered advisable to limit the classification process to a combination of three models: Naive Bayes, Support Vector Machine (SVM), and Recurrent Neural Networks (RNN), which are distinguished by low requirements for computational resources and training time.	uk_UA
dc.identifier.citation	Дубовик А. В. Автоматична класифікація текстів / Дубовик А. В., Волинець Є. А. // Наукові записки НаУКМА. Комп'ютерні науки. - 2025. - Т. 8. - С. 102-107. - https://doi.org/10.18523/2617-3808.2025.8.102-107	uk_UA
dc.identifier.issn	2617-3808
dc.identifier.issn	2617-7323
dc.identifier.uri	https://doi.org/10.18523/2617-3808.2025.8.102-107
dc.identifier.uri	https://ekmair.ukma.edu.ua/handle/123456789/38243
dc.language.iso	uk	uk_UA
dc.relation.source	Наукові записки НаУКМА. Комп'ютерні науки	uk_UA
dc.status	first published	uk_UA
dc.subject	text classification	en_US
dc.subject	machine learning	en_US
dc.subject	Ukrainian language processing	en_US
dc.subject	Naive Bayes	en_US
dc.subject	SVM	en_US
dc.subject	RNN	en_US
dc.subject	text preprocessing	en_US
dc.subject	article	uk_UA
dc.title	Автоматична класифікація текстів	uk_UA
dc.title.alternative	Automatic text classification	en_US
dc.type	Article	uk_UA
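
The description above names Naive Bayes as one of the three recommended models. As a rough illustration of how such a classifier assigns a text to a category, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing, written with only the Python standard library. It is not the authors' implementation (the record mentions scikit-learn and TensorFlow for that). The class name `NaiveBayesClassifier`, the `tokenize` helper, and the four toy training sentences are all hypothetical. The labels "Sports" and "Business" are borrowed from the AG News category names only for flavor.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus; a real run would load a labeled dataset instead.
TRAIN_TEXTS = [
    "team wins the championship match",
    "striker scores in the final game",
    "stock market shares fall sharply",
    "company reports quarterly profit growth",
]
TRAIN_LABELS = ["Sports", "Sports", "Business", "Business"]

model = NaiveBayesClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("shares of the company rise"))    # Business
print(model.predict("the team scores in the match"))  # Sports
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one term keeps unseen words from driving a class probability to zero.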

Files

Original bundle

Name: Dubovyk_Volynets_Avtomatychna_klasyfikatsiia_tekstiv.pdf
Size: 477.17 KB
Format: Adobe Portable Document Format

Collections

Volume 8
Department of Network Technologies
Faculty of Informatics