Selected Data Mining Tools for Data Analysis in Distributed Environment

Mikhail Moshkov, Beata Zielosko, Evans Teiko Tetteh

Research output: Contribution to journal, Article, peer-review

4 Scopus citations

Abstract

In this paper, we deal with distributed data represented either as a finite set T of decision tables with equal sets of attributes or as a finite set I of information systems with equal sets of attributes. In the former case, we discuss a way to study the decision trees common to all tables from the set T: building a decision table for which the set of decision trees coincides with the set of decision trees common to all tables from T. We show when such a decision table can be built and how to build it in polynomial time. Given such a table, we can apply various decision tree learning algorithms to it. We extend the considered approach to the study of tests (reducts) and decision rules common to all tables from T. In the latter case, we discuss a way to study the association rules common to all information systems from the set I: building a joint information system for which the set of true association rules that are realizable for a given row ρ and have a given attribute a on the right-hand side coincides with the set of association rules that are true for all information systems from I, have the attribute a on the right-hand side, and are realizable for the row ρ. We then show how to build such a joint information system in polynomial time. Once it is built, we can apply various association rule learning algorithms to it.
Original language: English (US)
Pages (from-to): 1401
Journal: Entropy
Volume: 24
Issue number: 10
DOIs
State: Published - Oct 1 2022

Bibliographical note

KAUST Repository Item: Exported on 2022-10-11
Acknowledgements: Research funded by King Abdullah University of Science and Technology

ASJC Scopus subject areas

  • Physics and Astronomy (miscellaneous)
  • Statistical and Nonlinear Physics
