Authorship Attribution Using Principal Component Analysis and Nearest Neighbor Rule for Neural Networks

Mehmet Can


Feature extraction is a common problem in statistical pattern recognition. It refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed in such a way that the data set may be represented by a reduced number of "effective" features and yet retain most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction. Principal component analysis is one such process. In this paper, data collected by counting selected syntactic characteristics in around a thousand paragraphs from each of the sample books underwent a principal component analysis. For comparison, the original data were also processed. Competitive neural networks that used the principal components identified the authors of the texts with a higher success rate than those that used the original data. The process was repeated on another group of authors, and similar results were obtained.
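
The pipeline described above (project the feature counts onto a few principal components, then assign an author by proximity) can be illustrated with a minimal sketch. It is not the paper's implementation: the paper uses competitive neural networks, whereas this sketch swaps in a plain 1-nearest-neighbor rule, and the four feature counts, the author labels "A" and "B", and all function names are hypothetical. The principal components are approximated by power iteration with deflation so that the example needs only the standard library.

```python
import math
import random

# Hypothetical per-paragraph counts of four syntactic features (invented data).
TRAIN = [
    ([12, 3, 5, 1], "A"), ([11, 4, 6, 1], "A"),
    ([13, 3, 5, 2], "A"), ([12, 2, 4, 1], "A"),
    ([5, 9, 2, 6], "B"), ([6, 10, 1, 7], "B"),
    ([4, 8, 2, 6], "B"), ([5, 9, 3, 5], "B"),
]


def fit_pca(rows, k, iters=200, seed=0):
    """Return (mean, components): the k leading principal directions,
    approximated by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient estimates the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        components.append(v)
        # Deflate: remove the direction just found from the covariance.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return mean, components


def project(row, mean, components):
    """Coordinates of one sample in the reduced feature space."""
    return [sum((row[j] - mean[j]) * c[j] for j in range(len(row)))
            for c in components]


def classify(query, k=2):
    """Label a query by its nearest training sample in the PCA space."""
    mean, comps = fit_pca([x for x, _ in TRAIN], k)
    q = project(query, mean, comps)
    best_label, best_dist = None, float("inf")
    for x, label in TRAIN:
        p = project(x, mean, comps)
        dist = sum((a - b) ** 2 for a, b in zip(p, q))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    print(classify([12, 3, 5, 2]))
    print(classify([5, 10, 2, 6]))
```

Here four counts per paragraph are reduced to two coordinates before the nearest-neighbor comparison; with real data, the number of retained components would normally be chosen from the share of variance they explain.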


Copyright (c) 2015 SouthEast Europe Journal of Soft Computing

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

This work is licensed under a Creative Commons Attribution 4.0 International License