Principal Component Analysis and Neural Networks for Authorship Attribution

Mehmet Can

Abstract


A common problem in statistical pattern recognition is feature selection, or feature extraction. Feature selection refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed so that the data set can be represented by a reduced number of "effective" features and yet retain most of its intrinsic information content; in other words, the data set undergoes a dimensionality reduction.
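
As a brief illustration of the idea, the standard principal component formulation can be written as follows. The notation is an assumption made here and is not taken from the paper: x is a zero-mean data vector of dimension m, q_j are the orthonormal eigenvectors of its covariance matrix R with eigenvalues ordered from largest to smallest, and l < m is the number of retained features.

```latex
% Full-dimensional transformation: m projections describe x exactly.
a_j = \mathbf{q}_j^{\top}\mathbf{x}, \qquad
\mathbf{x} = \sum_{j=1}^{m} a_j \mathbf{q}_j, \qquad
\mathbf{R}\,\mathbf{q}_j = \lambda_j \mathbf{q}_j, \quad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m .

% Dimensionality reduction: keep only the first l terms.
\hat{\mathbf{x}} = \sum_{j=1}^{l} a_j \mathbf{q}_j, \qquad
E\!\left[\lVert \mathbf{x} - \hat{\mathbf{x}} \rVert^{2}\right]
  = \sum_{j=l+1}^{m} \lambda_j .
```

The mean squared error of the truncated representation is the sum of the discarded eigenvalues, so dropping the directions with the smallest variance loses the least information.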

In this paper, data were collected by counting selected syntactic characteristics in around a thousand paragraphs from each of the sample books. The data then underwent a principal component analysis performed using neural networks. The leading principal components are then used to distinguish the authors of the texts by means of multilayer perceptron artificial neural networks.
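
The abstract does not state which network is used for the principal component analysis. One well-known option is a single linear neuron trained with Oja's Hebbian rule, whose weight vector tends toward the first principal direction. The sketch below assumes that rule and runs it on synthetic two-dimensional data; the function names, the learning rate, and the toy data are hypothetical and are not taken from the paper.

```python
import random


def oja_first_component(samples, learning_rate=0.01, epochs=50, seed=0):
    """Estimate the first principal direction with Oja's rule (illustrative)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Centre the data, since PCA is defined on zero-mean inputs.
    means = [sum(row[j] for row in samples) / len(samples) for j in range(dim)]
    centred = [[row[j] - means[j] for j in range(dim)] for row in samples]
    # Small random initial weights for the single linear neuron.
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    for _ in range(epochs):
        for x in centred:
            y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output
            # Oja's rule: Hebbian term y*x with the decay term y*y*w.
            w = [wi + learning_rate * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = sum(wi * wi for wi in w) ** 0.5
    return [wi / norm for wi in w]


def make_toy_data(n=500, seed=1):
    """Synthetic points spread mainly along the direction (0.6, 0.8)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.gauss(0.0, 2.0)
        data.append([0.6 * t + rng.gauss(0.0, 0.1),
                     0.8 * t + rng.gauss(0.0, 0.1)])
    return data


# Hypothetical usage: the learned weights should align with (0.6, 0.8) up to sign.
w = oja_first_component(make_toy_data())
alignment = abs(0.6 * w[0] + 0.8 * w[1])
```

In a pipeline like the one described, the projections of each paragraph's feature counts onto such learned directions would serve as the inputs to the classifier; further components could be extracted with a generalization such as Sanger's rule.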



DOI: http://dx.doi.org/10.21533/scjournal.v1i1.79


Copyright (c) 2015 SouthEast Europe Journal of Soft Computing

ISSN 2233-1859

Digital Object Identifier (DOI): 10.21533/scjournal

This work is licensed under a Creative Commons Attribution 4.0 International License.