Principal Component Analysis for Authorship Attribution

Amir Jamak

Abstract


A common problem in statistical pattern recognition is that of feature selection or feature extraction. Feature selection refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed in such a way that the data set may be represented by a reduced number of "effective" features and yet retain most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction. In this paper the data collected by counting words and characters in around a thousand paragraphs of each sample book underwent a principal component analysis performed using neural networks. Then first of the principal components is used to distinguished the books authored by a certain author.

Full Text:

PDF


DOI: http://dx.doi.org/10.21533/scjournal.v1i1.74

Refbacks

  • There are currently no refbacks.


Copyright (c) 2015 SouthEast Europe Journal of Soft Computing

ISSN 2233 -1859

Digital Object Identifier DOI: 10.21533/scjournal

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License