Principal Component Analysis for Authorship Attribution

Amir Jamak

doi:10.21533/scjournal.v1i1.74

Abstract

A common problem in statistical pattern recognition is that of feature selection or feature extraction. Feature selection refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed so that the data set may be represented by a reduced number of "effective" features and yet retain most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction. In this paper, data collected by counting words and characters in around a thousand paragraphs of each sample book underwent a principal component analysis performed using neural networks. The first principal component is then used to distinguish the books written by a particular author.
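As a rough illustration of the idea in the abstract, the sketch below extracts a first principal component with a single linear neuron trained by Oja's Hebbian rule. The abstract does not say which network or learning rule the paper uses, so Oja's rule is an assumption here, as are the function name `oja_first_component`, the learning rate, and the three synthetic per-paragraph features, which stand in for real word and character counts.

```python
import numpy as np

def oja_first_component(X, lr=0.005, epochs=30, seed=0):
    """Estimate the first principal direction of X with Oja's rule.

    Assumption: a single linear neuron stands in for the paper's
    (unspecified) neural-network PCA.
    """
    rng = np.random.default_rng(seed)
    # Standardise so that no single count dominates by scale alone.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in Z[rng.permutation(len(Z))]:
            y = w @ x                   # output of the linear neuron
            w += lr * y * (x - y * w)   # Oja's Hebbian update
        w /= np.linalg.norm(w)          # guard against numerical drift
    return w, Z

# Hypothetical per-paragraph features driven by one latent "style" factor.
rng = np.random.default_rng(1)
n = 500
style = rng.normal(size=n)
X = np.column_stack([
    1.0 * style + 0.3 * rng.normal(size=n),    # e.g. words per paragraph
    0.8 * style + 0.4 * rng.normal(size=n),    # e.g. characters per word
    -0.6 * style + 0.5 * rng.normal(size=n),   # e.g. short-word frequency
])

w, Z = oja_first_component(X)
scores = Z @ w   # one first-component score per paragraph
```

In an attribution setting, the `scores` of paragraphs from different books would then be compared, for example through their means or histograms, to see whether books by the same author cluster together along this single axis.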

DOI: http://dx.doi.org/10.21533/scjournal.v1i1.74

This work is licensed under a Creative Commons Attribution 4.0 International License

Southeast Europe Journal of Soft Computing
