Framework for Extrinsic Plagiarism Avoidance

This test data was created to evaluate the “Framework for Extrinsic Plagiarism Avoidance”, a system that detects plagiarism in a research article and then helps the authors remove or replace the text detected as copied from other sources. During corpus creation we selected four documents. Two of them have the same author and are useful for detecting self-plagiarism. The other two documents are unrelated: they share no attributes with the first pair or with each other. We call these four documents the super set; together they make up the system space. Three sets of documents were then created for the user space: the first is fully copied from the system space, the second is partially (approx. 50%) copied from the system space, and the third consists of non-plagiarized documents.

We prepared 20 documents in total during test data creation: the four source documents and 16 user-space documents. Two of the source documents (A, D) are by distinct authors, and two (B, C) share the same authors; this pair was included to test self-plagiarism detection. Documents A1, B1, A1B1, A1B1C1, B1C1 and A1D1 are fully plagiarized, and documents A2B2, A2B2C2, B2C2, A2D2 and A2B2C2D2 are partially plagiarized from the four source documents. The text of the remaining five documents was copied from the internet on distinct, unrelated topics so that they are unique with respect to the source documents.
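The document labels appear to follow a simple convention: each letter names a source document and the digit after it marks the copy level (1 = fully copied, 2 = partially copied). The sketch below assumes that reading and encodes the corpus layout in Python, then checks that the parts add up to 20 documents. `parse_label` and `total_documents` are hypothetical helpers for illustration only; they are not part of the corpus distribution.

```python
import re

# Source documents in the system space. B and C share the same authors
# (the self-plagiarism case); A and D are by distinct authors.
SOURCE_DOCS = ["A", "B", "C", "D"]

# User-space documents, as listed in the corpus description.
FULLY_PLAGIARIZED = ["A1", "B1", "A1B1", "A1B1C1", "B1C1", "A1D1"]
PARTIALLY_PLAGIARIZED = ["A2B2", "A2B2C2", "B2C2", "A2D2", "A2B2C2D2"]
NON_PLAGIARIZED_COUNT = 5  # internet text on unrelated topics

# Assumed meaning of the digit in each label.
LEVELS = {"1": "full", "2": "partial"}


def parse_label(label: str) -> tuple[list[str], str]:
    """Split a label such as 'A2B2C2' into its source documents and copy level.

    Hypothetical convention: each letter is a source document and the digit
    after it is the copy level (1 = fully copied, 2 = partially copied).
    """
    parts = re.findall(r"([A-D])([12])", label)
    # Reject labels that contain anything besides letter+digit pairs.
    if not parts or "".join(l + d for l, d in parts) != label:
        raise ValueError(f"unrecognized label: {label}")
    levels = {d for _, d in parts}
    if len(levels) != 1:
        raise ValueError(f"mixed copy levels in label: {label}")
    return [l for l, _ in parts], LEVELS[levels.pop()]


def total_documents() -> int:
    """Count source documents plus all user-space documents."""
    return (
        len(SOURCE_DOCS)
        + len(FULLY_PLAGIARIZED)
        + len(PARTIALLY_PLAGIARIZED)
        + NON_PLAGIARIZED_COUNT
    )


if __name__ == "__main__":
    for label in FULLY_PLAGIARIZED + PARTIALLY_PLAGIARIZED:
        sources, level = parse_label(label)
        print(f"{label}: {level} copy of {', '.join(sources)}")
    print("total documents:", total_documents())
```

Keeping the layout as plain lists makes it easy to iterate over the user-space documents when scoring a detector: each parsed label gives the expected source documents and whether the expected overlap is full or partial.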

To obtain this corpus, please email ghani@kics.edu.pk or shamas.imran@gmail.com.