Computer Science

Implementation of an Academic Research Paper Plagiarism Checker System


Plagiarism is growing at an alarming rate across Africa, and especially in Nigeria. Students at every level are involved, from primary and secondary schools to tertiary institutions. To address this problem, this research focuses on developing software that detects plagiarized information both on the internet and in local computer folders that hold past research documents and a repository of other important documents.

To achieve this, several internet search engines were used for online detection, while pattern-matching algorithms were used for detection on records saved on the computer.

The system was implemented in the Visual Basic 6.0 programming language, used in conjunction with the Microsoft Access database management utility.




While academic dishonesty is not a new phenomenon, there is no agreement about why plagiarism is so prevalent in the academic world. It is broadly acknowledged that online plagiarism is high because of the easy availability of information.

Plagiarism is a worldwide problem, and actions taken to prevent it have been intensifying since about 2001, when teachers and university managers began to realize the impact of digitalized text, the Internet, global communications, and increasingly efficient search engines (Carroll & Zetterling, 2009).

A mass survey conducted by Rutgers University reports that 38% of students are involved in online plagiarism. These alarming figures show a gradual increase in the phenomenon (Heyward, 2000).

Many graduating students nowadays go to the library, choose four or five research project topics completed some years back, and submit them to their supervisors for approval. When any of these topics is approved, they simply copy everything verbatim except the preliminary pages, claiming to be the pioneers of those research projects. This has a devastating effect on our educational setup. To keep the situation from getting out of hand, a modern technological step needs to be taken. Even though it cannot stop plagiarism entirely, it can reduce it to a manageable level.

Maurer, Kappe, and Zaka (2006) considered plagiarism to be among the most serious forms of scholastic misconduct; academia everywhere is undertaking efforts to educate students and teachers by offering guides and tutorials that explain the types of plagiarism and how to avoid them.

Today, academic research supervisors and the reviewers of scholarly journals look for the following when approving students’ chapters or selecting a paper for inclusion in a journal:

• Originality – what’s new about the subject?

• Relevance to the topic?

• Research methodology – are conclusions valid and objective?

• Clarity, structure, and quality of writing – does it communicate well?

• Sound, logical progression of the argument

• Currency of references

• Compliance with the editorial scope and objectives of the journal (Navin, Soni, Makhdumi, 2009).


Plagiarism causes academic institutions to lose the integrity, honesty, and values they owe to the community, and it runs contrary to academic standards.

Plagiarism is a major problem at nearly every educational level, from elementary schools to colleges. The following surveys serve as indicators of the stated problem:

A survey conducted as part of the Center for Academic Integrity’s (CAI) Assessment Project reveals that 40% of students admitted to engaging in plagiarism, compared with 10% reported in 1999 (McCabe, 2006).

A recent survey conducted by the Center for Intellectual Property (CIP) indicates that Internet plagiarism, where students cut and paste text taken from the Internet without attribution, has increased to 41 percent among college and university students. Students’ unpermitted collaboration at some medium and large institutions increased from 11 percent in a 1963 survey to 49 percent in 1993. In a 1999 survey, over 75 percent of the college students surveyed admitted to some type of cheating. Although some are quick to blame the culture of the Internet as the root of the problem, some scholars note that the shift towards increasing plagiarism has been under way since the 19th century (Center for Intellectual Property, 2003).

Angelil-Carter (2000) claims that there is a lack of clarity across academic institutions about what constitutes plagiarism and a discrepancy in the way plagiarism is detected and enforced.

The Department of Computer Science, UNN, holds that students steal materials from the internet or elsewhere or pay others to do their work, and that most academic staff award them A’s or B’s, enabling them to obtain the 2.1 or 3.5 CGPA needed to qualify for master’s or Ph.D. admission (Postgraduate Students’ Seminar Template, 2007, p. 4).


The objective is to develop a system that is able to:

1. Detect plagiarized text on the internet through online searching.

2. Detect plagiarized text on a computer by searching the records saved there with the developed system.

3. Quickly track down duplicates of students’ proposed research topics.
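The third objective can be illustrated with a small sketch. The text does not say how the system compares topics, so the word-overlap (Jaccard) measure, the function names, and the 0.6 threshold below are all assumptions chosen for illustration rather than the method actually implemented.

```python
import re

def topic_words(topic):
    """Lower-case a topic title and return its distinct words."""
    return set(re.findall(r"[a-z0-9]+", topic.lower()))

def topic_similarity(a, b):
    """Jaccard similarity of the two titles' word sets (0.0 to 1.0)."""
    words_a, words_b = topic_words(a), topic_words(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def find_duplicate_topics(proposed, past_topics, threshold=0.6):
    """Return (past_topic, score) pairs at or above the threshold, best first.

    The threshold is a hypothetical cut-off; a real system would tune it.
    """
    scored = [(topic, topic_similarity(proposed, topic)) for topic in past_topics]
    hits = [(topic, score) for topic, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A supervisor could pass a newly proposed title and the list of past project titles to `find_duplicate_topics` and review only the titles it returns.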



The development of any country mostly originates from its academic research institutions and centers, the places where most innovations and inventions arise. Plagiarism discourages this development. To achieve such development, our researchers and students have to be encouraged to shun the vice of plagiarism, and the laxity and laziness behind it have to be discouraged.

The significance of this research lies in discouraging plagiarism, which is unethical academic behavior and a form of self-cheating, and in uplifting the integrity of our academic institutions so that nothing holds back the development of our country.



The scope of the study, as the title suggests, is centered on detecting plagiarized information in academic research reports, either through the internet or on computer drives and folders.


In the course of the research work, several problems posed serious challenges to the researcher and tended to limit full attainment of the stated goal. Lack of funds was one such factor: commercial anti-plagiarism systems are costly, and the researcher was unable to buy them.

Time constraint was a major limiting factor as lectures were still on at the time most research was concluded.


Suspicious information/Text: Information that the researcher claims ownership of and that the supervisor doubts.

Drives: Computer hard disk, CD, external hard disk, flash disk, etc.

Folder: Computer Directory that stores files

Program: Software Developed

Supervisor: Project Academic advisor.

Programming Language: Set of specific codes used to write a computer program.

Program: Set of instructions or commands written in a specific programming language to guide the computer in executing a particular task.

Implementation: The process of putting a newly developed computer program into the actual task for which it is designed.

Testing: Evaluation of designed/produced software

Research Topic Duplicate: An existing research project topic found to be similar to a proposed one.

Originality: Information first initiated or compiled by the researcher.




2.1.1 Definitions

Plagiarism is derived from the Latin word “plagiarius” which means kidnapper. It is defined as “the passing off of another person’s work as if it were one’s own, by claiming credit for something that was done by someone else” (Wikipedia: Plagiarism 2014).

“Plagiarism is the act of taking the writings of another person and passing them off as one’s own. The fraudulence is closely related to forgery and piracy, practices generally in violation of copyright laws” (Encyclopedia Britannica).

According to the Merriam-Webster Online Dictionary, to “plagiarize” means:

– To steal and pass off (the ideas or words of another) as one’s own.

– To use (another’s production) without crediting the source.

– To commit literary theft.

– To present as new and original an idea or product derived from an existing source.


The software developed focuses mainly on text plagiarism. Text plagiarism can be divided into direct plagiarism and semantic plagiarism.

Direct plagiarism means that the plagiarizers search for literature on the internet or in a literature library, copy the whole or parts of the text, and piece the copied parts together again.

Semantic plagiarism means that the plagiarizers do deep word processing after literature collection, adjust text structure, change sentence patterns, and replace some keywords (Shen, Li, Tian & Cheng, 2009).
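The practical difference between the two categories can be seen in a minimal sketch. It assumes, purely for illustration, that copied passages are found by looking for shared runs of four consecutive words; the function names and the choice of four are hypothetical and are not taken from the system described here.

```python
import re

def word_ngrams(text, n=4):
    """Return the set of runs of n consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def containment(suspect, source, n=4):
    """Fraction of the suspect text's n-word runs that also occur in the source."""
    suspect_runs = word_ngrams(suspect, n)
    if not suspect_runs:
        return 0.0
    return len(suspect_runs & word_ngrams(source, n)) / len(suspect_runs)
```

A directly copied passage scores close to 1.0 under this measure, while a passage whose sentence patterns and keywords have been changed can score near 0.0 even though the ideas are the same, which is why semantic plagiarism is the harder case for surface matching.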


Plagiarism is not always intentional, nor does it always mean stealing from someone else; it can be unintentional or accidental and may consist of stealing from oneself. The broader categories of plagiarism include:

• Accidental: due to a lack of knowledge about plagiarism or of understanding of the citation or referencing style practiced at an institution

• Unintentional: the vastness of available information influences thought, and the same ideas may emerge in spoken or written expression as one’s own

• Intentional: a deliberate act of copying all or part of someone else’s work without giving proper credit to the original creator

• Self-plagiarism: reusing one’s own published work in some other form without referring to the original (Beasley, 2006).


Arwin and Tahaghoghi (2006) discuss several approaches to detecting plagiarism in text and in program source code:

• Text Plagiarism Checker

Text plagiarism involves copying parts of manuscripts, papers, and documents. Hoad and Zobel (2003) explored the ranking and fingerprinting approaches for detecting plagiarism of text. These approaches share a preprocessing stage that includes case folding, stemming (removing prefixes and suffixes from words), stopping (removing common words), and term parsing (removing whitespace, punctuation, and control characters from the document). The ranking approach finds documents similar to a query in two stages. In the first stage, the collection documents are indexed. In the second stage, terms in the query document are matched against the indexed terms of each collection document, and a similarity score is calculated. Documents are then presented to the user in order of decreasing similarity score. This approach relies on an effective similarity function to determine the score for each document.
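The preprocessing stage and the two-stage ranking approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Hoad and Zobel's implementation: the stop-word list is a tiny sample, the suffix stripper is a crude stand-in for a real stemmer, and cosine similarity over term counts is assumed as the similarity function because the text does not name one.

```python
import math
import re
from collections import Counter

# Assumptions for illustration: a tiny stop-word list and a naive suffix stripper.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}
SUFFIXES = ("ing", "ed", "es", "s")

def stem(term):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[:-len(suffix)]
    return term

def preprocess(text):
    """Case folding, term parsing, stopping, then stemming."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in terms if t not in STOP_WORDS]

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank(query, collection):
    """Stage 1: index each document. Stage 2: score and sort by similarity."""
    index = {name: Counter(preprocess(text)) for name, text in collection.items()}
    query_terms = Counter(preprocess(query))
    scores = [(name, cosine(query_terms, vec)) for name, vec in index.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank` with a suspect passage and a dictionary of saved documents returns the document names ordered from most to least similar, which is the list a reviewer would inspect first.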

• Source Code Plagiarism Checker

The nature of program source code makes it difficult to apply simple text-based detection techniques. Copied code is typically altered to evade the plagiarism checker.

Whale (1986) listed thirteen techniques that students may use to disguise the origin of copied code; these are “changing comments, changing formatting, changing identifiers, changing the order of operands in expressions, changing data types, replacing expressions by equivalents, adding redundant statements, changing the order of time-independent statements, changing the structure of iteration statements, changing the structure of selection statements, replacing procedure calls by the procedure body, introducing non- structured statements, combining original and copied program fragments”. We consider there to be one additional item: the translation of source code from one language to another or inter-lingual plagiarism. For example, source code written in C may be copied across to an implementation in Java.
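To show why such disguises defeat plain text matching, and how a checker might counter the simplest of them, the sketch below compares two programs after discarding comments and layout and replacing every identifier with a placeholder. It is an assumption-laden illustration: it operates on Python source using the standard `tokenize` module, it addresses only the first three disguises in the list (comments, formatting, identifiers), and it is not the method of any system cited here.

```python
import io
import keyword
import tokenize

# Token types that carry only comments or layout, not program structure.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}

def normalized_tokens(source):
    """Token stream with comments and layout dropped and identifiers collapsed."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")  # every identifier becomes the same placeholder
        else:
            tokens.append(tok.string)
    return tokens

def same_structure(source_a, source_b):
    """True when the two programs match after normalization."""
    return normalized_tokens(source_a) == normalized_tokens(source_b)
```

Two submissions that differ only in variable names and comments compare as equal under `same_structure`, whereas the remaining disguises, such as reordering statements or translating between languages, would need structural techniques beyond this sketch.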


The consequences of plagiarism can be personal, professional, ethical, and legal. With plagiarism checker software so readily available and in use, plagiarists are being caught at an alarming rate. Once accused of plagiarism, a person will most likely always be regarded with suspicion. Ignorance is not an excuse. Plagiarists include academics, professionals, students, journalists, authors, and others.

• Destroyed Student Reputation

Plagiarism allegations can cause a student to be suspended or expelled. Their academic record can reflect the ethics offense, possibly causing the student to be barred from entering college from high school or another college. Schools, colleges, and universities take plagiarism very seriously. Most educational institutions have academic integrity committees that police students. Many schools suspend students for their first violation. Students are usually expelled for further offenses.

• Destroyed Professional Reputation

A professional business person, politician, or public figure may find that the damage from plagiarism follows them for their entire career.

Not only will they likely be fired or asked to step down from their present position, but they will surely find it difficult to obtain another respectable job. Depending on the offense and the plagiarist’s public stature, his or her name may become ruined, making any kind of meaningful career impossible.

• Destroyed Academic Reputation

The consequences of plagiarism have been widely reported in the world of academia. Once scarred with plagiarism allegations, an academic’s career can be ruined. Publishing is an integral part of a prestigious academic career. To lose the ability to publish most likely means the end of an academic position and a destroyed reputation.

• Legal Repercussions

The legal repercussions of plagiarism can be quite serious. Copyright laws are absolute. One cannot use another person’s material without citation and reference. An author has the right to sue a plagiarist. Some plagiarism may also be deemed a criminal offense, possibly leading to a prison sentence. Those who write for a living, such as journalists or authors, are particularly susceptible to plagiarism issues and must be ever-vigilant not to err. Writers are expected to be well aware of copyright laws and of ways to avoid plagiarism. For a professional writer, plagiarizing is a serious ethical and perhaps legal issue.

• Monetary Repercussions

Many recent news reports and articles have exposed plagiarism by journalists, authors, public figures, and researchers. In the case where an author sues a plagiarist, the author may be granted monetary restitution. In the case where a journalist works for a magazine, newspaper, or another publisher, or even if a student is found plagiarizing in school, the offending plagiarist could have to pay monetary penalties.

• Plagiarized Research

Plagiarized research is an especially egregious form of plagiarism. If the research is medical, the consequences of plagiarism could mean the loss of people’s lives. This kind of plagiarism is particularly heinous.


Students plagiarise for a variety of reasons, and it is important to consider these before reviewing detection and prevention so that they can be addressed. It is also worth remembering that a combination of reasons may affect a student’s decision to plagiarise. In this instance, no distinction has been made between plagiarism of external sources and plagiarism of peers’ work (often referred to as collusion). A report on the “Electronic Plagiarism Checker Project” submitted by the Joint Information System Committee (JISC), University of Luton, mentioned nine reasons why students might plagiarise (Gill, 2001).

  • Bad time management skills

Perhaps the most common reason students plagiarise is bad time management skills. Having waited until the last minute to write an assignment, they panic and try to find the quickest solution.

  • Unable to cope with the workload

This is similar to bad time management, but here the problem lies with the student’s timetable and with assignments from multiple modules clashing.

  • The tutor doesn’t care, so why should I?

If the student senses that the instructor is not interested in the subject or the student’s learning then the student is less inclined to care.

  • External pressure to succeed

In the US, statistics have shown that one of the main reasons students resort to plagiarism is the need to keep up a grade average. There may be external pressures, such as parental and cultural expectations, that make students feel they have to plagiarise to achieve the target grade, either a 2.1 or a 3.5 CGPA, to qualify for further studies.

  • Lack of understanding

A very common cause of plagiarism is a lack of understanding of how to cite material from other sources.

  • I can’t do this!

If students are given an assignment and they feel it is completely beyond their ability, they may feel they have no option but to copy the answers. However, this may have to do more with a lack of clarity in the assignment specifications than a student’s ability.

  • I want to see if I can get away with it

Students may be motivated to see whether they can get away with plagiarising the assignments given to them. It is likely that, whatever preventive methods are put in place, this category of students will always attempt to plagiarise.

  • I don’t need to learn this; I only need to pass it

If a student is not motivated to take part in the educational process or does not appreciate the need to acquire the knowledge to continue their education, they may be inclined to take the quickest route to success and hence be tempted to plagiarise.

  • But you said work together!

Most people in the project identified collusion as a far bigger problem than plagiarism from printed material or the web. In this instance, the term collusion has been used to describe a situation whereby students have been asked to work together on an assignment and have presented the same text.


Angelil-Carter S. (2000). Stolen language? Plagiarism in writing, Pearson Education Limited, UK.

Buruiana F. C., Scoica A., Rebedea T., Rughinis R. (2013). Automatic plagiarism checker. 19th International Conference on Control Systems and Computer Science, University Politehnica of Bucharest, Romania.

Carroll J. & Zetterling C. (2009). Guiding students away from plagiarism, First edition, KTH Learning Lab and the authors.

Center for Intellectual Property. (2003). Academic integrity and plagiarism in the classroom: An overview. Retrieved September 9, 2013, from

Encyclopedia Britannica. Plagiarism (last accessed February 7, 2011).

Gill, C. (2001). Electronic plagiarism checker project, (Report No. JCIEL(01)27), Retrieved May 17, 2013, from _documents/plagiarism_final.pdf

Heyward, E. (2000). Plagiarism and anti-plagiarism network, NJ: Dept. of English, Rutgers University. Retrieved October 30, 2013, from plagiarism598.html

James, R., McInnis, C. & Devlin, M. (2002). Advice on plagiarism checker software, Retrieved September 9, 2013, from http://www.cshe /assessing learning /docs/ PlagSoftware.pdf

Lucas I. & Niall H. (2012). Plagiarism checker systems and international students: detecting plagiarism, copying, or learning? Retrieved March 4, 2014, from http://www.sdaw.Info/educational/Plagiarism checker %20systems%20and%2 0international%20students.pdf.

McCabe, D. (2006). Academic integrity’s assessment project research survey, Retrieved May 17, 2013, from psychology/dave.carlston/…/business2.pdf

Maurer, H., Kappe, F. & Zaka, B. (2006). Plagiarism – A survey, Journal of Universal Computer Science, 12, 1050-1084.

Merriam-Webster’s collegiate dictionary (10th ed.) (2003). Springfield, MA: Merriam-Webster.