DATA SCIENCE AND CRIMINAL JUSTICE

Prepared by Ramazan Zeyrek, Sezer Selvi, Melike Funda Kalender, Şevval Nesli, Arzu İrem Atıcı, Nurefşan Candemir, Ebrar Babayiğit and Ece Çimciler.

Ed. by Dr. Tuba Kelep, Dr. Rahime Erbaş and Begüm Tokgöz

Introduction: What is Data Science?

The realization that data could be used as a tool was a major turning point in human history. The emerging developments in data science affect everyone, rich and poor, old and young. From daily life to justice systems, the world is changing for everyone. How will our lives and criminal justice be different in the near future?

Throughout history, crimes have threatened the social order and been a cause of unrest. Therefore, every means of detecting and preventing crime is worth considering, and data science seems to be a powerful tool for doing so.[i] Data science has been described as the “set of fundamental principles that support and guide the principled extraction of information and knowledge from data.”[ii]

Crime statistics have been taken into consideration for years. Some states and police departments have been gathering and reporting statistics on a wide variety of crimes. While data analytics is changing everything from political campaigns to companies, crime data can be used for significantly more than just generating statistics.

As an effective instrument for criminal investigators, data mining saves time and money, because computers running algorithms can process thousands of instructions in seconds and are less susceptible to errors than human investigators, particularly those who work long hours.[iii] Still, we cannot replace all agents and criminalists with machines. Computers are tools for collecting and analyzing data, but criminalists and investigators must routinely test the process and confirm the results.[iv]

Criminal justice data is making a difference in a wide range of areas, such as response planning, crime prevention, and criminal identification. In ongoing projects that use data science to prevent crime, “in terms of applying classification techniques in crime data mining, many implementations combined more than one specific type of classification technique.”[v]

A Brief History of Data Science in the Legal Area

Our world now runs on data science. To be more specific, algorithms and big data are running our world. We use them in every part of our lives. Our shopping preferences, payment records, whom to love, whom to hate… It is all recorded and collected by tech companies. It is a part of ‘solutionism’, whereby tech companies offer technical solutions to all social problems, including crime.[vi] But this was not the case for most of the history of the world or of law. Data mining, and data science in general, can be considered an innovation in the legal area.

For many years, before the digital era began, police departments kept crime records only on paper. In 1990, there was a rash of violent crimes on the New York subway. A transit officer put up a map of the subway showing where the crimes were being committed and at what time of day. After processing and analyzing this information (data), he was able to tell other police officers where and when more officers should be on patrol.[vii] Crime rates were lower at that time, but as time passed and crime rates increased, the data held by police departments could simply no longer be processed by the human mind. Today, computers allow officials to store and cluster practically unlimited data. These data stores hold extensive information about criminals and committed crimes. Using such large databases, police and detectives are now able to solve crimes more easily and rapidly.[viii] Increased crime rates and the widening variety of crimes have also led both data scientists and criminologists to look for ways of stopping criminals before they act. Throughout history, criminologists have tried to identify criminals before they acted. Now, with more developed technology, criminologists benefit from data science in criminal identification, crime prediction, and risk evaluation.

In Germany, collaboration between data science and criminal justice has also taken place in the private sector.[ix] But this is not happening only in Germany. In the last ten years, data-driven predictive policing companies and data security programs such as PredPol and Northpoint have emerged. Police departments, especially in the U.S., are reported to use these programs and platforms.[x]

Apart from police officers and criminologists, data science carries weight in national and international courts too. Today it is acknowledged that collected data may play a key role, principally in cybercrime cases. In the case of Benedik v. Slovenia, the European Court of Human Rights emphasized that a court can order access to certain data for defined and lawful purposes.[xi] Like the ECHR, national and other international courts can order access to certain data that can be used as evidence.

In conclusion, data science has gained an important place in the field of law in a short time, thanks to the conveniences and solutions it provides. It seems that in the coming age of artificial intelligence, data science will become even more significant in criminal justice and, more generally, in legal history.

Why Data Matters in Criminal Justice

Data is the basic information needed to achieve results. It helps support institutional decision-making and strategy. While good data provides indisputable evidence, the opposite is also possible. Data reveals the causes of problems more effectively and shows what is happening in different systems and departments. Storing data allows us to see how well solutions are performing and, accordingly, to understand whether they need to change in the long term. Data also increases efficiency: if data collection and analysis are carried out effectively, deficiencies are remedied faster.[xii]

“The use of data science has become widespread in the field of ‘criminal justice’, as in many other fields.” This is due to an increased focus on data analysis among jurisdictions. “In fact, as of 2016, 120 jurisdictions covering more than 91 million people had signed up to the Data-Driven Justice Initiative, an ambitious U.S. Department of Justice program designed to increase data use in various areas of criminal justice.”[xiii]

Data analysts use criminal justice data to find patterns that can help improve the quality and efficiency of the justice system. [xiv]

Criminal justice data is making a difference in several areas, such as response planning, crime prevention, criminal identification, predictive policing, improving community relations, and initiative assessment. Let’s take a closer look at the benefits of data science in these areas:

1) Crime Prevention: If crime data that includes unemployment rates, state crime rates, incidences of malicious mischief, etc. is shared, law enforcement organizations can uncover both large and subtle correlations in criminal behavior. Once these data points are measured and geotagged, law enforcement authorities can use the information to predict when and where certain kinds of crime are likely to occur. In real-world use, such data analytics has proven quite effective. For example, during a pilot program in Manchester, New Hampshire, local police used advanced data analysis to institute preventative measures that resulted in a 12% reduction in robberies, a 21% reduction in burglaries, and a 32% reduction in thefts from motor vehicles.

2) Criminal Identification: Thanks to data analytics, law enforcement agencies from across the state can enter files from crime scenes into databases designed to find connections between cases. This helps law enforcement build profiles of specific criminals and narrow down suspect lists.

3) Risk Assessment: In her 2013 TED Talk, former New Jersey attorney general Anne Milgram described how she helped develop a data-driven system for deciding whether a convicted criminal is likely or unlikely to pose a threat to public safety if released from jail or given probation. This system can help judges and parole boards make better risk assessments and so enhance public safety.[xv]

4) Improving Community Relations: National crime data is an excellent tool for improving relations with the community. The general public deserves to be told how well the police are protecting the community and providing security. Sharing crime statistics with the general public increases trust in the police and creates good working relationships.

5) Initiative Assessment: Law enforcement initiatives are created to decrease crime. Crime statistics are vital in deciding whether these initiatives are working or whether changes are required. The data will show whether crime is going up or down in the targeted areas. To give an example, in 1990 there was a lot of crime on the New York subway. “A Transit policeman produced a map of the subway system, with pins showing where the crimes were committed in the subway and at what time of day.” Later, the number of patrols was increased according to the data obtained from this map. As a result, there was a significant reduction in crime on the subway. Data science was thus used even when technology and digitalization were not what they are today. The results and effectiveness of today’s developing data science are much greater.

6) Predictive Policing: Crime statistics are often a tool that helps criminal justice professionals anticipate an increased risk of crime. This can be followed up by law enforcement intervention to prevent the anticipated crimes from occurring. Predictive policing data helps target a specific area and allows police resources to be used effectively. However, the predictive value of crime statistics is debatable and still needs refinement. Several pilot projects have used crime statistics to predict future crime; however, the results are inconclusive. The goal is that, with additional work, crime statistics may become an effective tool for reducing future crime rates.[xvi]
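
The subway map with pins described above can be imitated in a few lines of code. The following is only a minimal sketch in Python, not the method of any police department: it counts incidents per location and hour and lists the busiest combinations. The station names, the hours, and the `hotspots` helper are all made up for illustration.

```python
from collections import Counter

# Hypothetical incident records as (station, hour of day).
# These values are invented for illustration; they are not real crime data.
incidents = [
    ("Station A", 22), ("Station A", 23), ("Station A", 22),
    ("Station B", 8), ("Station C", 22), ("Station A", 22),
    ("Station B", 8), ("Station C", 14),
]


def hotspots(records, top_n=2):
    """Count incidents per (station, hour) and return the most frequent pairs."""
    counts = Counter(records)
    # Sort by count (highest first), then by key so that ties are ordered predictably.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


top = hotspots(incidents)
for (station, hour), count in top:
    print(f"{station} at {hour}:00 -> {count} incidents")
```

A simple count like this only shows where and when incidents were recorded in the past. As the text notes, whether such counts reliably predict future crime is still debated.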

It is clear that the capabilities of data science, such as building and explaining models, classifying, analyzing, providing easier access to information, and speeding up processes, bring very important benefits to the realization of criminal justice and will develop further in the future. To use data science more efficiently and to achieve important developments, studies should keep up with developing technology.

How Does Data Science Help Law Enforcement?

In its dictionary sense, law enforcement is “the activity of making certain that the laws of an area are obeyed”.[xvii] It “describes the agencies and employees responsible for enforcing laws, maintaining public order, and managing public safety. The primary duties of law enforcement include the investigation, apprehension, and detention of individuals suspected of criminal offenses.”[xviii] There is no doubt that crime prevention is also part of the responsibility of ensuring public safety. The process of committing a crime, also called the “criminal path (iter criminis)”, consists of the stages of the idea (thinking), preparatory actions, starting the execution of the crime, attempting to commit it, and completing it. As these definitions show, law enforcement officers may be involved in almost every step of the criminal path while performing their main task of protecting the life and property of individuals, drawing on the various data they collect.

Before digital tools were common, data used to prevent and solve crimes were collected in person from different public institutions, and places where crime was frequently committed were mapped manually. Today, this is done by smart machines with artificial intelligence (AI) at their center. Some examples of these AI-supported systems are crime mapping software, GPS, national databases that draw data from different sources such as social media channels or internet browser search histories, body cams, and facial recognition algorithms.[xix]

Accurate estimates of the development of criminality are quite difficult to make even today. However, analytical tools and methods provide enormously successful results. In this part of the article, we are going to examine these methods:

1) Environmental Analysis: “Environmental analysis constitutes a systematic study to predict the flow of events that may occur in an acceptable manner in the targeted forecast horizon and events that cause change in the relevant environment.”[xx] The changes in question occur in economic conditions, demographic investments, international events, social behavior, etc.

2) Delphi Technique: This technique is a part of environmental analysis. It is a consensus-building technique that systematically collects the views of different experts. With this technique, problems are solved by experts who hold different views and have never met each other.

3) Scenario Building: Scenario building is the construction of fictional accounts of events that are likely to happen. Scenarios are analyzed according to their possible consequences. These analyses are not mathematical; they are mostly qualitative.
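
The Delphi technique in particular lends itself to a small illustration. The sketch below is a simplified simulation, not a description of how real Delphi panels are run: it assumes that experts give numeric estimates, that the group median is reported back to them after each round, and that every expert then moves halfway toward that median. The starting estimates, the `pull` factor, and the function names are assumptions chosen for this example.

```python
from statistics import median


def delphi_round(estimates, pull=0.5):
    """One feedback round: each expert moves part of the way toward the group median.

    The fixed `pull` factor is an assumption; real experts revise their views freely.
    """
    centre = median(estimates)
    return [e + pull * (centre - e) for e in estimates]


def run_delphi(estimates, tolerance=1.0, max_rounds=10):
    """Repeat feedback rounds until the estimates lie within `tolerance` of each other."""
    rounds = 0
    while max(estimates) - min(estimates) > tolerance and rounds < max_rounds:
        estimates = delphi_round(estimates)
        rounds += 1
    return median(estimates), rounds


# Hypothetical first-round estimates from five experts (invented numbers).
first_round = [10.0, 20.0, 30.0, 40.0, 60.0]
consensus, rounds_needed = run_delphi(first_round)
print(consensus, rounds_needed)
```

The point of the sketch is the loop of collecting views, summarizing them, and feeding the summary back, which is what allows experts who never meet to converge on a shared estimate.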

These systems can eliminate human errors, save time, and help investigations to be carried out quickly; however, it should not be forgotten that data science systems depend on their data, so they can make mistakes depending on the input. To give a concrete example, the Detroit Police Department arrested a man named Williams based on an incorrect facial recognition match in January 2020. The American Civil Liberties Union (ACLU) filed a lawsuit on his behalf, claiming that the facial recognition algorithm had incorrectly matched the thief’s photo obtained from a security camera with Williams’ driver’s license photo.[xxi] The reason underlying the false match was the high error rate of these systems for Black people, which was demonstrated in a study of three companies producing such technologies (IBM, Microsoft and Face++): “All companies perform better on lighter subjects than on darker subjects with an average of 14.4% difference in error rates.”[xxii] As a result, IBM, Microsoft and other similar companies have announced that they will stop selling facial recognition tools to law enforcement due to concerns that the algorithms are used to promote “racial profiling, violations of basic human rights and freedoms, or any purpose which is not consistent with the principles of transparency”.[xxiii] According to Mr Büchi from the University of Zürich: “These systems could deter people from exercising their rights and could lead them to modify their behaviors, [t]his is a form of anticipatory obedience; being aware of the possibility of getting (unjustly) caught by these algorithms, people may tend to increase conformity with perceived societal norms. Self-expression and alternative lifestyles could be suppressed.”[xxiv]
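
A difference in error rates between groups, like the one reported in the study above, can be checked with very little code. The following Python sketch uses invented results, not the figures from the cited study. The group names, the sample predictions, and the `error_rate` helper are assumptions made for illustration.

```python
def error_rate(results):
    """Share of (predicted, actual) pairs in which the prediction is wrong."""
    wrong = sum(1 for predicted, actual in results if predicted != actual)
    return wrong / len(results)


# Hypothetical classifier outputs as (predicted label, actual label) per group.
# Invented data: 1 error in 8 for one group, 3 errors in 8 for the other.
results_by_group = {
    "lighter": [("female", "female")] * 7 + [("male", "female")] * 1,
    "darker": [("female", "female")] * 5 + [("male", "female")] * 3,
}

rates = {group: error_rate(results) for group, results in results_by_group.items()}
gap = max(rates.values()) - min(rates.values())

for group, rate in rates.items():
    print(f"{group}: error rate {rate:.1%}")
print(f"gap between groups: {gap:.1%}")
```

Reporting the error rate separately for each group, instead of a single overall figure, is what makes this kind of disparity visible in the first place.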

Does Data Science Violate Privacy?

While there is no doubt about the benefits of data science when it comes to predictive policing or counterterrorism, there are some concerns about the way data is used, especially when it is used for legal purposes. One of the biggest concerns is the violation of human rights, especially the invasion of privacy. Since data collection is now mainstream and global, both its positive sides and its problems (like privacy invasion) have global consequences as well.

Privacy means having the ability to seclude yourself in order to limit others’ influence on you. Information privacy is the ability to seclude information about yourself for the same purpose.[xxv] Privacy is seen as essential for human rights such as freedom of expression, freedom of association, and freedom of choice.[xxvi] Data science has come to a point where we are not able to control how our data is collected, processed, and shared, or what it is used for. This means the privacy of personal data, or any other data, can be violated.

Data science projects always carry some legal risk because it is almost impossible to know which data a project contains or for what purposes the data will be used. This situation may contribute to a variety of illegal activities such as copyright and other intellectual property infringements, breaches of confidentiality, and privacy invasions.[xxvii] In addition to these illegal activities, the data can be used for discrimination. For example, colleges and universities could use the information collected to screen students unfairly. They could favor students who participate in extracurricular academic activities over students who prefer socializing. This also means that data science and the new technologies based on it can cause a loss of privacy.

In criminal justice, data is used to produce predictive information that helps police forces track criminals and prevent crimes before they happen. Although this is for a good cause, the way the data is collected is questionable in terms of privacy.[xxviii] One of the main questions is “How can the privacy of individuals/groups be protected?” To answer such questions, the Council of Europe published the “Guidelines on the protection of individuals with regard to the processing of personal data in a world of Big Data” on 23 January 2017.[xxix] According to the guidelines, “the common guiding ethical values can be found in international charters of human rights and fundamental freedoms, such as the European Convention on Human Rights.”

The guidelines also give direction on what preventive policies should look like. According to them, those with decision-making power should adopt preventive policies “concerning the risks of the use of Big Data and its impact on individuals and society, to ensure the protection of persons with regard to the processing of personal data.” The guidelines also point out that preventive policies “shall consider the legal, social and ethical impact of the use of Big Data, including with regard to the right to equal treatment and to non-discrimination.” Based on these directives, it is clear that public authorities are taking action to protect people’s privacy, prevent unequal treatment and discrimination, and control the risks that affect social life.

To conclude, data science has many beneficial outcomes, but no matter how many positives it has, the question of privacy always remains.[xxx] For this reason, there must be legal responsibilities regarding data collection and processing so that people’s privacy can be protected. To create these legal responsibilities, public authorities are taking action, and some rigidly defined data privacy controls are starting to appear in the form of new regulations around the world, such as the General Data Protection Regulation (GDPR), the Russian Federal Law on Personal Data, and the German Bundesdatenschutzgesetz (BDSG).[xxxi]


[i] Shyam Varan Nath, Crime Pattern Detection Using Data Mining (2006) IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops, 4

[ii] Provost, F., & Fawcett, T. Data Science and its Relationship to Big Data and Data-Driven Decision Making (2013). Big Data, 1(1), 52

[iii] Chen, H., Chung, W., Xu, J. J., Wang, G., Qin, Y., & Chau, M. Crime data mining: a general framework and some examples (2004), Computer, 37(4), 50

[iv] Matthew L. Williams, Pete Burnap, Luke Sloan, Crime Sensing With Big Data: The Affordances and Limitations of Using Open-source Communications to Estimate Crime Patterns (2016). The British Journal of Criminology, Volume 57, 320

[v] Hassani, H., Huang, X., Silva, E. S., & Ghodsi, M. A review of data mining applications in crime Statistical Analysis and Data Mining (2016). The ASA Data Science Journal, 9(3), 144

[vi] Morozov E (2013) To Save Everything, Click Here: Technology, Solutionism, and the Urge to Fix Problems that Don’t Exist. London: Allen Lane. 

[vii] Anonym, (2021) <https://www.datasciencedegreeprograms.net/faq/how-does-law-enforcement-use-data-science/ > Date of Access 25 June

[viii] Shyam Varan Nath, (n 1) 4 

[ix] Kai Seidensticker, Felix Bode, Florian Stoffel, ‘Predictive Policing in Germany’ (2018) <http://kops.uni-konstanz.de/bitstream/handle/123456789/43114/Seidensticker_2-14sbvox1ik0z06.pdf?sequence=5&isAllowed=y> Date of Access 25 June 2021

[x] Mark Puente, ‘LAPD pioneered predicting crime with data. Many police don’t think it works’ , Los Angeles Times. <https://www.latimes.com/local/lanow/la-me-lapd-precision-policing-data-20190703-story.html> Date of Access 27 June 2021.

[xi] Benedik v Slovenia App no 62357/14 (ECHR, 10 September 2014)

[xii] 12 Reasons Why Data Is Important, CQL, https://www.c-q-l.org/resources/guides/12-reasons-why-data-is-important/ Date of Access 24 June 2021

[xiii] Why Data Matters in Criminal Justice, Walden University, https://www.waldenu.edu/programs/criminal-justice/resource/why-data-matters-in-criminal-justice Date of Access 24 June 2021

[xiv] ibid

[xv] ibid

[xvi] https://www.waldenu.edu/online-bachelors-programs/bs-in-criminal-justice/resource/why-national-crime-statistics-are-important

[xvii] “Meaning of Law Enforcement in English” <https://dictionary.cambridge.org/dictionary/english/law-enforcement> accessed June 26, 2021

[xviii] “Law Enforcement” (Bureau of Justice Statistics February 18, 2021) <https://bjs.ojp.gov/topics/law-enforcement> accessed June 26, 2021

[xix] Campise K, “Data Science for Law Enforcement” (Discover Data Science December 21, 2019) <https://www.discoverdatascience.org/industries/law-enforcement/> accessed June 26, 2021

[xx] Gül SK and Polat A, “KAMU GÜVENLİK POLİTİKALARININ OLUŞTURULMASINDA YENİ BİR YAKLAŞIM: SUÇ TAHMİNİ (A New Approach in Public Security Policy Formation: Crime Prediction)” (2009) 81 Türk İdare Dergisi 131

[xxi] Hill K, “Wrongfully Accused by an Algorithm” (June 24, 2020) <https://www.nytimes.com/2020/06/24/technology/facial-recognition-arrest.html> accessed June 26, 2021

[xxii] Buolamwini J and Gebru T, “The Gender Shades Project Evaluates the Accuracy of AI Powered Gender Classification Products” (Gender Shades) <http://gendershades.org/overview.html> accessed June 26, 2021

[xxiii] Padilla CA, “A Precision Regulation Approach to Controlling Facial Recognition Technology Exports” <https://www.ibm.com/blogs/policy/facial-recognition-export-controls/> accessed June 26, 2021

[xxiv] Kayser-Brill N, “Swiss Police Automated Crime Predictions but Has Little to Show for It” (Algorithm Watch July 22, 2020) <https://algorithmwatch.org/en/swiss-predictive-policing/> accessed June 27, 2021

[xxv] Anonym, ‘What does privacy mean’ < https://iapp.org/about/what-is-privacy/> Date of Access 26 June 2021

[xxvi] Michael Deane, ‘AI and the Future of Privacy’ (2018) < https://towardsdatascience.com/ai-and-the-future-of-privacy-3d5f6552a7c4> , Date of Access 26 June 2021

[xxvii] Jules J. Berman, Principles and Practice of Big Data, (2nd edn. Academic Press 2018) < https://doi.org/10.1016/C2017-0-03409-2> Date of Access 26 June 2021

[xxviii] Anonym, ‘Is Big Data a good thing or an invasion of privacy?’ < https://venturi-group.com/is-big-data-a-good-thing-or-an-invasion-of-privacy/> Date of Access 26 June 2021

[xxix] Council of Europe. (2017). Guidelines on the protection of individuals with regard to the processing of personal data in a world of Big Data [T-PD(2017)01]. https://rm.coe.int/t-pd-2017-1-bigdataguidelines-en/16806f06d0

[xxx] Anonym, ‘Is Big Data a good thing or an invasion of privacy?’ < https://venturi-group.com/is-big-data-a-good-thing-or-an-invasion-of-privacy/> Date of Access 26 June 2021

[xxxi] Matthew Carrol, ‘Data Privacy & Data Science: The Next Generation of Data Experimentation’ (2016) < https://www.immuta.com/articles/data-privacy-data-science-the-next-generation-of-data-experimentation/> ,Date of Access 26 June 2021