Harnessing big data to uncover corruption in the forestry sector
This blog post captures expert insights and responses to practitioner questions raised in a TNRC Learning Series Webinar on Harnessing big data to uncover corruption in the forestry sector. The webinar, held on 22 September 2022, addressed four learning questions: (1) What role do politically exposed persons play in forestry corruption, and how could they, in theory, be detected? (2) What software tools exist for civil society to collect, analyze and enrich data to detect the risk of corruption? (3) What is the importance of regional data sets, native language skills and country-specific details such as naming conventions to this type of analysis? (4) How do we drive anti-corruption activity resulting from this analysis? This learning event was attended virtually by practitioners based in 54 countries. A recording of the webinar and PDF slides are forthcoming.
TRAFFIC has been exploring whether big data techniques can be used to identify corruption in the timber sector since 2021. Learning from their early research can be found in this blog.
Key Takeaways
- Big data analysis offers an efficient and cost-effective means to identify potential corruption risks.
- Such analysis can be delivered with small teams that have knowledge of web-scraping and data analysis software, as well as local language skills and regional context.
- Anti-corruption practices, such as public listings of government officials and public listings of companies involved in forest extraction, facilitate this analysis. Publicly-available pro-transparency databases are hugely valuable in enabling data-matching to identify potential conflicts of interest.
- Because counter-corruption data was already publicly available, researchers did not need to filter down from large datasets as much as anticipated. This demonstrates the value of open data in increasing transparency and identifying the risk of corruption by Politically Exposed Persons (PEPs).
Reducing deforestation and ensuring legal and sustainable forest supply chains requires tackling corruption at multiple levels—starting with the point of access. PEPs (individuals entrusted with a prominent public function) may award contracts and permissions to entities in exchange for private gain, such as bribes to themselves or via corporate vehicles or contacts from which they derive beneficial ownership or other advantageous control. Big data, including publicly accessible open data, has strong potential to identify such corrupt actors. With the right methodology, this can be done affordably by conservation organizations to help financial institutions and financial investigation units freeze illicit assets and ultimately prosecute corruption in the forestry sector. New research undertaken by data analysts and financial flow experts at TRAFFIC and the Basel Institute on Governance under the Targeting Natural Resource Corruption project is testing new methods for automating data collection and analysis techniques used in other sectors. The hope is to identify potential corruption at the point of allocation of forest access rights. Early learning from experts involved in this project is distilled below for other practitioners seeking to leverage open data to advance conservation outcomes.
What role do PEPs play in forestry corruption and how could these theoretically be detected?
“Politically Exposed Person” is a term from anti-money laundering practice for individuals who carry an increased risk of money laundering because of their senior position, such as senior government ministers. The term carries no assessment of wrongdoing. Rather, it recognizes that people holding these types of roles could represent a greater risk of money laundering and corruption, as PEPs can unfairly award contracts and permissions to themselves or others for personal gain. The risk posed by PEPs in forestry was highlighted in a 2016 INTERPOL study into corruption in the forestry sector, which found that the persons most likely to be involved are government officials from forestry agencies. Government officials from other agencies, law enforcement officers and logging company officials were also found to be extensively involved. Many of these roles could also be considered PEPs.
This research project championed the use of ‘big data’ analytical methods by collecting and applying data about government officials who could make decisions on the allocation of logging rights and who also benefitted, directly or indirectly, from logging businesses. The process was never intended to identify criminality. Rather, it was intended to identify individuals whose position indicated a higher assessed threat. The risk assessment and the source documents were then shared with financial institutions so that they could pursue their own inquiries. Where those inquiries raise suspicions, financial institutions would be required to report them to their national Financial Intelligence Unit, which may then contribute to existing criminal cases or to the launch of new investigations.
What software tools exist for civil society to collect, analyze, and enrich data to detect the risk of corruption?
Big data analytics offers a way to reduce corruption by uncovering patterns of bribery and other corrupt acts in the allocation of harvest rights. Using software such as ParseHub (a visual web scraper), Videris (a web investigation platform) and i2 Analyst’s Notebook (a data visualisation tool), data on government staff associated with national forest resource allocation and on individuals with direct or indirect beneficial ownership of logging companies can be collected and cross-matched efficiently and cost-effectively at significant scale.
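To make the idea of cross-matching concrete, here is a minimal sketch in Python. It is not the project's method and does not reflect how ParseHub, Videris or i2 Analyst's Notebook work internally. The record fields (`name`, `role`, `company`), the function names and the sample people and companies are all hypothetical, and the only matching rule is an exact comparison of names after removing accents, case and extra spaces.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and collapse whitespace so names compare reliably."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def cross_match(officials, owners):
    """Return one record per official whose normalized name equals an owner's name."""
    owners_by_name = {}
    for owner in owners:
        owners_by_name.setdefault(normalize_name(owner["name"]), []).append(owner)

    matches = []
    for official in officials:
        for owner in owners_by_name.get(normalize_name(official["name"]), []):
            matches.append({
                "name": official["name"],
                "role": official["role"],
                "company": owner["company"],
            })
    return matches


# Hypothetical sample data, invented purely for illustration.
officials = [
    {"name": "María  Pérez López", "role": "Forestry agency director"},
    {"name": "John Smith", "role": "Regional permit officer"},
]
owners = [
    {"name": "Maria Perez Lopez", "company": "Example Timber Ltd"},
    {"name": "Ana Ruiz", "company": "Sample Logging SA"},
]

matches = cross_match(officials, owners)
for match in matches:
    print(match["name"], "|", match["role"], "|", match["company"])
```

A name match of this kind is only a lead for further checking, not evidence of a conflict of interest: different people can share a name, which is why secondary identifiers matter.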
We used a four-stage approach: (1) a web scraper to collect data from government websites, public records and other datasets; (2) text analytics and natural language processing to automatically analyze the text the web scraper had gathered and find recurring terms and relationships; (3) data visualization to understand relationship convergence and associations; and (4) dissemination of the findings to the financial sector to encourage action.
Where large volumes of data are involved, automated data collection processes such as web scraping are typically far more efficient than manual processes. The efficiency of these processes depends on the available functionality on the site, the consistency of the site structure, the complexity of the webpage design, the presence of images and/or text, and the ability of the person performing the scraping.
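The dependence on consistent site structure can be illustrated with a small sketch that uses only Python's standard-library `html.parser` module. It is an illustration, not the tooling used in the project. The `TableRowParser` class, the `extract_rows` helper and the sample HTML are hypothetical, and the HTML is an inline string rather than a page fetched from a real government website.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse line breaks and repeated spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a public listing of officials.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Role</th></tr>
  <tr><td>Example Official A</td><td>Concessions director</td></tr>
  <tr><td>Example Official B</td><td>Regional
      forestry officer</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
header, records = rows[0], rows[1:]
officials = [dict(zip(header, record)) for record in records]
print(officials)
```

The parser only works because every record sits in the same `tr`/`td` layout. A listing published as images, or one whose layout changes from page to page, would need different handling, which is the efficiency point made above.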
What is the importance of regional data sets, native language skills and country-specific details such as naming conventions to this type of analysis?
The initial expectation was that we would need to mine large datasets to access information on persons holding relevant public offices and then access company ownership data and seek to find direct and indirect links. Members of the project team with extensive regional investigative experience and who were native speakers quickly and efficiently identified existing datasets that fulfilled the same requirement. These were commonly generated from pro-transparency practices, such as public listings of government officials and public listings of companies involved in forest extraction industries, which meant that it wasn’t necessary to filter down from large datasets. While not an intended outcome, this showed the effectiveness and value of open data to increase transparency to identify the risk of PEP corruption.
Regional knowledge also allowed significant enrichment by identifying additional datasets that contributed to the risk assessment model, with data such as previous fraud convictions and organizational breaches or convictions for logging infractions, or that helped reliably identify individuals by providing secondary identifiers such as taxation number or address. Regional knowledge also assisted the project in unexpected ways, such as highlighting how naming conventions in some Spanish-speaking countries can help identify family networks through the tradition of forming a surname from the parents’ surnames.
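As a rough illustration of how such a naming convention could be used, the sketch below assumes the traditional pattern in which a person carries two surnames, the first taken from the father's first surname and the second from the mother's first surname. This is a simplification: practice varies by country and family, and compound surnames (for example those containing "de la") would be split incorrectly. The function names and sample names are hypothetical.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    given: str
    first_surname: str   # traditionally the father's first surname
    second_surname: str  # traditionally the mother's first surname


def parse_name(full_name):
    """Split a full name, assuming the last two words are the two surnames."""
    parts = full_name.split()
    if len(parts) < 3:
        raise ValueError("expected given name(s) followed by two surnames")
    return ParsedName(" ".join(parts[:-2]), parts[-2], parts[-1])


def possible_siblings(a, b):
    """Siblings traditionally share both surnames in the same order."""
    return (a.first_surname.lower(), a.second_surname.lower()) == (
        b.first_surname.lower(),
        b.second_surname.lower(),
    )


def possible_parent_child(parent, child):
    """A parent's first surname traditionally appears among the child's surnames."""
    return parent.first_surname.lower() in (
        child.first_surname.lower(),
        child.second_surname.lower(),
    )


# Hypothetical names, invented purely for illustration.
ana = parse_name("Ana Pérez García")
luis = parse_name("Luis Pérez García")
carlos = parse_name("Carlos Pérez Ruiz")
marta = parse_name("Marta Ruiz Soto")

print(possible_siblings(ana, luis))
print(possible_parent_child(carlos, ana))
print(possible_parent_child(marta, ana))
```

Common surnames will produce many coincidental matches, so results like these are best treated as prompts to look for a secondary identifier such as a taxation number or address, not as findings in themselves.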
How do we drive anti-corruption activity resulting from this analysis? What are the risks and opportunities that conservation organizations must keep in mind when sharing data with other stakeholders in the anti-corruption/financial crime arena?
While a ‘big-data’ analytical approach to ‘following the money’ offers valuable time and cost savings in identifying corruption threats that would be very hard to identify by other means, it does carry additional responsibilities around storing and handling personal data. These responsibilities are heightened when the analysis links individuals to potential criminal acts, which could be considered libelous or slanderous if not done with due care and attention.
The risk to the organization conducting the big data assessment depends on the laws of the respective jurisdiction, but many issues can be avoided by adhering to basic principles, such as collecting only data that is necessary and proportionate to the question in hand and destroying it promptly when no longer needed. The risk of libel can commonly be managed by ensuring that the data shared is fully referenced to existing sources and that any data sharing is done honestly and accurately, in a genuine attempt to stop criminality and support the public good.
Dissemination of data relating to financial threats can be facilitated, and its risk reduced, by sharing the data via a country’s Financial Intelligence Unit (FIU), as many FIUs have enabling legislation that allows them to receive and disseminate data relating to financial crime risks without risk to the originator. Legislation varies from country to country and should always be reviewed against the specifics of a planned project.
Image attribution: © naturepl.com / Jen Guyton / WWF; © Brian J. Skerry / National Geographic Stock / WWF; © Georgina Goodwin / Shoot The Earth / WWF-UK; © Hkun Lat / WWF-Aus