Education

Slides & Tutorials

Lecture Slides (in Portuguese)

  • Introdução ao Aprendizado de Máquina (Introduction to Machine Learning)

  • Regressões Lineares e Não-Lineares (Linear and Nonlinear Regressions)

  • Florestas Aleatórias (Random Forests)

  • Redes Neurais Artificiais (Artificial Neural Networks)

  • Máquinas de Vetores de Suporte (Support Vector Machines)

  • Validação de Modelos (Model Validation)

R Tutorials (in Portuguese)

The ZIP file includes the following tutorials:

  • Tutorial 1. Introdução ao R (Introduction to R)

  • Tutorial 2. Regressões (Regressions)

  • Tutorial 3. Florestas Aleatórias (Random Forests)

  • Tutorial 4.A Redes Neurais com Torch (Neural Networks with Torch)

  • Tutorial 4.B Redes Neurais com Keras (Neural Networks with Keras)

  • Tutorial 5. Support Vector Machines (SVM)

  • Tutorial 6. Validação de Modelos (Model Validation)

Deforestation Data

This dataset was built for the machine learning tutorials. It covers deforestation in the Brazilian Amazon in 2004, measured at the municipality level, together with related socioeconomic and biophysical factors. It includes 808 municipalities and 31 variables, described below:

  1. codigo : unique municipality ID

  2. state : acronym of the state where the municipality is located

  3. def_annMB : deforestation in 2004 based on the MapBiomas data (km2)

  4. area_km2 : size of the municipality (km2)

  5. PAs : municipal area covered by protected areas (1000 km2)

  6. env_fine_cancel : sum of environmental fines cancelled (R$)

  7. env_fine : sum of environmental fines issued (R$)

  8. env_fine_paid : sum of environmental fines paid (R$)

  9. dist_ports : Euclidean distance to major ports

  10. dist_manaus : Euclidean distance to the Manaus port

  11. dist_parana : Euclidean distance to the Paraná port

  12. dist_arc : Euclidean distance from the Arc of Deforestation region

  13. dist_capital : Euclidean distance to state capitals

  14. dist_seat : Euclidean distance to municipal seats

  15. incra_ha : INCRA settlement cover (ha)

  16. incra_family : number of families in INCRA settlements

  17. incra_cap : INCRA settlement capacity in terms of families

  18. gdp : gross domestic product (R$)

  19. gdp_agr : gross domestic product from the agricultural sector (R$)

  20. pop : population count

  21. suitability_soy : suitability index for soybean production (0 to 1)

  22. suitability_pas : suitability index for pasture (0 to 1)

  23. soil : soil quality (ranging from 1 - poor to 5 - excellent)

  24. flo2000 : forest cover in 2000 (km2)

  25. road_density : road density (km/km2)

  26. road_hway_density : highway density (km/km2)

  27. road_hway_km : highway length (km)

  28. road_km : road length (km)

  29. mayor_party : mayor party political alignment (left/center/right)

  30. gov_party : governor party political alignment (left/center/right)

  31. def_category : dummy indicating whether the municipality is among the 25% with the highest deforestation in 2004 (=1) or not (=0)
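As a rough illustration of how a top-25% dummy like def_category could be derived from def_annMB, here is a minimal sketch on a toy data frame. The values are made up, the object name amazon is hypothetical, and the strict "greater than the 75th percentile" rule is an assumption; the exact rule used to build the dataset is not stated here.

```r
# Toy stand-in for the dataset: these values are invented for illustration only.
amazon <- data.frame(
  codigo    = 1:8,
  def_annMB = c(2, 15, 0, 40, 7, 120, 3, 60)  # deforestation in km2 (hypothetical)
)

# 75th percentile of deforestation across municipalities
cutoff <- quantile(amazon$def_annMB, probs = 0.75)

# 1 if the municipality is above the cutoff (top 25%), 0 otherwise
# (assumed rule: strictly greater than the 75th percentile)
amazon$def_category <- as.integer(amazon$def_annMB > cutoff)

# Count of municipalities in each class
table(amazon$def_category)
```

With ties at the cutoff, or with ">=" instead of ">", the share of municipalities flagged as 1 can differ slightly from 25%.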

Some of my most used pieces of R code

To copy a variable from data frame d2 into data frame d1, matching rows by ID:

# For each ID in d1, look up the first matching row in d2 (NA if there is none)
d1$variable <- d2$variable[match(d1$ID, d2$ID)]
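A small worked example may make the behaviour of match() clearer. The two data frames below are toy inputs invented for illustration; only the names d1, d2, ID, and variable come from the snippet above.

```r
# Toy inputs (hypothetical values)
d1 <- data.frame(ID = c(3, 1, 2, 4))
d2 <- data.frame(ID = c(1, 2, 3),
                 variable = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# For each d1$ID, match() gives the position of its first match in d2$ID,
# or NA when the ID is absent from d2
idx <- match(d1$ID, d2$ID)

# Index d2$variable by those positions to line the values up with d1's rows
d1$variable <- d2$variable[idx]
d1
```

IDs missing from d2 end up as NA in d1, and if an ID is duplicated in d2 only its first occurrence is used, so it can be worth checking for duplicates before relying on this pattern.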


To compute a statistic per category (here, the sum per year):

library(plyr)

# Sum `variable` within each year; the result has one row per year
annual_sum <- ddply(.data = d1, .variables = .(year), .fun = summarise, total = sum(variable))
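For readers without plyr installed, the same per-year sum can be sketched in base R with aggregate(). This is an alternative technique, not the snippet above, and the toy data frame is invented for illustration.

```r
# Toy input (hypothetical values)
d1 <- data.frame(
  year     = c(2003, 2003, 2004, 2004, 2004),
  variable = c(1, 2, 3, 4, 5)
)

# Sum `variable` within each year using base R only
annual_sum <- aggregate(variable ~ year, data = d1, FUN = sum)

# Rename the summed column to match the `total` column of the ddply() call
names(annual_sum)[2] <- "total"
annual_sum
```

One difference to keep in mind: with the formula interface, aggregate() drops rows containing NA by default, whereas sum() inside ddply() returns NA for a group unless na.rm = TRUE is passed.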
 
