
Slides & Tutorials
Lecture Slides (in Portuguese)
R Tutorials (in Portuguese)
Tutorials are available in both Markdown and Google Colab:
The ZIP file includes the following tutorials in Markdown:
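- Tutorial 1. Introdução ao R [Introduction to R]
- Tutorial 2. Regressões [Regressions]
- Tutorial 3. Florestas Aleatórias [Random Forests]
- Tutorial 4.A Redes Neurais (com Torch) [Neural Networks (with Torch)]
- Tutorial 4.B Redes Neurais (com Keras) [Neural Networks (with Keras)]
- Tutorial 5. Support Vector Machines (SVM)
- Tutorial 6. Validação de Modelos [Model Validation]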
You can access the Google Colab tutorials via the links below (note that Tutorial 4.A relies on packages that are currently not compatible with Google Colab):
Deforestation Data
This dataset was built for the machine learning tutorials. It covers deforestation in the Brazilian Amazon in 2004, measured at the municipality level, together with related socioeconomic and biophysical factors. It includes 808 municipalities and 31 variables, described below:
- codigo : unique municipality ID
- state : acronym of the state where the municipality is located
- def_annMB : deforestation in 2004 based on the MapBiomas data (km2)
- area_km2 : size of the municipality (km2)
- PAs : municipal area covered by protected areas (1000 km2)
- env_fine_cancel : sum of environmental fines cancelled (R$)
- env_fine : sum of environmental fines issued (R$)
- env_fine_paid : sum of environmental fines paid (R$)
- dist_ports : Euclidean distance to major ports
- dist_manaus : Euclidean distance to the Manaus port
- dist_parana : Euclidean distance to the Paraná port
- dist_arc : Euclidean distance from the Arc of Deforestation region
- dist_capital : Euclidean distance to state capitals
- dist_seat : Euclidean distance to municipal seats
- incra_ha : INCRA settlement cover (ha)
- incra_family : number of families in INCRA settlements
- incra_cap : INCRA settlement capacity in terms of families
- gdp : gross domestic product (R$)
- gdp_agr : gross domestic product from the agricultural sector (R$)
- pop : population count
- suitability_soy : suitability index for soybean production (0 to 1)
- suitability_pas : suitability index for pasture (0 to 1)
- soil : soil quality (ranging from 1 = poor to 5 = excellent)
- flo2000 : forest cover in 2000 (km2)
- road_density : road density (km/km2)
- road_hway_density : highway density (km/km2)
- road_hway_km : total highway length (km)
- road_km : total road length (km)
- mayor_party : political alignment of the mayor's party (left/center/right)
- gov_party : political alignment of the governor's party (left/center/right)
- def_category : dummy indicating whether the municipality is among the 25% with the highest deforestation in 2004 (=1) or not (=0)
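As an illustration of how a dummy like def_category can be derived from def_annMB, here is a minimal sketch in base R. The data frame `toy` and its values are made up, and the cutoff rule (strictly above the 75th percentile) is an assumption. The rule actually used to build the dataset may treat ties or the threshold differently.

```r
# Hypothetical toy data: 8 municipalities with made-up deforestation values (km2)
toy <- data.frame(
  codigo    = 1:8,
  def_annMB = c(0.5, 12, 3.1, 40, 7.8, 150, 22, 1.2)
)

# Assumed rule: flag municipalities strictly above the 75th percentile
cutoff <- quantile(toy$def_annMB, probs = 0.75)
toy$def_category <- as.integer(toy$def_annMB > cutoff)

# Share of flagged municipalities (2 of 8 in this toy example)
mean(toy$def_category)
```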
Some of my most used pieces of R code
For matching and copying a variable from dataframe d2 into dataframe d1 using a common identifier, ID:
# For each ID in d1, find its row position in d2 and copy the value (NA if there is no match)
d1$variable <- d2$variable[match(d1$ID, d2$ID)]
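A small worked example may make the behaviour of match() easier to see. The two data frames below are made up for illustration; only the names d1, d2, ID and variable come from the snippet above.

```r
# Hypothetical toy data: d1 has an ID (4) that does not exist in d2
d1 <- data.frame(ID = c(3, 1, 2, 4))
d2 <- data.frame(ID = c(1, 2, 3),
                 variable = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# match() gives, for each d1$ID, the position of the first equal ID in d2
idx <- match(d1$ID, d2$ID)

# Indexing with NA yields NA, so unmatched IDs end up with a missing value
d1$variable <- d2$variable[idx]
d1
```

Because match() only returns the first hit, this pattern assumes ID is unique in d2. If d2 can contain repeated IDs, a join such as merge() is probably the safer choice.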
To compute a specific statistic per category (sum per year, in this example):
library(plyr)
# Split d1 by year, sum `variable` within each year, and return one row per year
annual_sum <- ddply(d1, .(year), summarise, total = sum(variable))
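If plyr is not available, roughly the same per-year sum can be sketched with base R's aggregate(). The toy data frame is made up for illustration, and the rename of the result column to `total` is only there to mirror the plyr output.

```r
# Hypothetical toy data: two years with a few observations each
d1 <- data.frame(
  year     = c(2003, 2003, 2004, 2004, 2004),
  variable = c(1, 2, 3, 4, 5)
)

# Sum `variable` within each year (formula interface of aggregate)
annual_sum <- aggregate(variable ~ year, data = d1, FUN = sum)

# Rename the summed column to match the plyr example
names(annual_sum)[2] <- "total"
annual_sum
```

One difference to keep in mind: the formula interface of aggregate() drops rows with missing values by default, whereas sum() inside ddply returns NA for a year containing NAs unless na.rm = TRUE is passed.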