Syllabus

COURSE:INTRODUCCION A WEBSCRAPPING CON R
TRANSLATION:INTRODUCTION TO WEB SCRAPING WITH R
CODE:SOL4037
CREDITS:05
MODULES:02
STATUS:ELECTIVE
TYPE:LECTURE
GRADING:STANDARD
DISCIPLINE:SOCIOLOGY
KEYWORDS:SURVEY METHODOLOGY, DATA SCIENCE
LEVEL:MASTER'S


I.COURSE DESCRIPTION

This short course provides a condensed overview of web technologies and of techniques for collecting data from the web in an automated way, using the statistical software R. It introduces the fundamental components of web architecture and of data transmission on the web. Students will then learn how to scrape content from static and dynamic web pages and how to connect to the APIs of popular web services. Finally, the course discusses practical and ethical issues of web data collection.


II.LEARNING OBJECTIVES

By the end of the course, students will:

1.learn from state-of-the-art research that draws on web-based data collection,

2.have a basic knowledge of web technologies for web scraping,

3.be able to assess the feasibility of conducting scraping projects in diverse settings,

4.be able to scrape information from static and dynamic websites as well as web APIs using R, and

5.be able to tackle current research questions with original data in their own work. 


III.CONTENTS

1.Introduction to web technologies

2.Scraping static webpages

3.Scraping dynamic webpages and good practice

4.Tapping APIs

Prerequisites

Students are expected to be familiar with the statistical software R. Besides base R, knowledge of the "tidyverse" packages, in particular dplyr, plyr, magrittr, and stringr, is helpful. If you are familiar with R but have no experience working with these packages, the best place to learn them is the primary reading "R for Data Science".
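
As a rough self-check of the familiarity assumed here, the following minimal sketch combines dplyr, stringr, and the magrittr pipe. The URLs and titles are invented for illustration and are not course material. Students who can read this comfortably should be well prepared.

```r
# Self-check sketch: the data below are hypothetical, for illustration only.
library(dplyr)    # data verbs; also re-exports tibble() and the %>% pipe
library(stringr)  # string helpers

pages <- tibble(
  url = c(
    "https://example.org/news/2018/item-1",
    "https://example.org/blog/2017/item-2",
    "https://example.org/news/2017/item-3"
  ),
  title = c("  First item ", "Second item", " Third item")
)

news <- pages %>%
  filter(str_detect(url, "/news/")) %>%              # keep only news pages
  mutate(
    title = str_trim(title),                         # drop stray whitespace
    year  = as.integer(str_extract(url, "\\d{4}"))   # first four-digit run in the URL
  ) %>%
  arrange(year)                                      # oldest first

print(news)
```

The same pattern of filtering rows, cleaning strings, and deriving columns recurs when tidying scraped data.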


IV.LEARNING METHODOLOGY

-Flipped classroom

-Lectures delivered through pre-recorded online video sessions

-Live meetings, via a web platform, with discussions

-Quizzes and homework

-Readings


V.ASSESSMENT

-Weekly online assignments: 60%

-Weekly online quizzes: 30%

-Class participation in online meetings and forum: 10%


VI.BIBLIOGRAPHY

Required readings (books)

Simon Munzert, Christian Rubba, Peter Meißner, and Dominic Nyhuis, 2015: Automated Data Collection with R. A Practical Guide to Web Scraping and Text Mining. Chichester: John Wiley & Sons.


PONTIFICIA UNIVERSIDAD CATOLICA DE CHILE
INSTITUTO DE SOCIOLOGIA / NOVEMBER 2018