01 Introduction to API

Data ingestion

Data ingestion is the first and most fundamental step of any data analysis pipeline. This section focuses on how to collect data from publicly available sources on the web. It is now common practice to integrate proprietary data from OLTP databases with data from public web resources, so that the analysis also draws on a source from outside the company's own domain, since data originating inside the company could be biased in many ways.

Web scraping and APIs share the same goal: accessing web data. Web scraping lets you extract data from any website by means of scraping software. APIs, on the other hand, give you direct access to the data you want.

Definition of API

What

"An API (Application Program Interface) is a set of routines, protocols and tools for building software applications"

A Web API is simply an API that is exposed over HTTP.

Why

  • Separation between model and presentation
  • Regulate access to the data
    • Traceable accounts
    • Enable access to only a portion of the available data
    • Impose limits
  • Avoid direct access to the platform website
  • Request throttling
  • Provide paid access to full (or higher volume) data

HTTP requests

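Under the HTTP protocol, every request must specify a verb and a URL. The verb (also called the method) states the action to perform, and the URL identifies the resource it applies to.

Some of the most commonly used verbs are: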

  • GET
  • POST
  • PUT
  • DELETE
  • ...
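As a minimal sketch of how a verb and a URL are paired, the snippet below builds (but does not send) one request per verb using Python's standard library. The endpoint https://api.example.com/v1/items, the helper build_request, and the JSON bodies are assumptions made up for illustration; they do not refer to any real service.

```python
from typing import Optional
from urllib.request import Request

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/v1/items"


def build_request(verb: str, url: str, body: Optional[bytes] = None) -> Request:
    """Pair an HTTP verb with a URL (and an optional body) without sending it."""
    return Request(url, data=body, method=verb)


# One request object per verb; nothing is sent over the network here.
requests_to_send = [
    build_request("GET", BASE_URL),                                # read a collection
    build_request("POST", BASE_URL, body=b'{"name": "example"}'),  # create a resource
    build_request("PUT", BASE_URL + "/1", body=b'{"name": "new"}'),  # replace resource 1
    build_request("DELETE", BASE_URL + "/1"),                      # delete resource 1
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

To actually perform one of these requests against a real API, the object could be passed to urllib.request.urlopen; that step is left out because the endpoint here does not exist.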

Parameters

An endpoint's behaviour can be adjusted with parameters, which are appended to the URL as encoded strings.
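The encoding step can be sketched with Python's standard library. The endpoint https://api.example.com/v1/search, the helper with_params, and the parameter names q and limit are hypothetical and chosen only to illustrate the mechanism; a real API documents its own parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://api.example.com/v1/search"


def with_params(endpoint: str, params: dict) -> str:
    """Append URL-encoded parameters to an endpoint as a query string."""
    return endpoint + "?" + urlencode(params)


# The space in the search term is not allowed in a URL, so it gets encoded.
url = with_params(ENDPOINT, {"q": "data ingestion", "limit": 10})
print(url)
# https://api.example.com/v1/search?q=data+ingestion&limit=10

# Decoding the query string recovers the original values (as strings).
decoded = parse_qs(urlsplit(url).query)
print(decoded)
# {'q': ['data ingestion'], 'limit': ['10']}
```

The query string starts at the ? character, and each key=value pair is separated by &. Letting urlencode build it avoids hand-escaping characters such as spaces or ampersands inside values.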