Fetching from an API

One of the simplest examples to start with is fetching data from an API endpoint. This is often the beginning of many data pipeline journeys.

In our first case, we will use the OpenWeatherMap API to fetch the current weather in a configurable location by providing a latitude and longitude on the command line.

You will need to sign up for a free account to get an API key. Once you've signed up, create an API key.

Starting a Project

One of the first differences between Rust and Python that you will notice is how a project is initialized.

Rust

In Rust, this is as simple as running:

# Create the project
cargo init wxrs

# Add a dependency
cd wxrs
cargo add reqwest --features blocking

This will create a new directory called wxrs with a Hello World example.

It will also add the reqwest crate to our dependencies, similar to pip install. Unlike pip install though, this will also update Cargo.toml with our dependency, and create a Cargo.lock file that will lock the reqwest crate to a specific version.

The --features flag is used to express optional compilation features. Reqwest has several options, described in the crate's documentation.

We will use the blocking feature, which will allow us to use the blocking API. It gives us a simpler interface to reqwest instead of futures that require an async runtime. We will eventually use async to show the power of Rust's fearless concurrency.

In Python, we have to maintain dependencies and create lockfiles manually by updating setup.py or pyproject.toml, or by using a tool like pipenv or poetry.

We are not using any specific features, but Python too allows optional features, for example pip install snowflake-connector-python[pandas].

# Cargo.toml
[package]
name = "wxrs"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
chrono = "0.4.26"
polars = { version = "0.30.0", features = ["lazy"] }
reqwest = { version = "0.11.18", features = ["blocking", "json"] }
serde = { version = "1.0.164", features = ["derive"] }
serde_json = "1.0.97"
tempfile = "3.6.0"

Python

In Python, we create the directory manually, create a virtual environment, hand-write a pyproject.toml file, name our dependencies, and then install the Python package locally.

# Python venv stuff
pyenv virtualenv wxpy
pyenv shell wxpy
pip install --upgrade pip build

# Create the project directory and pyproject.toml
mkdir -p wxpy/wxpy
cd wxpy
vim pyproject.toml

# pyproject.toml
[project]
name = "wxpy"
version = "0.0.1"
dependencies = [
    "requests",
    'importlib-metadata; python_version<"3.8"',
    "polars",
    "pandas"
]


pip install -e .

Now, admittedly we can skip all of the above steps, create a random file anywhere we want and run it with python myfile.py, but the goal here is to build a more stable distribution that can be packaged, shared, and tested.

Fetching the Weather

Now that we have a project, let's fetch the weather. To fetch from an API, we will use the requests package in Python and the reqwest package in Rust.

We will read the API key from an environment variable and get the latitude and longitude from the command-line arguments.

Given that I'm in California, it only makes sense to start with the Air Pollution API.

Python

In Python, we'll create a folder for this chapter to keep code organized, and then run that file directly.

mkdir wxpy/wxpy/ch3

# wxpy/wxpy/ch3/fetch_api.py
import os
import sys
import requests

API_KEY = os.getenv("OWM_APPID")


def get_air_pollution(lat, lon):
    url = f"http://api.openweathermap.org/data/2.5/air_pollution?lat={lat}&lon={lon}&appid={API_KEY}"
    body = requests.get(url).text
    return body


if __name__ == "__main__":
    usage = f"Usage: python {__file__} <lat> <lon>"

    if not API_KEY:
        print("Please set OWM_APPID environment variable")
        sys.exit(1)

    if len(sys.argv) != 3:
        print(usage)
        sys.exit(1)

    lat = sys.argv[1]
    lon = sys.argv[2]
    body = get_air_pollution(lat, lon)
    print(body)
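
The f-string in get_air_pollution pastes its arguments straight into the query string, so a value containing a space or an & would change the meaning of the URL. Here is a minimal sketch of one alternative, using the standard library's urllib.parse.urlencode. The helper name build_air_pollution_url and the key demo-key are made up for illustration and are not part of the chapter's code.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/air_pollution"


def build_air_pollution_url(lat, lon, api_key):
    """Return the request URL with every query parameter percent-encoded."""
    query = urlencode({"lat": lat, "lon": lon, "appid": api_key})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # "demo-key" is a placeholder, not a real API key.
    print(build_air_pollution_url("37.9871", "-122.5889", "demo-key"))
```

With requests, passing the same dictionary as params= to requests.get should have a similar effect, since the library then does the encoding itself.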

Rust

In Rust, usually we'll have a main.rs file that runs our code, with additional code imported as modules from other files. There's a great convention for package layouts in Rust.

But since we want to execute our code directly, and we'll have multiple binaries, we'll create a bin folder and save the code for ch3 there.

mkdir wxrs/src/bin

// wxrs/src/bin/ch3.rs
pub fn get_air_pollution(lat: f32, lon: f32) -> String {
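    // The trailing backslash joins the two lines without embedding a newline
    // and the indentation in the panic message.
    let api_key = std::env::var("OWM_APPID").expect(
        "Environment Variable OWM_APPID not set. Please set it to your \
         OpenWeatherMap API key. https://home.openweathermap.org/api_keys",
    );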

    let url = format!(
        "http://api.openweathermap.org/data/2.5/air_pollution?lat={}&lon={}&appid={}",
        lat, lon, api_key
    );
    reqwest::blocking::get(url)
        .expect("request failed")
        .text()
        .expect("body failed")
}

pub fn main() {
    let usage = format!("Usage: {} [lat] [lon]", std::env::args().next().unwrap());

    let lat = std::env::args()
        .nth(1)
        .expect(&usage)
        .parse::<f32>()
        .expect(&usage);

    let lon = std::env::args()
        .nth(2)
        .expect(&usage)
        .parse::<f32>()
        .expect(&usage);

    let body = get_air_pollution(lat, lon);
    println!("{}", body);
}

Running the Program

Running the program is simple in both languages. We'll provide the latitude and longitude of beautiful Fairfax, CA, birthplace of mountain biking, and nestled in the foothills of Mount Tamalpais.

Google gives the coordinates as 37.9871 and -122.5889.

Python

In Python, we can use -m to run the module directly.

# in wxpy/wxpy
# export OWM_APPID=your-api-key
python -m wxpy.ch3.fetch_api 37.9871 -122.5889

> {"coord":{"lon":-122.5889,"lat":37.9871},"list":[{"main":{"aqi":2},"components":{"co":178.58,"no":0.1,"no2":0.47,"o3":70.81,"so2":0.64,"pm2_5":2.58,"pm10":4.18,"nh3":0},"dt":1687221287}]}

Rust

In Rust, we must first compile the program before running it. If we run cargo build, Cargo will create a binary for us in ./target/debug/wxrs.

We can also compile and run it with one command: cargo run.

When using cargo build, Cargo creates a debug version of our application in ./target/debug, both for the main.rs file (which will be named wxrs) and for any files located in src/bin, such as ch3.rs.

# in wxrs/
cargo build
./target/debug/ch3 37.9871 -122.5889
> {"coord":{"lon":-122.5889,"lat":37.9871},"list":[{"main":{"aqi":2},"components":{"co":178.58,"no":0.1,"no2":0.47,"o3":70.81,"so2":0.64,"pm2_5":2.58,"pm10":4.18,"nh3":0},"dt":1687221453}]}

# or
cargo run --bin ch3 37.9871 -122.5889
> {"coord":{"lon":-122.5889,"lat":37.9871},"list":[{"main":{"aqi":2},"components":{"co":178.58,"no":0.1,"no2":0.47,"o3":70.81,"so2":0.64,"pm2_5":2.58,"pm10":4.18,"nh3":0},"dt":1687221453}]}

Discussion

Looking at both programs, we can see a fairly similar approach to solving this problem.

Both programs use an external library or crate (not-so-coincidentally named requests/reqwest).

In both programs, we've created a function that takes a latitude and longitude, fetches the results from an API and returns the results as text. We'll cover handling structured data from JSON soon.
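
As a small preview, here is a sketch of how the returned text could be turned into structured data with Python's standard json module. The sample body is copied from the program output shown earlier. Reading dt as a Unix timestamp in seconds is an assumption based on its appearance, not something the chapter states.

```python
import json
from datetime import datetime, timezone

# Sample response copied from the program output shown earlier.
body = '{"coord":{"lon":-122.5889,"lat":37.9871},"list":[{"main":{"aqi":2},"components":{"co":178.58,"no":0.1,"no2":0.47,"o3":70.81,"so2":0.64,"pm2_5":2.58,"pm10":4.18,"nh3":0},"dt":1687221287}]}'

data = json.loads(body)
reading = data["list"][0]

aqi = reading["main"]["aqi"]
pm2_5 = reading["components"]["pm2_5"]
# Assumption: "dt" is a Unix timestamp in seconds.
measured_at = datetime.fromtimestamp(reading["dt"], tz=timezone.utc)

print(aqi, pm2_5, measured_at.isoformat())
```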

Types

One obvious difference is that in Rust, we declare the types of the lat and lon arguments, and in Python we do not. The trouble with talking about types is that it inevitably leads to a discussion of memory, which can devolve into a conversation around null pointer references, which we will largely avoid until the next chapter, but here's a light introduction.

In the Rust code, we've very explicitly defined the types for our function:

pub fn get_air_pollution(lat: f32, lon: f32) -> String {

Both lat and lon are f32, or 32-bit floats. These are floating-point numbers that take exactly 32 bits of memory. The compiler knows exactly how much space to reserve for these values: 32 bits, or 4 bytes.

Given that lat and lon don't require much precision beyond a few decimals, f32 seems like the best choice for our code. We could opt for greater precision by using a 64-bit float, or f64 in Rust, which would take 8 bytes of memory.
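
To get a feel for what 4 bytes buys us, here is a rough sketch that uses Python's struct module to squeeze a coordinate into a 32-bit float and read it back. The helper round_trip_f32 is a made-up name for illustration. For values the size of a latitude or longitude, the rounding error should stay under one hundred-thousandth of a degree, which is on the order of a meter on the ground.

```python
import struct

# "<f" is a 32-bit IEEE 754 float (like Rust's f32); "<d" is a 64-bit one (like f64).
F32_BYTES = struct.calcsize("<f")
F64_BYTES = struct.calcsize("<d")


def round_trip_f32(value):
    """Store a Python float in 4 bytes and read it back, as an f32 would hold it."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    print(F32_BYTES, F64_BYTES)
    for coord in (37.9871, -122.5889):
        stored = round_trip_f32(coord)
        # The last column is the precision lost by using 32 bits.
        print(coord, stored, abs(coord - stored))
```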

Because the compiler knows exactly how much memory these variables need, Rust is able to store these values on the stack.

In Python, we don't know how much memory lat and lon need until runtime because Python will accept anything in this function.


def get_air_pollution(lat, lon):

We could pass it a string, numbers, another function, or even None.

>>> def join_two(a, b):
...     return f"a+b={a}+{b}"
...
>>> join_two(1,2)
'a+b=1+2'
>>> join_two(None, None)
'a+b=None+None'
>>> join_two(join_two, join_two)
'a+b=<function join_two at 0x7f8f7f7de980>+<function join_two at 0x7f8f7f7de980>'
>>> join_two(join_two, join_two(join_two, join_two))
'a+b=<function join_two at 0x7f8f7f7de980>+a+b=<function join_two at 0x7f8f7f7de980>+<function join_two at 0x7f8f7f7de980>'

Even the url line will not fail, because in Python duck-typing allows us great flexibility in what we do with variables. We can pass numbers into an f-string for concatenation just as easily as we can pass characters.

Python will allocate these values on its private heap, and it turns out that CPython on a 64-bit system allocates about 24 bytes for each float there: 8 bytes for the value itself, plus the reference count and type pointer that every Python object carries.
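
The object overhead can be checked from Python itself. The sketch below assumes CPython. The exact number reported by sys.getsizeof can vary between builds, so treat 24 as typical rather than guaranteed. For contrast, it also shows the standard array module, which packs raw 4-byte or 8-byte values without a separate object per element.

```python
import sys
from array import array

boxed = 37.9871
# Size of one float object; typically 24 on 64-bit CPython, but not guaranteed.
print(sys.getsizeof(boxed))

# array stores raw values back to back, without a Python object per element.
as_f32 = array("f", [37.9871, -122.5889])
as_f64 = array("d", [37.9871, -122.5889])
print(as_f32.itemsize, as_f64.itemsize)
```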

Now, the difference between 24 bytes and 8 bytes (or 4 bytes for an f32) is trivial for an application such as this, and even on the most memory-constrained devices it's not worth noting. But it's important to know that heap allocation is slower than stack allocation, and even small applications may iterate over millions of values. Small differences can add up.

You might then ask yourself: what about mypy? Doesn't that give us typing? Mypy is a static type checker: it checks the type hints you write before the program runs, but it doesn't change how Python executes the code. The hints record what you expect the types to be, but they don't change how memory is allocated.
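
A short sketch makes the point concrete. The function describe_location is a made-up example, not part of the chapter's code. The hints on it are plain metadata, so the interpreter runs a call that contradicts them without complaint.

```python
def describe_location(lat: float, lon: float) -> str:
    """The hints document intent; Python does not enforce them at runtime."""
    return f"lat={lat}&lon={lon}"


# Matches the hints.
print(describe_location(37.9871, -122.5889))

# Contradicts the hints, yet runs without any error.
# A static checker such as mypy is expected to reject this call.
print(describe_location("nice", "birds"))

# The hints are stored as metadata on the function object.
print(describe_location.__annotations__)
```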

Handling Errors

Another subtle but important difference is the handling of errors. In Python, errors are handled as exceptions that are caught. Knowing when to catch an exception is mostly an art. It's difficult to know which functions throw exceptions, what exceptions to expect, and when to deal with them.

In Rust, errors are handled as values that are returned. This is a much more explicit approach, and it's easier to know what errors to expect and how to handle them. In fact, if a function returns a Result type, you cannot get at the success value without deciding what to do about the error case, and the compiler warns you if you ignore the Result entirely. This is a huge benefit to Rust, and it's one of the reasons why Rust is so reliable.
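
Python has no built-in Result type, but the idea of returning errors as values can be imitated to show the shape of it. The sketch below is an illustration only. The Ok and Err classes and parse_coordinate are hypothetical names, and nothing in Python forces a caller to check which one came back, which is exactly the guarantee Rust adds.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: float


@dataclass(frozen=True)
class Err:
    message: str


def parse_coordinate(text: str) -> Union[Ok, Err]:
    """Return the parsed number or a description of the failure; never raise."""
    try:
        return Ok(float(text))
    except ValueError:
        return Err(f"not a number: {text!r}")


for raw in ("37.9871", "nice"):
    result = parse_coordinate(raw)
    # The caller inspects the returned value instead of catching an exception.
    if isinstance(result, Ok):
        print("parsed", result.value)
    else:
        print("failed:", result.message)
```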

Let's take a closer look at a failure we haven't handled yet. If we run both programs with invalid arguments, they behave quite differently: Python raises no exception at all, while Rust panics.

python -m wxpy.ch3.fetch_api nice birds
> {"cod":"400","message":"wrong latitude"}

./target/debug/ch3 nice birds

> thread 'main' panicked at 'Usage: ./target/debug/ch3 [lat] [lon]: ParseFloatError { kind: Invalid }', src/main.rs:24:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

What happened here?

In Python, we didn't check that our input is valid, and so the application sent incorrect values to the API, which returned an error message. Fortunately, we get a million API requests for free a month, so this one doesn't cost us much.
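
One way the Python program could catch this before making a request is sketched below. The function parse_lat_lon is a hypothetical helper, not part of the chapter's code. It relies on float() raising ValueError for text that isn't a number, and adds a range check on top.

```python
def parse_lat_lon(lat_text, lon_text):
    """Convert both arguments to floats and check they are valid coordinates."""
    lat = float(lat_text)  # raises ValueError for input such as "nice"
    lon = float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


if __name__ == "__main__":
    for args in (("37.9871", "-122.5889"), ("nice", "birds")):
        try:
            print(parse_lat_lon(*args))
        except ValueError as err:
            print(f"invalid input {args}: {err}")
```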

In Rust, the application panicked because it could not parse the inputs we provided as a float. At the location reported in the panic message (line 24), we call parse on the argument and expect it to be a float.

let lat = std::env::args()
    .nth(1)
    .expect(&usage)
    .parse::<f32>()
    .expect(&usage);

The expect method tells Rust that if parse fails to convert the input, the application must panic with the message we supply. Here that message is the usage string, which is printed before the program exits.

You will see expect and its cousin unwrap used frequently in Rust. They are useful for debugging, but they are not the best way to handle errors. We'll cover error handling in more detail soon.

Benchmarks

Let me preface this by saying speed isn't everything. No doubt someone familiar with Python will spend far more time learning Rust than they might ever save by running a slightly more optimized program. But it is nice to get a sense of

Let's use hyperfine to benchmark the two programs. After 3 warmup runs, we'll run each program at least 10 times and take the average. Before we benchmark the Rust application, we'll compile it using --release, which builds a release version rather than a debug version and should give us a faster application.

cargo build --release

# in wxpy
hyperfine --warmup 3 --min-runs 10 \
    'python -m wxpy.ch3.fetch_api 37.9871 -122.5889' \
    '../wxrs/target/release/ch3 37.9871 -122.5889' \
    --export-markdown ../benchmarks/ch3_fetch_api.md

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `../wxrs/target/release/ch3 30 -140` | 177.0 ± 6.5 | 166.2 | 189.9 | 1.00 |
| `python ../wxpy/wxpy/ch3/fetch_api.py 30 -140` | 320.2 ± 46.8 | 261.2 | 379.7 | 1.81 ± 0.27 |
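
To show roughly what these columns mean, here is a minimal sketch of a timing loop using only Python's standard library. It is a stand-in for illustration, not how hyperfine is implemented. The function time_command and the placeholder command are assumptions. The last line reproduces the Relative value from the two means in the table.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=10, warmup=3):
    """Run cmd repeatedly and return the wall-clock time of each timed run in ms."""
    for _ in range(warmup):
        subprocess.run(cmd, check=True, capture_output=True)
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


if __name__ == "__main__":
    # Placeholder command: start the Python interpreter and exit immediately.
    samples = time_command([sys.executable, "-c", "pass"], runs=5, warmup=1)
    print(f"{statistics.mean(samples):.1f} ± {statistics.stdev(samples):.1f} ms")

    # The Relative column is each mean divided by the fastest mean.
    print(round(320.2 / 177.0, 2))
```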

On my system, the Python application took an average of 320ms to complete, while the Rust application was 1.8x faster at 177ms. Memory consumption was also lower in Rust, with the Python application using 26MB vs only 10MB in Rust.

Again, this is a trivial application with trivial requirements, and performance is not a key factor in deciding which language to build it in. But as we build more intensive applications, we'll keep an eye on memory and performance to see how the gap changes.

Summary

In this chapter we've built a simple application that fetches data from an API and returns the results. We've seen how Rust and Python differ in their approach to handling errors and types, and we've seen how Rust can be faster and more memory efficient than Python.

Rust For Data