Databasing (501)

Imagine there is an application which allows folks to plan trips together. For example, it might be friends planning trips with friends or it might be a travel agency that organizes tours. You will be implementing the backend for this, which will include a database and RESTful endpoints that expose that functionality to the client. You will learn to rapidly prototype by iterating on the implementation. First you will implement this using PostgreSQL (a relational database), then reimplement using MongoDB (a document-based “NoSQL” database), then reimplement using AWS’s DynamoDB (a distributed “NoSQL” database).

The first four tasks are clearly backend databasing. The fifth task is databasing on the mobile device. The sixth and last task reaches beyond.

Your Product Requirements Document (PRD) is the following endpoints:

  • GET /users

  • GET /trips

  • GET /user/{id}

  • GET /trip/{id}

  • GET /user/{id}/travel-buddies, which returns a list of user IDs of all the users have gone on trips with this particular user.  That is, this endpoint returns a list of the given user’s travel buddies.

Learning outcomes:

  • Learning not to violate your REST contract with the client. (For example, an id field in your JSON response should not suddenly change to _id just because you switched from Postgres to Mongo; the keys in JSON payloads, http status codes, and http headers should remain the same.) This is a particular case of abstracting an implementation via dependency inversion—recall SOLID principles of (OO)D.

  • Learning to not get too attached to a given implementation and learn to quickly and iteratively prototype.

  • Getting a feel for how data modeling differs in relational databases compared to the single-table design that drives performance in a distributed NoSQL database.


Task 1. Postgres

Your first task is to use Supabase to create the following two tables: Users and Trips.  Each table should have an ID.  The ID should be a primary key for that table.  Let a user have the following properties: id, name, email.  These will be your columns in the Users table.  Next, let a trip have the following properties: id, name, destination, start date, end date. Next, imagine that you want to have users go on trips.  How would design your database to support that?  The first idea that might come to mind is to add a “users” column to Trips, which would hold an array of user IDs.  You can try that and it’s worth thinking through to understand why it’s a cumbersome solution.  Firstly, you would have to implement the validation of IDs yourself: did your code insert a valid user ID into the array?  Secondly, you might have to search an array for a given ID: for example, is a given user part of a given trip?  There is, in fact, a more elegant way to solve this problem.  Namely, you could instead create an association table.Deliverables for the databasing bit due a week from today:

  1. Everyone on your team has to have implemented the above, along with an association table.  (Do not implement the cumbersome solution; skip to using an association table off the bat instead.)

  2. Everyone on your team has to have implemented the endpoints from the PRD (see above), including a Swagger doc.

  3. Everyone on your team is able to re-implement this assignment, without notes, in 15 minutes if I were to ask.  This ask does not include -e- 

  4. Everyone on your team has not only the skill to implement (bullet 3 above), but also the conceptual understanding.

You can work together, even though the deliverables are individual.  Help each other understand.


Task 2. Mongo

Reimplement using MongoDB and an ORM. For the ORM, you can use Mongoose or any other ORM of your choice.


Task 3. Dynamo

Reimplement using DynamoDB. You can use AWS’s SDK for your language of choice or an ORM. For your DynamoDB endpoints, try API Gateway and AWS Lambda.


Task 4. Single-table design

If you haven’t already done this as part of Task 3, reimplement (still using Dynamo) using single-table design.

Read:


Task 5. Your client’s offline mode

Add a simple client (e.g. an iOS app) and implement the caching of this data on the client (e.g. using CoreData). Note that, for example, CoreData uses SQLite, which is a relational database SDK. You can, in fact, use the DB Browser for SQLite to examine the tables CoreData creates for you under the hood.


Task 6. gRPC

Markdown:

# The gRPC exercise for this week:
1. Request-response.
2. Streaming from the server to the client—-a "server push"
3. Streaming from the client to the server
4. The HTTP/2 network protocol
5. gRPC over http/2

You should get in the habit of looking at the official documentation first.  Looking at the official document is a requirement for this exercise.  That said, in addition, you can also look at whatever online blogs you want and can use ChatGPT to help you (e.g. by asking it questions or even by asking it to generate relevant code for you).  You are expected to understand the code that you wrote or copied from a blog or which ChatGPT generated for you.

Terminology:
- The word "server" does not mean a physical server but an application—-a process that is listening on some port at some IP address.  "Server" is short for a "server application".
- The word "client" does not mean "the end user".  It refers to a process that connects to a server on some port at some IP address.  "Client" is short for "client application".

Note that both the client and the server can be running on localhost (which, conventionally, happens to have an IPv4 address of 127.0.0.1).  For this exercise, you can run both on localhost if this makes your life simpler.  You could also grab some official Docker image and launch a container if this helps you save time installing things, or forego Docker.

# 1. Request-response

Create a gRPC server and client for the equivalent of a `GET /users` and `GET /users/{id}` endpoint.  Your response can be mocked, so no need to actually connect to the database.  Note that you will need to define a proto for this.

Some jargon: "protobuf" is short for "protocol buffer" and defines your data model; you will need to use the `protoc` compiler with a protoc plugin for the programming language you choose to "compile" your definitions from the proto language-agnostic format to the programming language you choose for your server and client.  (Note that protoc plugins can be _chained_.)

## Your server

I recommend golang, the Go programming language, for your server.  (But if you prefer, you are welcome to use a different programming language.)

First write the server and then, before proceeding to write the client, verify that your server works using some other method.  First, try to use grpcurl.  (Go back to your REST APIs and try using `curl` with them; `grpcurl` is the gRPC analogue of `curl`.)  Optionally, if you want to try to use Postman second, you are welcome to try it with Postman too—-Postman should, by now, have gRPC support.

## Your client

For your client, you can use whatever you want.  For example, your client could be a command-line application written in golang or it could be an iOS application written in Swift (if you are interested in mobile development this would make sense) or anything else you are familiar with that has protoc and gRPC support.

# 2. Streaming from the server to the client—-a "server push"

Implement a mock server which sends the GPS coordinates of a list of users to your server.  Your list should not be empty and the users should not be hardcoded.  The GPS coordinates should change once every second to reflect a walking pace (so not, a running pace and not a driving pace and not Superman circling Planet Earth).

Q: If you are building a mobile application and all you have is a REST API, so no gRPC and no plans to support gRPC on the backend, how would you support this type of functionality?  What are you options?  (e.g. to stream or to poll)

# 3. Streaming from the client to the server

Obviously you could stream in the other direction too: from the client to the server.  This is optional, but you can also POC the stream in the other direction.  An obvious use case would be uploading a file or sending text messages from the client to the server in an active chat (instead of POSTing them using a REST API).

# 4. The HTTP/2 network protocol

Can gRPC use HTTP/1.0 or HTTP/1.1?  If not, what version of HTTP does it support?

HTTP/2 has the following three features which previous versions of the http network protocol did not have:

1. http headers are (can be or always are?) sent in a compressed binary format rather than plain text. See section 4.3.1 of [RFC 9913](https://datatracker.ietf.org/doc/html/rfc9113) and [DigitalOcean's description](https://www.digitalocean.com/community/tutorials/http-1-1-vs-http-2-what-s-the-difference#) for "HPACK", as well as the Overview section 1.1 for "Huffman" [here](https://datatracker.ietf.org/doc/html/rfc7541),

2. allows the client to signal priority of requests (e.g. "this request is urgent" vs. "this one is less so").  See [section 5.3 of RFC 9113](https://datatracker.ietf.org/doc/html/rfc9113#name-prioritization), and

3. has multiplexing.  This has to do with making concurrent calls.

So, compared to HTTP/1.1, HTTP/2 is _multiplexed_.  What does that mean?  How might that be an advantage?

Section 5.3 of RFC 9113 says: "A poor prioritization scheme can result in HTTP/2 providing poor performance. With no parallelism at the TCP layer, performance could be significantly worse than HTTP/1.1" (!).  What about multiplexing?  Can that ever result in _worse_ performance?  See [Lucidchart's tech blog](https://www.lucidchart.com/techblog/2019/04/10/why-turning-on-http2-was-a-mistake/).

# 5. gRPC over http/2

Firstly, why does converting from plain text (e.g. ASCII or Unicode) to binary help compress data in the first place?  Secondly, if you can compress your payload by compressing just the http headers, why not compress the entire payload (and if so, why doesn't the http/2 protocol not do that out of the box for you)?

So, gRPCs, which runs over http/2, sends its paylods in binary.  It's clear how that can be a performance advantage (e.g. for mobile devices on a poor network connection or which are low on battery).  Can this ever be a disadvantage?

Most importantly, if you don't have gRPC, how would you stream data from the server to the client?  Let's say all you have is http/1.1 or, perhaps, your backend colleagues are simply not ready to transition the entire org to gRPC and don't want to do a partial transition just to support the feature you are building.