<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Piccolo</title>
        <link>https://piccolo-orm.com/</link>
        <description>Articles about the Piccolo ORM and Python development.</description>
        <lastBuildDate>Tue, 08 Oct 2024 22:25:25 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>Gridsome Feed Plugin</generator>
        <atom:link href="https://piccolo-orm.com/feed.xml" rel="self" type="application/rss+xml"/>
        <item>
            <title><![CDATA[Piccolo Admin forms - downloading files!]]></title>
            <link>https://piccolo-orm.com/blog/piccolo-admin-forms-downloading-files/</link>
            <guid>https://piccolo-orm.com/blog/piccolo-admin-forms-downloading-files/</guid>
            <pubDate>Tue, 08 Oct 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<iframe width="728" height="400" src="https://www.youtube.com/embed/ZAtxXUsptaw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

[Piccolo Admin](https://piccolo-admin.readthedocs.io/en/latest/) lets you easily add [custom forms](https://piccolo-admin.readthedocs.io/en/latest/custom_forms/index.html) to the UI - all you need to do is provide a Pydantic model and an endpoint.

Up until recently these forms could just return a string, which is shown to the user when the form is submitted.

You can now return files instead - see the [docs](https://piccolo-admin.readthedocs.io/en/latest/custom_forms/index.html#fileresponse).

This is really useful, especially for reporting purposes. For example, if your data science team needs a CSV report, we can build a custom form for them, so they can download the report whenever they want.
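
As a rough illustration of the reporting use case, the sketch below builds the CSV text that such an endpoint might hand back. It only uses the standard library. `build_csv_report` and the sample rows are hypothetical, and wiring it into a form (the Pydantic model, the endpoint, and the file response) should follow the docs linked above.

```python
import csv
import io


def build_csv_report(rows: list[dict[str, object]]) -> str:
    """Serialise a list of row dictionaries to CSV text (hypothetical helper)."""
    if not rows:
        return ""

    buffer = io.StringIO()
    # Use the keys of the first row as the CSV header.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data, standing in for the result of a database query.
report = build_csv_report(
    [
        {"band": "Pythonistas", "tickets_sold": 120},
        {"band": "Rustaceans", "tickets_sold": 85},
    ]
)
```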
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[SELECT FOR UPDATE in Piccolo / Postgres]]></title>
            <link>https://piccolo-orm.com/blog/select-for-update-in-piccolo-postgres/</link>
            <guid>https://piccolo-orm.com/blog/select-for-update-in-piccolo-postgres/</guid>
            <pubDate>Fri, 27 Sep 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<iframe width="728" height="400" src="https://www.youtube.com/embed/qlFYQXrNBBI?si=axeNMvwOCS46Dofu" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

In our latest video we explore how `SELECT FOR UPDATE` works in Postgres, and how to use it in Piccolo.

We also demonstrate how it prevents vulnerabilities like the [ACIDRain](https://dl.acm.org/doi/10.1145/3035918.3064037) attack.
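
To get a feel for the kind of race condition involved, here is a small in-process analogy. It is not SQL and doesn't use Piccolo: an `asyncio.Lock` stands in for the row lock, and `Account`, `withdraw_unlocked` and `withdraw_locked` are hypothetical names. Two concurrent withdrawals both read the balance before either writes, so both succeed unless the read and write are protected by a lock.

```python
import asyncio


class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.lock = asyncio.Lock()  # plays the role of the row lock


async def withdraw_unlocked(account: Account, amount: int) -> bool:
    current = account.balance  # read
    await asyncio.sleep(0)  # another task may run between the read and the write
    if current >= amount:
        account.balance = current - amount  # write based on a possibly stale read
        return True
    return False


async def withdraw_locked(account: Account, amount: int) -> bool:
    # Only one task at a time may perform the read and the write.
    async with account.lock:
        return await withdraw_unlocked(account, amount)


async def main() -> tuple[list[bool], list[bool]]:
    unsafe = Account(100)
    unsafe_results = await asyncio.gather(
        withdraw_unlocked(unsafe, 100), withdraw_unlocked(unsafe, 100)
    )

    safe = Account(100)
    safe_results = await asyncio.gather(
        withdraw_locked(safe, 100), withdraw_locked(safe, 100)
    )
    return list(unsafe_results), list(safe_results)


unsafe_results, safe_results = asyncio.run(main())
```

Without the lock both withdrawals succeed, even though the account only held enough for one. With the lock, the second withdrawal sees the updated balance and is refused.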

To learn more about how to use `SELECT FOR UPDATE`, see the [Piccolo docs](https://piccolo-orm.readthedocs.io/en/latest/piccolo/query_clauses/lock_rows.html), and the [Postgres docs](https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE).

The source code used in the video is available [here](https://github.com/piccolo-orm/piccolo_videos/blob/main/select_for_update/main.py).
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Piccolo Admin - Multi-factor Authentication now available!]]></title>
            <link>https://piccolo-orm.com/blog/piccolo-admin-multi-factor-authentication-now-available/</link>
            <guid>https://piccolo-orm.com/blog/piccolo-admin-multi-factor-authentication-now-available/</guid>
            <pubDate>Tue, 10 Sep 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<iframe width="726" height="400" src="https://www.youtube.com/embed/S24JoFdWxwQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

We've been working on adding Multi-factor Authentication (MFA) support to Piccolo Admin for several months now, and it's finally here!

Read more in the [Piccolo Admin docs](https://piccolo-admin.readthedocs.io/en/latest/mfa/index.html).

Look out for a future article discussing some of the design decisions behind the MFA implementation.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Testing Python Type Annotations]]></title>
            <link>https://piccolo-orm.com/blog/testing-python-type-annotations/</link>
            <guid>https://piccolo-orm.com/blog/testing-python-type-annotations/</guid>
            <pubDate>Sat, 07 Jan 2023 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Piccolo uses type annotations extensively, and we recently gave it a big upgrade by leveraging [``TypeVar``](https://docs.python.org/3/library/typing.html#typing.TypeVar) and [``Generic``](https://docs.python.org/3/library/typing.html#typing.Generic).

With an ORM like Piccolo, when we have a table such as this:

```python
class Band(Table):
    name = Varchar()
```

When we query the table, we expect a list of ``Band`` objects to be returned:

```python
>>> await Band.objects()  # list[Band]
```

Wouldn't it be great if we could write a test, to make sure the type annotations don't break? Just as we write unit tests, we can do something similar for our type annotations.

We do this using ``assert_type``. Here's an example using mypy:

```python
# main.py

# For Python 3.11 and above:
from typing import assert_type

# Otherwise `pip install typing_extensions`, and use the following:
from typing_extensions import assert_type

# The function needs type annotations otherwise mypy will ignore it:
async def test() -> None:
    # This will pass:
    assert_type(await Band.objects(), list[Band])

    # This will fail:
    assert_type(await Band.objects(), str)
```

``mypy`` will show an error if the type assertion fails:

```
>>> mypy main.py
main.py: error: Expression is of type "list[Band]", not "str"  [assert-type]
```
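
One thing worth knowing is that ``assert_type`` only has an effect during static analysis. At runtime it returns its first argument unchanged, without checking anything, so these tests have to be run with a type checker rather than a normal test runner. A minimal sketch (``first_name`` is a hypothetical function used for illustration):

```python
try:
    from typing import assert_type  # Python 3.11 and above
except ImportError:
    from typing_extensions import assert_type


def first_name(names: list[str]) -> str:
    return names[0]


# A type checker accepts this, and at runtime the value is passed through:
result = assert_type(first_name(["Alice"]), str)

# A type checker rejects this, but at runtime nothing is raised:
wrong = assert_type(first_name(["Alice"]), int)
```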

Check out this [type checking file in Piccolo](https://github.com/piccolo-orm/piccolo/blob/fdb703f4abf461dc323776d9f2611a1dc92a6c92/tests/type_checking.py) - we use it to make sure that all sorts of queries have the correct type annotations.

We run the tests as part of our CI pipeline, which lets us know if something breaks.

Not every project will require these kinds of tests, but for libraries, and certain apps, it can be incredibly useful.

## Related

To learn more about ``TypeVar``, check out our [article about it](../advanced-type-annotations-using-python-s-type-var/).
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Advanced type annotations using Python's TypeVar]]></title>
            <link>https://piccolo-orm.com/blog/advanced-type-annotations-using-python-s-type-var/</link>
            <guid>https://piccolo-orm.com/blog/advanced-type-annotations-using-python-s-type-var/</guid>
            <pubDate>Sat, 07 Jan 2023 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Type annotations are very common now in the Python world.

The [``typing``](https://docs.python.org/3/library/typing.html) module has a lot of powerful features, and in this article we'll explore [``TypeVar``](https://docs.python.org/3/library/typing.html#typing.TypeVar), which is essential for annotating certain functions correctly.

## Simple annotations

A simple type annotated function is shown below:

```python
def get_message(name: str) -> str:
    return f'Hello {name}'
```

We pass in a string, and return a string - nice and easy.

## Advanced annotations using ``TypeVar``

There are some situations where we have to get more creative with our type annotations. Consider the function below, which doubles the number we pass into it:

```python
def double(value: int | float | decimal.Decimal):
    return value * 2
```

Several value types are allowed (``int``, ``float`` and ``Decimal``). We could add the following return type:

```python
def double(
    value: int | float | decimal.Decimal
) -> int | float | decimal.Decimal:
    return value * 2
```

But when you think about it, it doesn't really make sense. When we pass in an ``int``, we should get an ``int`` returned. What this type annotation is saying is that when we pass in an ``int``, then we could get back an `int`, `float` or `Decimal`.

This is where `TypeVar` comes in. It allows us to do this:

```python
import decimal
from typing import TypeVar

Number = TypeVar("Number", int, float, decimal.Decimal)

def double(value: Number) -> Number:
    return value * 2
```

This tells static analysis tools like [``mypy``](https://mypy.readthedocs.io/en/stable/) and [``Pylance``](https://marketplace.visualstudio.com/items?itemName=ms-python.vscode-pylance) that the type returned by the function is the same as the type which was passed in.

It also tells the type checker that values other than ``int``, ``float`` and ``Decimal`` aren't allowed:

```python
double("hello")  # error
```
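
The annotations mirror what actually happens at runtime, which the short sketch below checks. It also shows that the error above comes from the type checker only: Python itself doesn't enforce a ``TypeVar``, so the call still runs.

```python
import decimal
from typing import TypeVar

Number = TypeVar("Number", int, float, decimal.Decimal)


def double(value: Number) -> Number:
    return value * 2


# Each call returns the same type that was passed in:
results = [double(2), double(1.5), double(decimal.Decimal("1.5"))]
result_types = [type(result) for result in results]

# A type checker flags this call, but at runtime it simply repeats the string:
unchecked = double("hello")  # type: ignore
```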

Piccolo uses ``TypeVar`` extensively - without it, it would be impossible to provide correct types for certain functions. Give it a go!
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[CockroachDB support]]></title>
            <link>https://piccolo-orm.com/blog/cockroach-db-support/</link>
            <guid>https://piccolo-orm.com/blog/cockroach-db-support/</guid>
            <pubDate>Mon, 17 Oct 2022 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Piccolo now has support for [CockroachDB](https://en.wikipedia.org/wiki/CockroachDB).

CockroachDB is a highly fault tolerant database, written in Golang. It uses a similar SQL syntax to Postgres, which makes it easier to integrate with existing Postgres tools like Piccolo.

It's an exciting development, and we're proud to support another production-grade database.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Taking our UI testing to the next level with Cypress]]></title>
            <link>https://piccolo-orm.com/blog/taking-our-ui-testing-to-the-next-level-with-cypress/</link>
            <guid>https://piccolo-orm.com/blog/taking-our-ui-testing-to-the-next-level-with-cypress/</guid>
            <pubDate>Sun, 04 Sep 2022 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/cypress-tests/cypress-results.png" alt="Cypress results" />
</a>
<figcaption>Cypress test results</figcaption>
</figure>

Testing is an integral part of software development, and arguably even more so with open source projects.

Piccolo and its related projects have extensive test suites, mostly consisting of unit tests for the backend Python code.

One of the most important parts of the Piccolo ecosystem is [Piccolo Admin](https://github.com/piccolo-orm/piccolo_admin/), a powerful admin interface / content management system. It contains a lot of UI code (written with Vue.js), and builds upon the rest of the Piccolo ecosystem ([`piccolo_admin`](https://github.com/piccolo-orm/piccolo_admin/) is built on [`piccolo_api`](https://github.com/piccolo-orm/piccolo_api/), which is built on [`piccolo`](https://github.com/piccolo-orm/piccolo/)).

By running integration / UI tests on Piccolo Admin we can make sure that the Piccolo ecosystem of libraries is working together as expected.

## Why Cypress?

[Cypress](https://github.com/cypress-io/cypress) has become very popular. It's developer friendly, and productive. It's similar to tools like [Selenium](https://en.wikipedia.org/wiki/Selenium_(software)).

You write tests in JavaScript, and the tests run within a web browser (typically headless Chrome).

The tests are simple to write - in the example below, we type some content into a form, and then submit it:

```javascript
// Fill the username
cy.get('[name="username"]')
    .type('piccolo')
    .should('have.value', 'piccolo');

// Fill the password
cy.get('[name="password"]')
    .type('piccolo123')
    .should('have.value', 'piccolo123');

// Locate and submit the form
cy.get('form')
    .submit();

// Make sure the correct page was rendered
cy.location('pathname', { timeout: 5000 })
    .should('eq', '/');
```

You can see some [full examples in our Git repo](https://github.com/piccolo-orm/piccolo_admin/tree/master/admin_ui/cypress/integration).

By writing these automated tests, we increase our confidence in the code base, and it reduces the amount of manual testing we have to do.

## GitHub Actions

Having Cypress tests is great, but to get the maximum value we need to run them as part of our CI pipeline. Luckily there's an official tool for doing this: [cypress-io/github-action](https://github.com/cypress-io/github-action).

You can see our [full YAML config here](https://github.com/piccolo-orm/piccolo_admin/blob/master/.github/workflows/cypress.yaml), and an [example of a successful pipeline run](https://github.com/piccolo-orm/piccolo_admin/actions/runs/2989278617).

Once the pipeline has run, we can see how many tests passed / failed:

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/cypress-tests/cypress-results-github.png" alt="Cypress results on GitHub Actions" />
</a>
<figcaption>Cypress results on GitHub Actions</figcaption>
</figure>

Even though the tests are running in headless Chrome, Cypress can generate screenshots and videos of the tests, so we save these as artifacts:

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/cypress-tests/cypress-artifacts.png" alt="Cypress artifacts on GitHub Actions" />
</a>
<figcaption>Cypress artifacts on GitHub Actions</figcaption>
</figure>

The artifacts are stored in a zip file, which we can download and inspect.

When I started using Cypress, the screenshots and videos were something which really impressed me.

Here we can see a test in action - which navigates around the app, and submits some forms:

<video width="1280" height="720" controls>
  <source src="https://piccolo-orm.com/images/blog/cypress-tests/cypress-test-video.mp4" type="video/mp4">
</video>

It's like having our own personal android!

## Moving forward

Now that we have some initial Cypress tests, and have them integrated with our CI, where do we go next?

There are lots more Cypress tests left to write, and we want to get to a position where every new feature is accompanied by a set of Cypress tests.

If you want to [get involved](https://github.com/piccolo-orm/piccolo_admin/), and learn more about Cypress, then you're welcome to join us. Cypress is surprisingly fun, and a valuable skill to learn.

## Update - now using Playwright!

We ended up migrating to [Playwright](https://playwright.dev/), which is a similar framework, but the tests can be written in Python.

Being able to write the tests in Python is a huge boon for us, as we can test the UI (for example submitting a form), and then use Piccolo to query the database to make sure the data was modified.

I'm still a fan of Cypress, but Playwright is the obvious choice for Python developers.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Piccolo Admin is now multilingual]]></title>
            <link>https://piccolo-orm.com/blog/piccolo-admin-is-now-multilingual/</link>
            <guid>https://piccolo-orm.com/blog/piccolo-admin-is-now-multilingual/</guid>
            <pubDate>Thu, 21 Jul 2022 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<iframe width="726" height="400" src="https://www.youtube.com/embed/VbWnwChVnpM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

After a surprisingly large amount of work, Piccolo Admin has multilingual support!

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/piccolo-admin-multilingual-support/piccolo_admin_multilingual_portuguese.png" alt="Piccolo Admin multilingual support - Portuguese" />
</a>
<figcaption>Piccolo Admin, in Portuguese</figcaption>
</figure>

Piccolo Admin detects your preferred language based on your browser settings, and will try and translate the UI accordingly, if we have a matching translation available.

You can also manually select your preferred language, using the new button in the nav bar, shown in the image above.
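
The automatic detection described above boils down to matching the browser's preferred languages against the translations we have. The sketch below illustrates the idea in Python. It is not Piccolo Admin's actual implementation (the real logic lives in the UI code), and `pick_language`, `SUPPORTED_LANGUAGES` and the English fallback are assumptions made for the example.

```python
# Language codes for the translations listed in this article, plus English,
# which we assume to be the default.
SUPPORTED_LANGUAGES = {"en", "hr", "fr", "de", "pt", "es", "cy"}


def pick_language(
    preferred: list[str],
    supported: set[str] = SUPPORTED_LANGUAGES,
    default: str = "en",
) -> str:
    """Return the first preferred language we have a translation for."""
    for tag in preferred:
        # "pt-BR" and "pt" should both match the Portuguese translation.
        primary = tag.split("-")[0].lower()
        if primary in supported:
            return primary
    return default


portuguese = pick_language(["pt-BR", "en"])
fallback = pick_language(["ja", "ko"])
```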

The languages we support for now are:

 * Croatian
 * French
 * German
 * Portuguese
 * Spanish
 * Welsh

Some of these were provided by native speakers of the language (Croatian, Portuguese), and the rest are machine translated. If you find any errors in our translations, or would like to contribute translations for another language, here's a [guide on how to contribute](https://piccolo-admin.readthedocs.io/en/latest/contributing/index.html#translations). It should only take a few minutes, and is very appreciated!
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Piccolo Admin - bulk updates, and more!]]></title>
            <link>https://piccolo-orm.com/blog/piccolo-admin-bulk-updates-and-more/</link>
            <guid>https://piccolo-orm.com/blog/piccolo-admin-bulk-updates-and-more/</guid>
            <pubDate>Fri, 08 Jul 2022 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
We've recently added some exciting new features to Piccolo Admin, such as the ability to bulk modify rows. This is a huge time saver for users.

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/piccolo-admin-bulk-updates/piccolo-admin-bulk-updates-screenshot.png" alt="Piccolo Admin bulk update screenshot" />
</a>
<figcaption>Piccolo Admin, bulk update</figcaption>
</figure>

You can learn more in the video below:

<iframe width="726" height="400" src="https://www.youtube.com/embed/hYtGUBwTS6Q" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A guide to managed PostgreSQL services]]></title>
            <link>https://piccolo-orm.com/blog/a-guide-to-managed-postgre-sql-services/</link>
            <guid>https://piccolo-orm.com/blog/a-guide-to-managed-postgre-sql-services/</guid>
            <pubDate>Mon, 23 May 2022 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
In this article we're going to look at some of the managed Postgres services which are available, and why you might use one.

![Postgres managed services](/images/blog/managed-postgres/guide_to_managed_postgres.png)

## Why consider a managed Postgres service?

Installing Postgres on a Linux server couldn't be simpler. In the case of Ubuntu, we just need to run `apt install postgresql`.

However, creating a robust production setup requires several additional steps:

- Setting up regular backups (perhaps using [pgBackRest](https://pgbackrest.org/) or [Barman](https://pgbarman.org/))
- Tuning Postgres (using [PGTune](https://pgtune.leopard.in.ua/#/))
- Installing [PgBouncer](https://www.pgbouncer.org/) for connection pooling
- Creating certificates, if you want clients to [connect over SSL](https://www.postgresql.org/docs/current/ssl-tcp.html)
- Mounting a volume, so the storage available can be scaled in the future

Things get even trickier when we need to run a cluster of Postgres servers, with a [warm standby](https://www.postgresql.org/docs/current/warm-standby.html).

This isn't a fault of Postgres - running a robust production setup is challenging, no matter which database you're using. To solve this problem there are many managed Postgres services now, offered by most major cloud providers.

## The importance of backups

When using a managed Postgres service, backups are created automatically. If you were to create your own backup system from scratch, you would have to do the following:

- Back up the data
- Monitor that your backup system is working
- Periodically test that the backups can be restored

So building a robust backup system is a fair chunk of work.

The importance of backups for databases can't be overstated. Even though you could run Postgres in production for a decade, and in 99.9% of cases experience no issues, having backups helps mitigate many disaster scenarios:

- The server hardware fails
- The [data center burns down](https://www.techradar.com/uk/news/remember-the-ovhcloud-data-center-fire-heres-why-it-was-so-bad)
- A malicious actor manages to delete or corrupt data, for example via a [SQL injection attack](https://owasp.org/www-community/attacks/SQL_Injection)
- [Human error](https://www.reddit.com/r/cscareerquestions/comments/6ez8ag/accidentally_destroyed_production_database_on/) when running a routine database query

Database backups are the last line of defence when things have gone terribly wrong.

## Different approaches to backups

You need to be careful because some cloud providers have more robust backup systems than others.

Some cloud providers just do a daily backup of your database. If the backups are taken at midnight, and we are unfortunate enough to have a problem at 11pm, then we will lose 23 hours of data. If the database is powering a simple blog, then it's not a big issue. But if it's something like a [SAAS](https://en.wikipedia.org/wiki/Software_as_a_service) app, then losing that much customer data would be unacceptable.

A better approach is point in time recovery (PITR). Postgres uses [WAL files (Write Ahead Log)](https://www.postgresql.org/docs/current/wal-intro.html) as a record of database changes. If the cloud provider regularly backs up these files (typically every 5 minutes), then Postgres can recreate a database's state for a given date and time by replaying the daily backup along with the WAL files.
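
To put some numbers on the difference, the sketch below works out how much data is lost for a failure just after 11pm, first with daily backups taken at midnight, and then with WAL files archived every 5 minutes. It's a simplified model: `data_lost` is a hypothetical helper, the dates are made up, and it assumes every backup up to the failure can be restored.

```python
from datetime import datetime, timedelta


def data_lost(
    failure: datetime, first_backup: datetime, interval: timedelta
) -> timedelta:
    """Time between the most recent backup and the failure."""
    elapsed = failure - first_backup
    # Number of whole intervals completed before the failure.
    completed = elapsed // interval
    last_backup = first_backup + completed * interval
    return failure - last_backup


midnight = datetime(2022, 5, 23, 0, 0)
failure = datetime(2022, 5, 23, 23, 3)

daily_loss = data_lost(failure, midnight, timedelta(days=1))
pitr_loss = data_lost(failure, midnight, timedelta(minutes=5))
```

With daily backups we lose just over 23 hours of data, whereas with 5 minute WAL archiving the loss is never more than 5 minutes (3 minutes in this case).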

Most major cloud providers support this. I've personally checked the following:

- [AWS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html)
- [DigitalOcean](https://docs.digitalocean.com/products/databases/postgresql/how-to/restore-from-backups/#restore-a-postgresql-cluster-from-backups)
- [OVH](https://www.ovhcloud.com/en-gb/public-cloud/postgresql/)

If your cloud provider of choice doesn't support PITR, then you're getting a substantially worse service.

## Accessing a backup

You might imagine that you can go to the admin page of your cloud provider, and download the database backup. This is rarely how it works. To access a backup, the cloud provider will create a new database server, using the data from the backup. You can then use `pg_dump`, targeting the new database server, to download the backup.

## The cost of managed databases

A common criticism of managed databases is that you get poor value compared to installing Postgres manually on a virtual machine.

For example, the 5 USD droplet from DigitalOcean has the following specs (as of May 2022):

- 1 vCPU, 1 GB RAM, 25 GB disk

Compare that to the managed Postgres service, which starts at 15 USD, and only offers the following specs:

- 1 vCPU, 1 GB RAM, 10 GB storage

Bear in mind the additional storage that the cloud provider is using for backups. DigitalOcean stores backups for 7 days, which means it could use quite a bit more than 10 GB of storage.
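
As a quick back-of-the-envelope comparison, we can work out the price per GB of storage using the figures quoted above (as of May 2022). This deliberately ignores the extra storage used for backups, and everything else a managed service provides, so treat it as a rough guide only.

```python
# Figures quoted above, as of May 2022.
droplet_price_usd = 5
droplet_storage_gb = 25

managed_price_usd = 15
managed_storage_gb = 10

droplet_usd_per_gb = droplet_price_usd / droplet_storage_gb
managed_usd_per_gb = managed_price_usd / managed_storage_gb

# How many times more expensive the managed storage is, per GB.
price_ratio = round(managed_usd_per_gb / droplet_usd_per_gb, 2)
```

So per GB, the managed service costs around 1.50 USD compared to 0.20 USD for the droplet.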

Still, database storage is very expensive compared to block storage and object storage. If your business requires huge amounts of database storage (perhaps you're using [TimescaleDB](https://www.timescale.com/) with terabytes of data) then the cost of a managed database will be substantial.

## Reasons not to use a Postgres service

Besides the cost implications outlined above, there are some other downsides with using Postgres services. For example, there are usually limitations around which Postgres extensions you can install, and you don't have direct control over the many parameters which can be used to tune Postgres performance. If you have very specific requirements, or a particularly demanding application, then managed services might not be the best option.

## Things to look out for

If you've decided to use a managed Postgres service, here are some things to be aware of when comparing cloud providers:

### Storage provided

When cloud providers show the amount of storage available, some are referring to the actual storage available for Postgres, and others are referring to the total capacity of the VM which is running Postgres (by the time you deduct the storage used by the operating system, and other system software, you get less capacity for Postgres itself).

### Hidden costs

Some cloud services include a free bandwidth allowance, and others don't. So doing large queries on your cloud databases from a local machine, and frequently downloading backups, can add to your bill.

### Lack of flexibility

The hyperscale clouds, like AWS, are incredibly flexible with their managed database offerings. You can specify:

- The number of CPU cores
- The amount of RAM
- The amount of storage
- The type of storage - e.g. SSD or hard drive
- How long backups are kept for
- Which availability zones to use for read replicas

The smaller providers are typically far less flexible. For example, DigitalOcean keeps backups for 7 days, and there's no way to configure it to be longer or shorter. In a staging environment, we typically don't need backups, so are paying for something we don't want.

### Automatic Postgres upgrades

Most of the cloud providers allow you to automatically upgrade your Postgres version. This is a huge time saver vs doing it yourself. Often self managed databases will stay on older versions of Postgres for many years due to the pain of upgrading.

### Monitoring

A hosted database is useless unless the cloud provider also provides an effective monitoring solution to let you know when the disk is almost full, and CPU / RAM usage is over a certain threshold.

## Let's look at some providers

All of the following offer a managed Postgres service:

- [Amazon Web Services](https://aws.amazon.com/rds/postgresql/)
- [Azure](https://azure.microsoft.com/en-gb/services/postgresql/#overview)
- [DigitalOcean](https://www.digitalocean.com/products/managed-databases-postgresql)
- [Google Cloud](https://cloud.google.com/sql/docs/postgres)
- [Heroku](https://www.heroku.com/postgres)
- [OVH](https://www.ovhcloud.com/en-gb/public-cloud/postgresql/)
- [UpCloud](https://upcloud.com/products/managed-databases/)

With the following coming soon (as of May 2022):

- [Linode](https://www.linode.com/products/databases/)
- [Vultr](https://www.vultr.com/products/managed-databases/)

There are also some companies who will set up a managed Postgres cluster for you, on one of the hyperscale cloud providers:

- [Aiven](https://aiven.io/postgresql)
- [Crunchy Data](https://www.crunchydata.com/products/crunchy-bridge)
- [ElephantSQL](https://www.elephantsql.com/)
- [EnterpriseDB](https://www.enterprisedb.com/products/biganimal-cloud-postgresql)
- [ScaleGrid](https://scalegrid.io/postgresql.html)

### Which should I use?

Switching cloud providers is a drastic move, so the natural choice is to use your current provider's Postgres service.

But if you're starting from scratch, it mostly depends on your budget and the expected scale of your application.

Without a doubt, the hyperscale providers (in particular AWS) have very complete offerings, with lots of control. But the smaller providers are often cheaper, and easier to understand from a billing perspective.

You might also consider which other services the cloud provider offers. In my case, having an affordable managed Kubernetes service is also important, and weighs into the decision on which cloud provider to pick.

For my personal projects, where I favour ease of use and cost effectiveness, I've had good results from DigitalOcean. When Vultr and Linode release their own managed Postgres services, they will be strong competitors to DigitalOcean's offering in terms of price / performance.

Every medium to large enterprise I've worked with has favoured a hyperscale provider like AWS, Google, or Azure. The range of different services on offer, and familiarity amongst devops professionals, are usually contributing factors.

## Conclusions

Managed Postgres services are a great way to get started, and include essential features like automated backups. You lose some control compared to hosting Postgres yourself, and pay a price premium, but the abundance of new cloud providers offering this service shows there is clearly demand from users.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Creating a new Sphinx theme for our docs]]></title>
            <link>https://piccolo-orm.com/blog/creating-a-new-sphinx-theme-for-our-docs/</link>
            <guid>https://piccolo-orm.com/blog/creating-a-new-sphinx-theme-for-our-docs/</guid>
            <pubDate>Mon, 21 Feb 2022 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Last week we updated the [Piccolo docs](https://piccolo-orm.readthedocs.io/en/latest/index.html) to use our brand new [Sphinx theme](https://github.com/piccolo-orm/piccolo_theme).

All of the Piccolo projects will be updated to use this theme in the near future. You can also use it for your own projects.
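
Sphinx themes are selected using the `html_theme` setting in your project's `conf.py`. The fragment below is a minimal sketch, which assumes the theme has been installed into the same environment as Sphinx, and that the theme name matches the repo name (`piccolo_theme`). Check the theme's README for the exact installation steps.

```python
# conf.py

# Assumption: the theme is installed, and registered under this name.
html_theme = "piccolo_theme"
```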

Previously, we used the [Read the Docs theme](https://sphinx-rtd-theme.readthedocs.io/en/stable/). I really like this theme, but there are a few things I wanted to change (just my personal taste).

## Previous Design

### Autodoc

When you embed code in Sphinx using [autodoc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html), it gets a bit messy (especially if you have type annotations).

![Code snippets](/images/blog/documentation-theme/code_snippets.png)

### Typography and white space

I generally like the typography, with the bold headings. I personally prefer sans-serif header fonts though.

The theme has some grey unused space on the right, which could be utilised for something.

![Dead space](/images/blog/documentation-theme/dead_space.png)

### Next buttons

The next button doesn't tell you the title of the next page.

![Next buttons](/images/blog/documentation-theme/next_buttons.png)

### Sidebar

Everything goes in a single sidebar, which can get a bit overwhelming.

![Sidebar](/images/blog/documentation-theme/sidebar.png)

## New design

In the new design we:

- Have a left and right sidebar. The left sidebar shows the page hierarchy, and the right sidebar shows the contents of the current page.
- Use sans-serif header fonts.
- Have a header bar, to add a splash of colour.
- Show the title of the adjacent page on the next / previous buttons.

![New Design](/images/blog/documentation-theme/new_design.png)

We also modified the autodoc output so it's more legible (each argument is on its own line).

![New Design Code Blocks](/images/blog/documentation-theme/new_design_code_blocks.png)

## Did we succeed?

Changing the docs for a project is high risk - people get used to a certain look and feel. Hopefully the community will like the changes!

If you have any feedback on the new design, please create a new [discussion](https://github.com/piccolo-orm/piccolo_theme/discussions) or [issue](https://github.com/piccolo-orm/piccolo_theme/issues).

## Why choose Sphinx?

Sphinx is a very powerful tool, and an incredible asset for the Python community. With [intersphinx](https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html) you can link between Sphinx projects. So in Piccolo, we can link directly to the main Python docs (because they're also written using Sphinx).
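
As an example, linking to the main Python docs takes just a couple of settings in `conf.py`. This is a minimal sketch rather than our exact configuration:

```python
# conf.py

extensions = ["sphinx.ext.intersphinx"]

# Maps a name to the base URL of another Sphinx project. Using `None` tells
# Sphinx to fetch the object inventory from that URL.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}
```

A reference such as ``:class:`decimal.Decimal` `` in your docs will then link through to the Python documentation.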

Sphinx is also very actively maintained, with new features coming out regularly.

By creating this new theme, we obviously benefit ourselves as Piccolo maintainers and users, but it's also our way of giving back to the Sphinx community.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The power of Python descriptors]]></title>
            <link>https://piccolo-orm.com/blog/the-power-of-python-descriptors/</link>
            <guid>https://piccolo-orm.com/blog/the-power-of-python-descriptors/</guid>
            <pubDate>Mon, 24 Jan 2022 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
There are two features in Python which aren't often needed in everyday programming, but are essential to the inner workings of Piccolo. The first is metaclasses, and the second is the [Python descriptor protocol](https://docs.python.org/3/howto/descriptor.html).

In this article we'll look at the Python descriptor protocol, and why it's so powerful. In fact, it underpins many core Python features such as classmethods.

## What is the descriptor protocol?

The descriptor protocol allows us to implement custom logic when a variable is accessed, or assigned a new value. For example:

```python
class Parent:
    child = Child()

parent = Parent()

# With the descriptor protocol we can run custom logic:

Parent.child       # when it's accessed on the class
parent.child       # when it's accessed on a class instance
parent.child = 1   # when we assign a new value to it
```

Like many things in Python, it's implemented using magic methods. In this case `__get__` and `__set__`:

```python
class Child:
    def __get__(self, obj, objtype=None):
        print("I was accessed")

    def __set__(self, obj, value):
        print("I was assigned a new value")

```

Which gives us the following:

```python
parent = Parent()

parent.child
>>> I was accessed

parent.child = 1
>>> I was assigned a new value
```

There are lots of interesting use cases. When a value is assigned we could:

- Store it in an external database.
- Invalidate a cache.
- Refresh some UI (it's not too dissimilar to how reactivity is handled in Vue JS).

When a value is read we could:

- Calculate the value dynamically.
- Fetch the value from an external source.
- Log the value.
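
As a rough illustration of the ideas above, here's a minimal sketch of a descriptor which invalidates a cache whenever a new value is assigned. The `Tracked` and `Report` classes are made up for this example, and aren't part of Piccolo:

```python
class Tracked:
    """A descriptor which stores a value per instance, and reacts to writes."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created, so the descriptor
        # knows which attribute name it was assigned to.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class, so return the descriptor itself.
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value
        # Custom logic on assignment - here we invalidate a cache.
        obj.cache.clear()


class Report:
    title = Tracked()

    def __init__(self):
        self.cache = {"rendered": "<h1>Old title</h1>"}


report = Report()
report.title = "New title"

print(report.title)
print(report.cache)
```

After the assignment, `report.title` returns the new value, and the cache is empty.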

## Context

What makes the descriptor protocol extra interesting is the `obj` argument which is provided to the `__get__` and `__set__` methods.

The `obj` argument is either `None` or a class instance.

- When `obj` is `None`, then the attribute was accessed on a class (i.e. `Parent.child`).
- When `obj` is a class instance, the attribute was accessed on that instance (i.e. `Parent().child`).

We're able to customise the behaviour depending on where it was called from. A trivial example:

```python
class Child:
    def __get__(self, obj, objtype=None):
        if obj is None:
            print("I was accessed from a class.")
        else:
            print("I was accessed from a class instance.")
```

In an ORM like Piccolo, having this information is incredibly valuable.

In the example below, the `name` attribute represents the column type:

```python
class Band(Table):
    name = Varchar()
```

But when we do a database query, the name attribute returns the value in the database instead.

```python
band: Band = await Band.objects().first()
>>> band.name
'Pythonistas'
>>> type(band.name)
str
```

Being able to have correct type annotations was a huge head scratcher - how do you have correct type annotations for an attribute which is context dependent?

It turns out we can do this using descriptors:

```python
class Varchar(Column):
    ...

    @typing.overload
    def __get__(self, obj: Table, objtype=None) -> str:
        ...

    @typing.overload
    def __get__(self, obj: None, objtype=None) -> "Varchar":
        ...

    def __get__(self, obj, objtype=None):
        # This is Piccolo specific:
        return obj.__dict__[self._meta.name] if obj else self
```

[MyPy](https://mypy.readthedocs.io/en/stable/) now knows when the `name` is a `Varchar`, and when it's a `str`.
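
If you want to try this pattern outside of Piccolo, here's a self-contained sketch. The `Text` class is a made-up stand-in for a column type, and it stores values in the instance's `__dict__`, which is an assumption for this example rather than how Piccolo does it:

```python
import typing


class Text:
    """A stand-in for a column type - not Piccolo's real Varchar."""

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    @typing.overload
    def __get__(self, obj: None, objtype: typing.Optional[type] = None) -> "Text":
        ...

    @typing.overload
    def __get__(self, obj: object, objtype: typing.Optional[type] = None) -> str:
        ...

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class, so return the column definition.
            return self
        # Accessed on an instance, so return the stored value.
        return obj.__dict__[self.name]

    def __set__(self, obj: object, value: str) -> None:
        obj.__dict__[self.name] = value


class Band:
    name = Text()

    def __init__(self, name: str) -> None:
        self.name = name


band = Band("Pythonistas")

print(type(Band.name).__name__)
print(band.name)
```

`Band.name` gives us the `Text` instance, and `band.name` gives us a `str`. A type checker can tell the two apart from the overloads.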

## Conclusions

This just scratches the surface of descriptors. As mentioned in the intro, they're not needed every day, but they help us solve really tricky problems, and unlock some interesting design space for Python libraries.

## Resources

- [An official guide on python.org](https://docs.python.org/3/howto/descriptor.html)
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Managing your data using FastAPI and Piccolo Admin]]></title>
            <link>https://piccolo-orm.com/blog/managing-your-data-using-fast-api-and-piccolo-admin/</link>
            <guid>https://piccolo-orm.com/blog/managing-your-data-using-fast-api-and-piccolo-admin/</guid>
            <pubDate>Sat, 08 Jan 2022 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Our talk from [PyData Global](https://pydata.org/global2021/schedule/presentation/143/managing-your-data-using-fastapi-and-piccolo-admin/) is now available:

<iframe width="735" height="400" src="https://www.youtube.com/embed/aRmrXf_zOfQ?start=53" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

We give an overview of [Piccolo](https://github.com/piccolo-orm/piccolo) and [FastAPI](https://piccolo-api.readthedocs.io/en/latest/fastapi/index.html), and show how you can create a movie database, with an API and admin interface in record time.

The [slides are available to download](https://github.com/piccolo-orm/pymdb/raw/master/pydata_global_presentation.pdf).

The [code used in the demo is also available](https://github.com/piccolo-orm/pymdb/).
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Many-to-Many relationships]]></title>
            <link>https://piccolo-orm.com/blog/many-to-many-relationships/</link>
            <guid>https://piccolo-orm.com/blog/many-to-many-relationships/</guid>
            <pubDate>Mon, 20 Dec 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<iframe width="735" height="400" src="https://www.youtube.com/embed/J9YFt8Hxm4I" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Piccolo has a new API for [Many-To-Many relationships](https://piccolo-orm.readthedocs.io/en/latest/piccolo/schema/m2m.html).

We put a lot of work into making it powerful and user friendly.

Take this schema as an example, where you have bands, and they belong to musical genres:

```python
from piccolo.columns.column_types import (
    ForeignKey,
    LazyTableReference,
    Varchar
)
from piccolo.columns.m2m import M2M
from piccolo.table import Table


class Band(Table):
    name = Varchar()
    genres = M2M(LazyTableReference("GenreToBand", module_path=__name__))


class Genre(Table):
    name = Varchar()
    bands = M2M(LazyTableReference("GenreToBand", module_path=__name__))


# This is our joining table:
class GenreToBand(Table):
    band = ForeignKey(Band)
    genre = ForeignKey(Genre)
```

We can do all kinds of awesome queries:

```python
>>> await Band.select(Band.name, Band.genres(Genre.name, as_list=True))
[
    {
        "name": "Pythonistas",
        "genres": ["Rock", "Folk"]
    },
    ...
]
```

To get the results as dictionaries:

```python
>>> await Band.select(Band.name, Band.genres(Genre.id, Genre.name))
[
    {
        "name": "Pythonistas",
        "genres": [
            {"id": 1, "name": "Rock"},
            {"id": 2, "name": "Folk"}
        ]
    },
    ...
]
```

We can also use it in reverse, to get all bands which belong to a given genre.

```python
>>> await Genre.select(Genre.name, Genre.bands(Band.name, as_list=True))
[
    {
        "name": "Rock",
        "bands": ["Pythonistas", "C-Sharps"]
    },
    ...
]
```

There are lots of other powerful features - [see the docs](https://piccolo-orm.readthedocs.io/en/latest/piccolo/schema/m2m.html) for more information.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building a great select widget]]></title>
            <link>https://piccolo-orm.com/blog/building-a-great-select-widget/</link>
            <guid>https://piccolo-orm.com/blog/building-a-great-select-widget/</guid>
            <pubDate>Fri, 10 Dec 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<iframe width="735" height="400" src="https://www.youtube.com/embed/h819CBKIqKI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

A [`select`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/select) widget is easy right? We just do this in HTML:

```html
<select name="director">
  <option value="george">George Lucas</option>
  <option value="peter">Peter Jackson</option>
  <option value="steven">Steven Spielberg</option>
</select>
```

Which gives us:

<select name="director">
  <option value="george">George Lucas</option>
  <option value="peter">Peter Jackson</option>
  <option value="steven">Steven Spielberg</option>
</select>

This works well when there are only a few options. However, when there are lots of options it causes some major issues:

1. The user experience becomes quite poor because the user has to scroll through lots of options looking for the right one. We could use a [`datalist`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/datalist) instead, which is searchable. However, it doesn't solve our second problem.
1. If the number of options is really high it can actually crash the web browser. This is a possibility when pulling all of the options from an API and creating the widget with JavaScript. In [Piccolo Admin](https://github.com/piccolo-orm/piccolo_admin) we encountered this issue, because depending on the database table there could be millions of options.

The obvious solution is to add a [`search`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/search) field instead:

```html
<input type="search" placeholder="Search" />
```

Which gives us:

<input type="search" placeholder="Search" />

Now the user can just search for what they want, and there's no risk of crashing the web browser with lots of data.

But even this isn't perfect. With a search field you assume that the user knows what they're searching for. And it's a far worse user experience if there are only a few options available - picking an option from a select widget is less effort than searching in this scenario.

We need a widget which works for all cases. If there are only a few options, it should be convenient, and shouldn't require the user to search. But when there are lots of options, we allow the user to search, and it shouldn't crash the web browser.

After much experimentation, we came up with this hybrid widget:

<figure>
<a href="#" class="lightbox">
    <img src="https://piccolo-orm.com/images/blog/building-a-great-select-widget/hybrid_select_widget.jpeg" alt="Hybrid select widget" />
</a>
<figcaption>The hybrid select widget</figcaption>
</figure>

We render a search widget, and when the user clicks on it, we open the hybrid widget in a popup.

<figure>
<a href="#" class="lightbox">
    <img src="https://piccolo-orm.com/images/blog/building-a-great-select-widget/hybrid_select_widget.gif" alt="Hybrid select widget demo" />
</a>
<figcaption>Demo</figcaption>
</figure>

The hybrid select widget preloads the first 5 results. This means that for situations where there aren't many options (for example gender) the user immediately clicks on the one they want, and job done.

If they want to see a few more options, they click on load more. And finally, they can just search.

In this way we're able to have a widget which scales well - whether there are two options, or two million options.
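
The three behaviours (preload, load more, search) boil down to paginated fetching with an optional filter. Here's a rough Python sketch of that logic. It isn't the widget's actual source code (which is Vue JS), and the `fetch_options` function and its parameters are made up for this example:

```python
from typing import List, Optional


def fetch_options(
    options: List[str],
    search: Optional[str] = None,
    page: int = 1,
    page_size: int = 5,
) -> List[str]:
    """Return one page of options, optionally filtered by a search term."""
    if search:
        term = search.lower()
        options = [option for option in options if term in option.lower()]
    start = (page - 1) * page_size
    return options[start:start + page_size]


directors = [f"Director {i}" for i in range(1, 1001)]

# Preloaded when the popup opens:
first_page = fetch_options(directors)

# The user clicked 'load more':
second_page = fetch_options(directors, page=2)

# The user searched instead:
matches = fetch_options(directors, search="director 99")
```

Only a handful of options are ever returned at once, no matter how many exist in total.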

The widget is written in [Vue JS](https://vuejs.org/) - the [source code is on GitHub](https://github.com/piccolo-orm/piccolo_admin/blob/master/admin_ui/src/components/KeySearchModal.vue). It's available for [Piccolo Admin](https://github.com/piccolo-orm/piccolo_admin) users from version 0.19.1 onwards.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Replicating GraphQL using REST, Piccolo, and FastAPI]]></title>
            <link>https://piccolo-orm.com/blog/replicating-graph-ql-using-rest-piccolo-and-fast-api/</link>
            <guid>https://piccolo-orm.com/blog/replicating-graph-ql-using-rest-piccolo-and-fast-api/</guid>
            <pubDate>Fri, 03 Dec 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<iframe width="726" height="400" src="https://www.youtube.com/embed/OUvWn0GUDSI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

[GraphQL](https://graphql.org/) is a really powerful approach to building APIs. It allows clients to specify exactly what data they want. Contrast this with most [REST APIs](https://en.wikipedia.org/wiki/Representational_state_transfer), where a given endpoint typically returns the data in the same structure each time.

The advantages of being able to request exactly the data we need are:

- Less data needs to be transferred over the network.
- By giving the client more flexibility, it's less likely a backend engineer will have to make arbitrary changes to the API (for example, adding / removing fields).
- Potentially less load on the API server, if it only has to return what's needed, and not additional data.

The downside, though, is that GraphQL is quite a big investment in terms of setup and learning. Also, a lot of companies already have REST APIs. What if we can replicate some of the advantages of GraphQL using REST? Enter [Piccolo](https://github.com/piccolo-orm/piccolo) and [FastAPI](https://github.com/tiangolo/fastapi).

## PiccoloCRUD

Piccolo has a class called [PiccoloCRUD](https://piccolo-api.readthedocs.io/en/latest/crud/piccolo_crud.html) which basically makes a super endpoint from a Piccolo table. As the name suggests, it supports all of the CRUD operations, and some really powerful filtering.

We recently made some big improvements - namely, being able to request specific fields, and even doing joins. It also integrates seamlessly with [FastAPI](https://piccolo-api.readthedocs.io/en/latest/fastapi/index.html), so the endpoint has automatic Swagger docs.

It's as simple as this:

```python
from fastapi import FastAPI
from piccolo_api.crud.endpoints import PiccoloCRUD
from piccolo_api.fastapi.endpoints import FastAPIWrapper

from movies.tables import Movie


app = FastAPI()


FastAPIWrapper(
    "/movies/",
    app,
    PiccoloCRUD(Movie, read_only=True, exclude_secrets=True, max_joins=1),
)

```

Here is the schema we're using:

```python
from piccolo.columns import (
    ForeignKey,
    Integer,
    Real,
    Varchar,
)
from piccolo.table import Table


class Director(Table):
    name = Varchar(length=300, null=False)
    net_worth = Integer(secret=True, help_text="In millions")


class Movie(Table):
    name = Varchar(length=300)
    rating = Real(help_text="The rating on IMDB.")
    director = ForeignKey(references=Director)

```

You can get the [entire source code on GitHub](https://github.com/piccolo-orm/piccolo_videos/blob/main/making_a_powerful_rest_api_like_graphql/).

## Trying it out

If we query the endpoint, we get a response like:

```json
GET /movies/

{
    "rows": [
        {
            "id": 1,
            "name": "Star Wars: A New Hope",
            "rating": 8.6,
            "director": 1
        }
    ]
}
```

Now let's try fetching a subset of fields, using the `__visible_fields` parameter:

```json
GET /movies/?__visible_fields=name,director.name

{
    "rows": [
        {
            "name": "Star Wars: A New Hope",
            "director": {
                "name": "George Lucas"
            }
        }
    ]
}
```

Note how we got a nested object when we specified `director.name` as a field name, as it belongs to a related table. Piccolo performs the necessary joins under the hood.
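
If you're calling the endpoint from Python, building the URL might look something like this. It's a small sketch using only the standard library. The `visible_fields_url` helper and the `localhost:8000` address are assumptions for this example:

```python
from typing import Iterable
from urllib.parse import urlencode


def visible_fields_url(base_url: str, fields: Iterable[str]) -> str:
    """Build a URL which asks the endpoint for a subset of fields."""
    # Keep the commas unescaped, so the URL matches the example above.
    query = urlencode({"__visible_fields": ",".join(fields)}, safe=",")
    return f"{base_url}?{query}"


url = visible_fields_url(
    "http://localhost:8000/movies/",
    ["name", "director.name"],
)
print(url)
```

You can then pass the URL to whichever HTTP client you prefer.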

You can also try this out via FastAPI's Swagger docs:

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/replicating-graphql-using-rest-and-piccolo/swagger_docs.jpg" alt="Swagger docs" />
</a>
<figcaption>All of the filters are visible in the Swagger docs</figcaption>
</figure>

## Security

You'll notice in the table definition that we designated the `Director.net_worth` column as `secret=True`. What this means is the value returned by the API is only ever `null`.

It means we can shield sensitive information from clients if we want to.

We can also limit the number of joins which are allowed, using the `max_joins` parameter on `PiccoloCRUD`. This prevents clients from doing queries which are overly complex, and would potentially slow down our API.

## Conclusions

I hope this illustrates how powerful `PiccoloCRUD` is, and how we're able to build something with very little code which approximates what GraphQL can do.

It's a great way of rapidly building an API, which could save on some bandwidth too!
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Making open source accessible]]></title>
            <link>https://piccolo-orm.com/blog/making-open-source-accessible/</link>
            <guid>https://piccolo-orm.com/blog/making-open-source-accessible/</guid>
            <pubDate>Wed, 13 Oct 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
I've been dabbling in open source for most of my professional career, but only in the last year or so would I consider myself a 'maintainer'. By which I mean there's a project I care deeply about, and want to create and sustain a community around (hint - it's Piccolo!).

I've made a bunch of mistakes along the way, and am still learning. But I've realised some things about making a project friendlier to contributors, which is what this article will be about.

## Make a useful project

A prerequisite is having a project which is useful, and well documented. If someone isn't intrigued by your project as a user, they are very unlikely to want to contribute. So get the basics in place for your users first.

## Contributing docs

There needs to be somewhere that contributors can read about what's expected. This can be a `CONTRIBUTING.md` file in your GitHub repo, or a section in your docs.

I recently put a lot of effort into updating the `CONTRIBUTING.md` file for Piccolo. You can see it [here](https://github.com/piccolo-orm/piccolo/blob/master/CONTRIBUTING.md). It covers many aspects of contributing, such as:

- Which tasks are most appropriate to newcomers
- What makes a good pull request
- How soon to expect a code review

## Create tasks for new contributors

In GitHub you can add labels to your issues. There's one called `good first issue`. It's worth putting in some effort, and creating tasks especially for newcomers.

These tasks should be nicely self contained, and not require extensive knowledge of the code base. However, they should also be meaningful, and not just tedious admin tasks.

## Be helpful

Creating a pull request can be daunting. Let's be honest, Git isn't the easiest thing to master, and learning a new codebase can be intimidating. Be as helpful as possible to new contributors.

## Spread the word

There are programs such as [Hacktoberfest](https://hacktoberfest.digitalocean.com/), and [Google Summer of Code](https://summerofcode.withgoogle.com/). Hacktoberfest is very easy to get involved with (you just add the `hacktoberfest` label to your project). Google Summer of Code is much more involved.

## Resources

- [How to Contribute to and Maintain Open Source Projects with GitHub - video](https://www.youtube.com/watch?v=vSdSFxIKy5w)
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Python's graphlib is awesome]]></title>
            <link>https://piccolo-orm.com/blog/python-s-graphlib-is-awesome/</link>
            <guid>https://piccolo-orm.com/blog/python-s-graphlib-is-awesome/</guid>
            <pubDate>Tue, 12 Oct 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
The [`graphlib` module](https://docs.python.org/3/library/graphlib.html) was added in Python 3.9, and it's a great addition to the standard library. Piccolo [uses it a lot](https://github.com/piccolo-orm/piccolo/blob/14af797c4f613b4490fad3942b73a69dde512a88/piccolo/table.py#L1051).

As the name suggests, `graphlib` is used for sorting data which is in a graph-like structure.

For example, in Piccolo we often need to sort tables based on their foreign key columns.

Take this schema as an example:

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/pythons-graphlib-is-awesome/schema_graph_small.png" alt="Example database schema" />
</a>
<figcaption>A simple database schema</figcaption>
</figure>

When creating the tables, we need to make sure that the `Manager` table is created before the `Band` table, as there's a foreign key from `Band` to `Manager`. In the parlance of graphs, each table is a node, and each foreign key is an edge.

You might think, OK - let's just use Python's built-in [`sorted` function](https://docs.python.org/3/library/functions.html#sorted) to determine the correct order. However, for complex graphs, with multiple nodes and edges, the `sorted` function just doesn't work.

The `sorted` function works in situations like this:

```python
>>> sorted([1,3,2,5,4])
[1, 2, 3, 4, 5]
```

When sorting more complex types, you can pass a `key` argument to `sorted`, telling it how to compare the various elements. But when each element in the list has complex relationships to other elements in the list, the output won't be what you expect.

Thankfully `graphlib` comes to the rescue. Tools similar to `graphlib` have existed for a long time (for example [NetworkX](https://networkx.org/)), but having something in the standard library which solves common use cases is very welcome.

All we have to do to sort the above schema is this:

```python
from graphlib import TopologicalSorter

# The graph is a dictionary mapping nodes to a set of connected nodes.
graph = {'band': {'manager'}, 'manager': set()}
sorter = TopologicalSorter(graph)
ordered = tuple(sorter.static_order())
>>> print(ordered)
('manager', 'band')
```

That was a trivial example, here's a slightly more complex schema:

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/pythons-graphlib-is-awesome/schema_graph.png" alt="Example database schema" />
</a>
<figcaption>A slightly more complex database schema</figcaption>
</figure>

```python
from graphlib import TopologicalSorter

graph = {
    'band': {'manager'},
    'manager': set(),
    'concert': {'band', 'venue'},
    'venue': set(),
}
sorter = TopologicalSorter(graph)
ordered = tuple(sorter.static_order())
>>> print(ordered)
('manager', 'venue', 'band', 'concert')
```
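
You don't have to build the dictionary by hand. Here's a sketch which adds the edges one at a time using `TopologicalSorter.add`, and also shows what happens when two tables reference each other. The `foreign_keys` list is made up for this example, and isn't how Piccolo stores this information:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical (table, referenced table) pairs - one per foreign key.
foreign_keys = [
    ("band", "manager"),
    ("concert", "band"),
    ("concert", "venue"),
]

sorter = TopologicalSorter()
for table, referenced_table in foreign_keys:
    # This records that `table` depends on `referenced_table`.
    sorter.add(table, referenced_table)

ordered = list(sorter.static_order())

# Every referenced table comes before the table which points at it.
for table, referenced_table in foreign_keys:
    assert ordered.index(referenced_table) < ordered.index(table)

# Two tables which reference each other can't be ordered.
cyclic = TopologicalSorter({"a": {"b"}, "b": {"a"}})
try:
    list(cyclic.static_order())
except CycleError:
    cycle_detected = True
else:
    cycle_detected = False

print(cycle_detected)
```

The exact position of unrelated tables (like `manager` and `venue`) relative to each other can vary, which is why the check above compares positions rather than the whole list.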

I encourage you to check `graphlib` out - it's really useful, and quite fun.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[PostgreSQL 14 released]]></title>
            <link>https://piccolo-orm.com/blog/postgre-sql-14-released/</link>
            <guid>https://piccolo-orm.com/blog/postgre-sql-14-released/</guid>
            <pubDate>Mon, 11 Oct 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
PostgreSQL 14 has [now been released](https://www.postgresql.org/about/news/postgresql-14-released-2318/).

Piccolo has been tested with it, and everything works smoothly.

It looks like another great release from the PostgreSQL team, with usability and performance improvements.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Easy Forms using Pydantic and Piccolo Admin]]></title>
            <link>https://piccolo-orm.com/blog/easy-forms-using-pydantic-and-piccolo-admin/</link>
            <guid>https://piccolo-orm.com/blog/easy-forms-using-pydantic-and-piccolo-admin/</guid>
            <pubDate>Thu, 23 Sep 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
We recently added an exciting feature to Piccolo Admin, which lets you [build a form based on a Pydantic model](https://piccolo-admin.readthedocs.io/en/latest/custom_forms/index.html).

<iframe width="735" height="400" src="https://www.youtube.com/embed/xGdZDmGMkaU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

It makes Piccolo Admin a great platform for building internal tools and business apps. It doesn't require any knowledge of HTML, CSS or JavaScript.

This is what it looks like:

<figure>
    <a href="#" class="lightbox">
        <img src="https://piccolo-orm.com/images/blog/easy-forms-using-pydantic-and-piccolo-admin/screenshot_sidebar.jpg" alt="Piccolo Admin screenshot" />
    </a>
    <figcaption>Forms are accessible in the sidebar.</figcaption>
</figure>

<figure>
    <a href="#" class="lightbox">
        <img src="https://piccolo-orm.com/images/blog/easy-forms-using-pydantic-and-piccolo-admin/screenshot_form.jpg" alt="Piccolo Admin screenshot" />
    </a>
    <figcaption>An example of a form.</figcaption>
</figure>

Here is the `app.py` file:

```python
# app.py
from piccolo_admin.endpoints import create_admin, FormConfig
from fastapi import FastAPI
from starlette.requests import Request
from pydantic import BaseModel, validator


app = FastAPI()


################################################################################


class Order(BaseModel):
    item_name: str
    quantity: int
    customer_name: str

    @validator("quantity")
    def validate_quantity(cls, value):
        if value < 1:
            raise ValueError("You must order at least 1 item!")
        return value


def order_handler(request: Request, model: Order):
    print(
        f"I just got an order from {model.customer_name} for "
        f"{model.quantity} x {model.item_name}"
    )
    return "Processed order"


order_form = FormConfig(
    name="Order Form",
    pydantic_model=Order,
    endpoint=order_handler,
)

################################################################################

app.mount(
    "/",
    create_admin(
        site_name="MyShop.com",
        tables=[],
        forms=[order_form],
    ),
)
```

And `piccolo_conf.py`:

```python
# piccolo_conf.py
from piccolo.conf.apps import AppRegistry
from piccolo.engine.postgres import PostgresEngine


DB = PostgresEngine(config={"database": "form_demo", "user": "postgres"})


# A list of paths to piccolo apps
# e.g. ['blog.piccolo_app']
APP_REGISTRY = AppRegistry(apps=["piccolo_admin.piccolo_app"])
```

To run the app:

- Make sure the database exists.
- Install the requirements - `pip install piccolo[all] piccolo_admin`
- Run all migrations - `piccolo migrations forwards all`
- Create a user to login with - `piccolo user create`
- Start the app - `uvicorn app:app`
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Piccolo for Data Science Scripts]]></title>
            <link>https://piccolo-orm.com/blog/piccolo-for-data-science-scripts/</link>
            <guid>https://piccolo-orm.com/blog/piccolo-for-data-science-scripts/</guid>
            <pubDate>Fri, 17 Sep 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
We recently added the `create_tables` function to Piccolo ([docs](https://piccolo-orm.readthedocs.io/en/latest/piccolo/query_types/create_table.html)).

The idea was to make it easier for people writing simple data science scripts, who don't need more advanced features like migrations.

Here's a new [tutorial video](https://www.youtube.com/watch?v=yBGgK09H5rI) about it. And the example code from the video, which fetches data from [Open Weather Map](https://openweathermap.org/api), and loads it into SQLite:

```python
import asyncio
import decimal
import os

import httpx
from piccolo.columns.column_types import (
    Varchar,
    Numeric,
    ForeignKey,
    Timestamp,
)
from piccolo.engine.sqlite import SQLiteEngine
from piccolo.table import Table, create_tables
import dotenv


DB = SQLiteEngine("weather.sqlite")


class City(Table, db=DB):
    name = Varchar()


class WeatherData(Table, db=DB):
    city = ForeignKey(City)
    temp = Numeric()
    fetched_at = Timestamp()


dotenv.load_dotenv()
API_KEY = os.environ.get("API_KEY")


async def get_data(city_name: str, client: httpx.AsyncClient):
    """
    Fetch weather data from the Open Weather Map API.
    """
    url = (
        "https://api.openweathermap.org/data/2.5/weather"
        f"?q={city_name}&appid={API_KEY}"
    )
    response = await client.get(url)
    data = response.json()
    city = await City.objects().get_or_create(City.name == city_name).run()
    weather_data = WeatherData(
        temp=decimal.Decimal(data["main"]["temp"]), city=city
    )
    await weather_data.save().run()


async def main():
    create_tables(City, WeatherData, if_not_exists=True)

    async with httpx.AsyncClient() as client:
        await asyncio.gather(
            *[
                get_data(city_name=city_name, client=client)
                for city_name in ["London", "New York", "Paris"]
            ]
        )

    print("Loaded weather data")


if __name__ == "__main__":
    asyncio.run(main())
```
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Talk Python To Me podcast]]></title>
            <link>https://piccolo-orm.com/blog/talk-python-to-me-podcast/</link>
            <guid>https://piccolo-orm.com/blog/talk-python-to-me-podcast/</guid>
            <pubDate>Sun, 08 Aug 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Piccolo was on the [Talk Python To Me](https://talkpython.fm/) podcast. We discussed the origins of Piccolo, and some of its main features.

You can [see the video here](https://www.youtube.com/watch?v=d3WF59IO3S0).
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[BlackSheep]]></title>
            <link>https://piccolo-orm.com/blog/black-sheep/</link>
            <guid>https://piccolo-orm.com/blog/black-sheep/</guid>
            <pubDate>Thu, 10 Jun 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Piccolo supports several [ASGI](/blog/introduction-to-asgi/) web frameworks out of the box - just use the `piccolo asgi new` command and it will create a new web app for you ([see the docs](https://piccolo-orm.readthedocs.io/en/latest/piccolo/asgi/index.html)).

A Piccolo user recently asked for help integrating with [BlackSheep](https://www.neoteroi.dev/blacksheep/), which is a promising looking ASGI web framework. I decided to add support for it within Piccolo, so it's now an option when using `piccolo asgi new`.

<figure>
    <img src="https://piccolo-orm.com/images/blog/blacksheep/blacksheep_logo.png" class="medium" />
</figure>

Some interesting features of BlackSheep are:

 * **OpenAPI support** -  BlackSheep can [automatically create OpenAPI docs](https://www.neoteroi.dev/blacksheep/openapi/) from the type annotations of your endpoints, similar to FastAPI.
 * **Performance** - some of the BlackSheep internals are implemented in Cython, which should help deliver good performance.
 * **Flexible design** - endpoints can be class based or function based.

Check out the [docs](https://www.neoteroi.dev/blacksheep/) for more details.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Frozen queries]]></title>
            <link>https://piccolo-orm.com/blog/frozen-queries/</link>
            <guid>https://piccolo-orm.com/blog/frozen-queries/</guid>
            <pubDate>Thu, 10 Jun 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<img src="https://piccolo-orm.com/images/blog/frozen-queries/sql.jpg" />

A feature which was recently added to Piccolo is [frozen queries](https://piccolo-orm.readthedocs.io/en/latest/piccolo/query_clauses/freeze.html).


The purpose of an ORM / query builder like Piccolo is pretty simple - it converts a query defined in Python code into SQL. This process takes a little bit of time, which makes queries slightly slower than writing SQL by hand.

In a typical web application, you'll usually have some queries which are run over and over again. Converting those queries into SQL each time they are run is a little bit wasteful.

To tackle this, Piccolo queries can now be 'frozen'. This precalculates the SQL, so it only has to be calculated once, irrespective of how many times the query is run. Once a query is frozen, you can't apply any more clauses to it (`where`, `order_by` etc), as this would cause the SQL to be different.

Here's an example:

```python
LATEST_ARTICLES = Article.select(
    Article.id,
    Article.title
).order_by(
    Article.published_on,
    ascending=False
).limit(
    10
).output(
    as_json=True
).freeze()

# In the corresponding view/endpoint of whichever web framework
# you're using:
async def latest_articles(self, request):
    return await LATEST_ARTICLES.run()
```
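
To make the idea of precalculating the SQL concrete, here's a toy sketch. The `ToySelect` and `FrozenToySelect` classes are hypothetical and heavily simplified - this isn't how Piccolo implements frozen queries - but it shows the principle: the SQL string is built once, and then reused every time the query is run.

```python
class ToySelect:
    """A hypothetical, heavily simplified query builder."""

    def __init__(self, table, columns):
        self.table = table
        self.columns = columns
        self.limit_value = None
        self.build_count = 0  # How many times the SQL has been generated.

    def limit(self, number):
        self.limit_value = number
        return self

    def sql(self):
        # Converting the Python query into SQL - an unfrozen query repeats
        # this work every time it's run.
        self.build_count += 1
        query = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.limit_value is not None:
            query += f" LIMIT {self.limit_value}"
        return query

    def freeze(self):
        return FrozenToySelect(self.sql())


class FrozenToySelect:
    """Stores the precalculated SQL, and offers no way to add clauses."""

    def __init__(self, sql):
        self._sql = sql

    def sql(self):
        return self._sql


query = ToySelect("article", ["id", "title"]).limit(10)
frozen = query.freeze()

# The frozen query hands back the stored string, without rebuilding it.
assert frozen.sql() == "SELECT id, title FROM article LIMIT 10"
assert frozen.sql() is frozen.sql()
```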

Bear in mind that most of the time spent running a query is waiting for a response from the database, so end users won't notice much difference whether your queries are frozen or not. But for apps which require high throughput, every little helps, and it makes sense to use frozen queries where possible.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Piccolo column choices]]></title>
            <link>https://piccolo-orm.com/blog/piccolo-column-choices/</link>
            <guid>https://piccolo-orm.com/blog/piccolo-column-choices/</guid>
            <pubDate>Sat, 29 May 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
A new feature was recently added to Piccolo, which allows [choices to be specified for columns](https://piccolo-orm.readthedocs.io/en/latest/piccolo/schema/advanced.html#choices). It leverages Python's [Enum](https://docs.python.org/3/library/enum.html) support. Here's an example:

```python
from enum import Enum

from piccolo.columns import Varchar
from piccolo.table import Table


class Director(Table):
    class Gender(str, Enum):
        male = 'm'
        female = 'f'
        non_binary = 'n'

    name = Varchar(length=100)
    gender = Varchar(length=1, choices=Gender)

```

You can now do queries like this:

```python
>>> Director.select().where(
...     Director.gender == Director.Gender.male
... ).run_sync()
[{'id': 1, 'name': 'George Lucas', 'gender': 'm'}, ...]

>>> director = Director(
...     name="Brenda Barton",
...     gender=Director.Gender.female
... )
>>> director.save().run_sync()
```
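
The `str` mixin on the `Enum` is worth a brief note. The snippet below uses only the standard library (no Piccolo) to sketch why it's convenient: each member compares equal to the raw string stored in the database, and a raw string can be converted back into a member.

```python
from enum import Enum


class Gender(str, Enum):
    male = 'm'
    female = 'f'
    non_binary = 'n'


# Because of the str mixin, members compare equal to the raw value.
assert Gender.male == 'm'

# A raw value (e.g. read from a database row) can be turned back into a member.
assert Gender('f') is Gender.female

# The permitted values can be listed, which is the kind of information a
# select widget needs.
allowed_values = [member.value for member in Gender]
```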

[Piccolo Admin](/ecosystem/) also supports this feature. When a column has choices specified, a select widget is rendered in the UI.

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/column-choices/column-choices-ui.png" alt="Column choices UI" />
</a>
</figure>
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding sys.exit]]></title>
            <link>https://piccolo-orm.com/blog/understanding-sys-exit/</link>
            <guid>https://piccolo-orm.com/blog/understanding-sys-exit/</guid>
            <pubDate>Thu, 22 Apr 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Some of the Piccolo commands use ``sys.exit`` to indicate to the user whether the code ran successfully or not.

Here's a very simple example:

```python
import sys


def my_command():
    was_successful = do_something()
    if was_successful:
        sys.exit(0)
    else:
        sys.exit(1)
```

The number passed to `sys.exit` is the **exit code**. In Unix, an exit code of `0` means the command was successful, and any non-zero exit code (most commonly `1`) means something went wrong.

## Using exit codes

On the command line, you can see the exit code of the last command using `echo $?`.

```bash
>>> python successful_script.py
>>> echo $?
0

>>> python failing_script.py
>>> echo $?
1
```

You can then use these exit codes in things like `if` statements.

```bash
>>> if python successful_script.py; then echo "successful"; else echo "error"; fi;
successful

>>> if python failing_script.py; then echo "successful"; else echo "error"; fi;
error
```

Also, exit codes are very important in build tools like Docker. If the exit code indicates a command has failed, then the build will fail.
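
Any program can inspect the exit code of a command it launches, which is all a build tool is doing. As a minimal sketch using the standard library's `subprocess` module (the inline scripts and the `run_script` helper are made up for illustration):

```python
import subprocess
import sys


def run_script(source):
    """Run a snippet of Python in a child process, and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", source])
    return completed.returncode


success_code = run_script("import sys; sys.exit(0)")
failure_code = run_script("import sys; sys.exit(1)")

if failure_code != 0:
    print("The command failed, so a build tool would stop here.")
```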

## Exit messages

If you want to indicate why the failure occurred, then a string can be passed to `sys.exit`, in which case the exit code is treated as `1` (i.e. a failure), and the string is printed out.

```python
# failing_script.py
import sys

sys.exit("Something bad happened")
```

Let's try it:

```bash
>>> python failing_script.py
Something bad happened
```

## How does sys.exit work?

When calling `sys.exit` it actually just raises an exception. The exception is `SystemExit`. It's unusual for a codebase to catch `SystemExit` exceptions - but it can be done.

```python
# refuse.py
import sys

try:
    sys.exit(1)
except SystemExit:
    print("I refuse!")

```

If we call it:

```bash
>>> python refuse.py
I refuse!
>>> echo $?
0
```
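
The exit code here is `0`, because the `SystemExit` exception was swallowed and the script then finished normally. If you do catch `SystemExit`, the value which was passed to `sys.exit` is available on the exception's `code` attribute. A small sketch (the `capture_exit_code` helper is just for illustration):

```python
import sys


def capture_exit_code(value):
    """Call sys.exit, and return what ended up on the exception."""
    try:
        sys.exit(value)
    except SystemExit as exception:
        return exception.code


assert capture_exit_code(1) == 1

# When a string is passed, the code attribute holds the string itself. It's
# the interpreter which prints it, and uses an exit code of 1.
assert capture_exit_code("Something bad happened") == "Something bad happened"
```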

## Are there other exit codes?

In 99% of situations, `0` and `1` are sufficient as exit codes. There are [others](https://tldp.org/LDP/abs/html/exitcodes.html), but using them is rare.

```python
import sys

sys.exit(127)  # 127 means 'command not found'
```

## Why not just raise exceptions instead?

Rather than using `sys.exit`, you can just raise an exception.

```python
# exception_script.py
raise Exception("Something went wrong")
```

If this exception is unhandled, and causes the program to crash, the exit code will be `1`, and a traceback will be printed out:

```bash
>>> python exception_script.py
Traceback (most recent call last):
  File "exception_script.py", line 1, in <module>
    raise Exception("Something went wrong")
Exception: Something went wrong
```

Having more verbose output may be useful for debugging purposes. However, you don't necessarily want this level of information being shown to a user if it's a known exception.

By using `sys.exit` you can exit the program, and just show a message without a traceback.

Also, by using `sys.exit`, it indicates clearly within your code that the intention is to stop the program, vs an exception, which is often meant to be handled.
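
One common way of combining the two, sketched below with made-up names (`ConfigError`, `load_config` and `main` are all hypothetical), is to raise exceptions inside the program, and to convert the known ones into `sys.exit` calls at the entry point:

```python
import sys


class ConfigError(Exception):
    """A known, expected error which the user can fix."""


def load_config(path):
    if not path.endswith(".toml"):
        raise ConfigError(f"{path} is not a TOML file")
    return {"path": path}


def main(path):
    try:
        config = load_config(path)
    except ConfigError as exception:
        # A known problem - show a short message instead of a traceback.
        sys.exit(f"Error: {exception}")
    print(f"Loaded {config['path']}")
```

Any exception which isn't a `ConfigError` still produces a full traceback, which is what you want for genuine bugs.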

## Conclusions

So, should you use `sys.exit`? In summary, here are some situations where it is useful:

  * If you're writing code which will be consumed on the command line, and you want to exit the program without showing a traceback.
  * If you want to return an exit code other than 0 or 1.
  * To indicate clearly within your code that the intention is to stop the program.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Postgres - one database to rule them all]]></title>
            <link>https://piccolo-orm.com/blog/postgres-one-database-to-rule-them-all/</link>
            <guid>https://piccolo-orm.com/blog/postgres-one-database-to-rule-them-all/</guid>
            <pubDate>Thu, 08 Apr 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
<img src="https://piccolo-orm.com/images/blog/postgres_one_database/one_ring.jpg" alt="One ring" />

One of the advantages of Postgres is the many high quality extensions which make it suitable for storing a range of data.

You can store:

 * JSON - [builtin](https://www.postgresql.org/docs/current/functions-json.html)
 * Spatial data - [PostGIS](https://postgis.net/)
 * Time series data - [TimescaleDB](https://www.timescale.com/)
 * Even graph data is in the works - [Apache AGE](https://age.apache.org/)

Without these extensions you would require many different specialised databases. Supporting multiple databases means more maintenance work.

By having all of your data in one place, it makes querying it easier. With a single SQL query you can join together time series data, spatial data, and any other relational data. If this data is spread over many databases, you need to join the data together using code, which is less performant, and more work.

There is also great convenience in only needing to learn one query language - SQL. Many specialist databases have their own query language, which takes significant time and effort to learn. It's likely that a large proportion of developers on a team know at least some SQL.

By having all of your data in Postgres, it also means you can use the tools you're familiar with. If your time series data is in a separate database, it's another set of tools you need to learn. Postgres has a huge ecosystem of tools - GUIs such as [PgAdmin](https://www.pgadmin.org/), drivers for most programming languages, and ORMs / query builders like [Piccolo](https://piccolo-orm.com/)!

The extensibility of Postgres is really its killer feature. It's almost an operating system for your data. The fact that it can do so much, and does it so well, is remarkable. This is one of the reasons it's growing so quickly.

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/postgres_one_database/database_rankings.png" alt="Database rankings" />
</a>
<figcaption>Source: <a href="https://db-engines.com/en/ranking">db-engines.com</a></figcaption>
</figure>

When I'm building systems, I argue hard to use Postgres for as much as possible. It results in a more streamlined architecture, with greater developer productivity, and easier onboarding. Postgres was initially released in 1996, but it feels like it's just getting started.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Piccolo Admin tooltips]]></title>
            <link>https://piccolo-orm.com/blog/piccolo-admin-tooltips/</link>
            <guid>https://piccolo-orm.com/blog/piccolo-admin-tooltips/</guid>
            <pubDate>Tue, 23 Mar 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[

The [Piccolo Admin](https://github.com/piccolo-orm/piccolo_admin) now has support for tooltips, which can be added for tables and columns.

Here are some screenshots:

<figure>
<a href="#" class="lightbox">
    <img src="https://piccolo-orm.com/images/blog/piccolo-admin-tooltips/column_tooltip.png" class="small" alt="column tooltip" />
</a>
</figure>

<figure>
<a href="#" class="lightbox">
    <img src="https://piccolo-orm.com/images/blog/piccolo-admin-tooltips/table_tooltip.png" class="small" alt="table tooltip" />
</a>
</figure>

Often database tables get quite complex, and providing hints to the user about what a table is for, and what the columns represent, can be very helpful.

Adding them is very simple:

```python
# tables.py
from piccolo.columns import Varchar
from piccolo.table import Table


class Movie(Table, help_text="Movies which were released in cinemas."):
    name = Varchar(help_text="The name it was released under in the USA.")

```

And then we just run the admin as usual:

```python
# app.py
from piccolo_admin.endpoints import create_admin

from tables import Movie


app = create_admin(tables=[Movie])


if __name__ == '__main__':
    import uvicorn
    uvicorn.run(app)
```
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building an admin to handle millions of rows]]></title>
            <link>https://piccolo-orm.com/blog/building-an-admin-to-handle-millions-of-rows/</link>
            <guid>https://piccolo-orm.com/blog/building-an-admin-to-handle-millions-of-rows/</guid>
            <pubDate>Sun, 14 Mar 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Many of the recent changes to [Piccolo Admin](https://github.com/piccolo-orm/piccolo_admin) have been about improving performance and usability when dealing with large database tables.

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/admin-millions-of-rows/admin_screenshot.png" alt="Piccolo Admin screenshot" />
</a>
<figcaption>The Piccolo Admin, in dark mode</figcaption>
</figure>

## Generating lots of fake data

The first step in this process was generating lots of fake data for testing with. The example schema used in the Piccolo Admin contains two tables - `Movie` and `Director`.

The original dataset was painstakingly collected via Google searches - e.g. finding out what each movie grossed, and if they had Oscar nominations. Clearly this wasn't going to scale if we wanted to test with millions of rows. Plus there are only so many actual movies in existence.

To generate fake data, while keeping it semi-realistic, the [Faker](https://pypi.org/project/Faker/) library was used. The benefit of semi-realistic data is that it gives a better sense of the user experience, compared to using Lorem ipsum everywhere.

You can see the source code used for this [here](https://github.com/piccolo-orm/piccolo_admin/blob/6cd17f63b1d80c109695dbea3a6ab198be8868df/piccolo_admin/example.py#L91).

## What are the bottlenecks?

After generating lots of fake data, we could identify the main bottlenecks.

### Pagination

Currently the Piccolo Admin uses limit-offset pagination, which isn't efficient when the page number is high. However, even at very high page numbers, it's still usable. It just puts unnecessary load on the database. For a page size of 100, and reading page 1,000, the database will read 100,000 rows, and will throw away the first 99,900. That's just the way offset is implemented in Postgres.

Work has started on more efficient pagination methods, but for now, it's still usable at high row counts.
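
One well known alternative is keyset pagination (sometimes called cursor pagination), where each page is fetched relative to the last row already seen, rather than by an offset. The sketch below only illustrates the general technique, and isn't necessarily the approach Piccolo Admin will adopt. It uses an in-memory SQLite database as a stand-in for Postgres, and the table contents are invented.

```python
import sqlite3

PAGE_SIZE = 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO movie (name) VALUES (?)",
    [(f"Movie {i}",) for i in range(1, 1001)],
)


def offset_page(page_number):
    # The database has to step over all of the rows before the offset.
    offset = (page_number - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT id, name FROM movie ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def keyset_page(last_seen_id):
    # The primary key index lets the database jump straight to the right row.
    return conn.execute(
        "SELECT id, name FROM movie WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()


# Both approaches return the same rows for the third page.
assert offset_page(3) == keyset_page(last_seen_id=200)
```

The trade-off is that keyset pagination can't jump to an arbitrary page number - it can only move on from a row it has already seen.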

### Foreign key selectors

For the `Movie` table, each row has a foreign key to a `Director` row. The user needs an efficient way of selecting the director when inserting / editing rows, and also when filtering.

This is the main bottleneck for supporting large database tables. If a simple select element is used, it needs to load all possible options for a director, which means loading the ID and an identifier (e.g. director name) for every row in the `Director` table. Clearly this won't scale well in terms of performance. It also isn't a great UI - as the user needs to scroll through thousands of options in a select element to find the one they're after.

The solution is to use a search input instead. For the filter sidebar, this has now been implemented. But for the edit and add pages, it will be implemented soon.
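
The query behind a search input can be sketched roughly as follows. This uses SQLite rather than Postgres, and the `search_directors` function is hypothetical - it isn't the Piccolo Admin implementation. The key point is that only a handful of matching rows are ever loaded, no matter how large the table is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE director (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO director (name) VALUES (?)",
    [("George Lucas",), ("Peter Jackson",), ("Ridley Scott",)],
)


def search_directors(term, limit=5):
    # Only the ID and a readable identifier are returned, capped at `limit`.
    return conn.execute(
        "SELECT id, name FROM director WHERE name LIKE ? "
        "ORDER BY name LIMIT ?",
        (f"%{term}%", limit),
    ).fetchall()


matches = search_directors("luc")
```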

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/admin-millions-of-rows/search-empty.png" class="medium" alt="Foreign key selector - empty" />
</a>
<figcaption>Empty search field</figcaption>
</figure>

<figure>
<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/admin-millions-of-rows/search-with-content.png" class="medium" alt="Foreign key selector - with content" />
</a>
<figcaption>Search field with content</figcaption>
</figure>

## Conclusions

The recent improvements are a good start in making the Piccolo Admin scalable. We'll continue to make the UI and performance as good as possible with large datasets.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Which is the fastest ASGI server?]]></title>
            <link>https://piccolo-orm.com/blog/which-is-the-fastest-asgi-server/</link>
            <guid>https://piccolo-orm.com/blog/which-is-the-fastest-asgi-server/</guid>
            <pubDate>Sun, 28 Feb 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[

[ASGI](../introduction-to-asgi/) is a specification which allows interoperability of async Python web frameworks and servers. There are several different ASGI servers ([Daphne](https://pypi.org/project/daphne/), [Hypercorn](https://pypi.org/project/Hypercorn/), [Uvicorn](https://pypi.org/project/uvicorn/)).

I was recently doing some load testing for a website, which has to support large numbers of visitors concurrently. I didn't expect there to be any dramatic differences in performance between the different ASGI servers, but even if there's a 20% difference, I'll take it.

It's also a good opportunity to show off [Locust](https://locust.io/), which is a nice load testing tool.

The testing process and results are documented on [GitHub](https://github.com/piccolo-orm/asgi_server_performance).

## Results

Uvicorn:

<a href="#" class="lightbox">
<img src="https://raw.githubusercontent.com/piccolo-orm/asgi_server_performance/master/images/uvicorn.png" title="Uvicorn" />
</a>

Hypercorn:

<a href="#" class="lightbox">
<img src="https://raw.githubusercontent.com/piccolo-orm/asgi_server_performance/master/images/hypercorn.png" title="Hypercorn" />
</a>

Daphne:

<a href="#" class="lightbox">
<img src="https://raw.githubusercontent.com/piccolo-orm/asgi_server_performance/master/images/daphne.png" title="Daphne" />
</a>

## Conclusions

Uvicorn achieved roughly 40% more throughput than the others in this test. However, they all did well, and were stable under high load.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Deprecation warnings in Python code]]></title>
            <link>https://piccolo-orm.com/blog/deprecation-warnings-in-python-code/</link>
            <guid>https://piccolo-orm.com/blog/deprecation-warnings-in-python-code/</guid>
            <pubDate>Wed, 24 Feb 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
When working on Python libraries like Piccolo, it's important to evolve it in a graceful way.

Over time code will get added which is regrettable (for example, function names with typos), or is superseded by more modern approaches, and needs to be retired.

Python has a builtin way of handling this, with the `warnings` module. If a function is going to be retired, we can do this:

```python
# app.py
import warnings


def my_regrettable_function():
    warnings.warn(
        "my_regrettable_function will be retired in version 1.0, please "
        "use my_awesome_function instead.",
        DeprecationWarning,
        stacklevel=2
    )
    return "oops"


if __name__ == "__main__":
    my_regrettable_function()

```

When we execute the script, you can see that Python outputs the warning to stderr:

```bash
>>> python app.py
app.py:16: DeprecationWarning: my_regrettable_function will be retired in version 1.0, please use my_awesome_function instead.
  my_regrettable_function()
```

If you like, you can redirect stderr into a different file, to capture all of the warnings:

```
>>> python app.py 2> warnings.txt
```

Or just straight up ignore the warnings altogether:

```
>>> python -W ignore app.py
```

One other cool thing you can do is to turn the warnings into exceptions, to be sure you're not running any deprecated code:

```
>>> python -W error app.py

```

The issue here is that any deprecated code within Python itself will also start raising exceptions. To target only the warnings within a given module:

```
>>> python -W error::DeprecationWarning:__main__ app.py
Traceback (most recent call last):
  File "app.py", line 16, in <module>
    my_regrettable_function()
  File "app.py", line 6, in my_regrettable_function
    warnings.warn(
DeprecationWarning: my_regrettable_function will be retired in version 1.0, please use my_awesome_function instead.
```
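
The same filters can be applied from within Python, which is handy in a test suite. As a sketch using only the standard library's `warnings` module, `catch_warnings` temporarily overrides the filters, so we can either record the warnings, or turn them into exceptions:

```python
import warnings


def my_regrettable_function():
    warnings.warn(
        "my_regrettable_function will be retired in version 1.0, please "
        "use my_awesome_function instead.",
        DeprecationWarning,
        stacklevel=2
    )
    return "oops"


# Record the warnings, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    my_regrettable_function()

assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)

# Or turn them into exceptions - the equivalent of `-W error`, but only
# within this block.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        my_regrettable_function()
    except DeprecationWarning:
        raised = True
    else:
        raised = False
```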
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What is the maximum number of coroutines you should run concurrently?]]></title>
            <link>https://piccolo-orm.com/blog/what-is-the-maximum-number-of-coroutines-you-should-run-concurrently/</link>
            <guid>https://piccolo-orm.com/blog/what-is-the-maximum-number-of-coroutines-you-should-run-concurrently/</guid>
            <pubDate>Tue, 23 Feb 2021 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
When you use threads for concurrency, it's commonly understood that if you use too many threads, performance can actually suffer. This is because the operating system has to spend a lot of time context switching.

One of the advantages of coroutines in asyncio is they are more lightweight than threads, and in theory you can have many more coroutines than threads. But there must be a limit, after which the event loop which has to schedule the coroutines is overwhelmed.

I wrote a simple test to try and work out whether it's more efficient to batch up your coroutines, and to let one batch finish before adding another batch to the event loop.

```python
import asyncio
import math
import time


# We will tweak this number to see when it's more efficient to batch up
# coroutines vs running them all in one go.
COROUTINE_COUNT = 100000


async def test_coroutine():
    """
    A simple coroutine - the sleep statements represent waiting for network
    calls.
    """
    await asyncio.sleep(0.1)
    await asyncio.sleep(0.1)
    await asyncio.sleep(0.1)


def get_coroutines():
    return [test_coroutine() for i in range(COROUTINE_COUNT)]


async def run():
    """
    Run the coroutines without batching.
    """
    coroutines = get_coroutines()
    await asyncio.gather(*coroutines)


async def run_batched():
    """
    Batch up the coroutines, so the event loop isn't overwhelmed.
    """
    coroutines = get_coroutines()
    iterations = 5
    chunk_size = int(len(coroutines) / iterations)

    remainder = len(coroutines) - (chunk_size * iterations)
    if remainder > 0:
        iterations += math.ceil(remainder / chunk_size)

    for i in range(iterations):
        chunk = coroutines[i * chunk_size : (i + 1) * chunk_size]
        await asyncio.gather(*chunk)


if __name__ == "__main__":
    for test in (run, run_batched):
        start = time.time()
        asyncio.run(test())
        end = time.time()
        delta = end - start
        print(delta)
```

With `COROUTINE_COUNT=10000`:

 * Unbatched: 0.51 seconds
 * Batched: 1.69 seconds

With `COROUTINE_COUNT=100000`:

 * Unbatched: 6.24 seconds
 * Batched: 5.86 seconds

This was run on a 2.6 GHz Intel Core i7 (9th Gen) processor, with 16 GB of RAM.

You'll see that batching up coroutines is much slower, unless we get to incredibly high numbers of coroutines (100,000).

I didn't expect this. I thought the event loop would struggle much sooner. Let's try the experiment again, but replacing `asyncio.sleep` with actual network calls.

```python
import httpx

async def test_coroutine():
    """
    Doing actual network calls now.
    """
    async with httpx.AsyncClient() as client:
        response = await client.get("https://www.google.co.uk")
        assert response.status_code == 200
```

With `COROUTINE_COUNT=100`:

 * Unbatched: 2.12 seconds
 * Batched: 2.90 seconds

As you can see, batching is also slower in this case.

When trying with `COROUTINE_COUNT=1000` I started getting network timeouts. In this situation, batching does make sense - if you run all of the coroutines at once you're more likely to encounter network issues.

The same is true when connecting to a database - unless you're using a connection pool, you will start seeing errors once Postgres reaches its connection limit (`max_connections`, which defaults to 100).
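
If you do need to cap concurrency, batching isn't the only option. An alternative (which wasn't part of the tests above) is an `asyncio.Semaphore`, which limits how many coroutines are in flight without having to wait for a whole batch to finish. A minimal sketch, where `fake_network_call` stands in for a real request:

```python
import asyncio

MAX_CONCURRENT = 5


async def fake_network_call(number):
    await asyncio.sleep(0.01)
    return number


async def limited_call(semaphore, number):
    # Only MAX_CONCURRENT coroutines can be inside this block at once.
    async with semaphore:
        return await fake_network_call(number)


async def run_limited(count):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(
        *[limited_call(semaphore, number) for number in range(count)]
    )


results = asyncio.run(run_limited(20))
```

As soon as one call finishes, the next one can start, so a single slow call doesn't hold up everything else in its batch.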

## Conclusions

The asyncio event loop is surprisingly good at handling large numbers of coroutines concurrently. However, be wary of scheduling too many coroutines which require network access, as you'll hit other bottlenecks (rate limiting, network etc).

]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Top level await in Python]]></title>
            <link>https://piccolo-orm.com/blog/top-level-await-in-python/</link>
            <guid>https://piccolo-orm.com/blog/top-level-await-in-python/</guid>
            <pubDate>Wed, 11 Nov 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
One of the core rules of the `await` keyword in Python is it can only be used within a coroutine.

```python
import asyncio

async def my_coroutine():
    print("Starting")
    await asyncio.sleep(1)
    print("Done")
```

However, there are some situations where you can use top level await.

## python -m asyncio

This little trick launches a Python interpreter in which you can use top level await (it requires Python 3.8 or later).

It also automatically imports asyncio for you - give `await asyncio.sleep(1)` a go.

## IPython

Recent versions of IPython also support [top level await](https://ipython.readthedocs.io/en/stable/interactive/autoawait.html). You can switch this behavior on and off as follows:

```
%autoawait False
```

It is on by default (as tested in v7.19.0).

## Making your code friendly to top level await

Having top level await is neat, but it can actually cause problems for some code. The way that top level await is achieved is by having an event loop running in the background.
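
As a rough illustration (not taken from any particular library), code can check whether an event loop is already running in the current thread using `asyncio.get_running_loop`, which raises a `RuntimeError` when there isn't one. The `loop_is_running` helper is a made-up name.

```python
import asyncio


def loop_is_running():
    """Return True if an event loop is running in the current thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def check_inside_coroutine():
    return loop_is_running()


# In a normal script, no loop is running at the top level ...
outside = loop_is_running()
# ... but there is one while a coroutine is being executed.
inside = asyncio.run(check_inside_coroutine())
```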

There can only be one event loop running in a thread. If any of your code tries to launch an event loop, perhaps by calling `asyncio.run`, you'll get an error, so be careful with that.]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Database column defaults in Piccolo]]></title>
            <link>https://piccolo-orm.com/blog/database-column-defaults-in-piccolo/</link>
            <guid>https://piccolo-orm.com/blog/database-column-defaults-in-piccolo/</guid>
            <pubDate>Fri, 26 Jun 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
When building Piccolo, the issue of database column default values came up,
and it's a surprisingly tricky subject, which required some thought.

Default values serve two very important purposes:

 1. New row defaults - allows certain values to be omitted when inserting new rows.
 1. New column defaults - allows a user to add non-nullable columns to an existing table.

Let's look at these in some more detail.

## 1. New row defaults

When adding a new row to the database we sometimes want to omit certain
values, and let the ORM or database fill it in for us. A common example is
a 'created_on' column, which we want to default to the current date and time.

There are two options for how to handle this. One is for the ORM itself to
provide the defaults, and the other is to let the database provide the
defaults.

The ORM approach sets the defaults before insertion, while the database approach
applies the defaults during insertion. The benefit of the ORM approach is that
if you're building a form, say, you can pre-populate it with the default
values, rather than leaving the fields blank. The downside of the ORM approach
is the defaults can become stale. Let's take the 'created_on' example - the
value which is saved isn't when it was inserted into the database, but rather
when the ORM instantiated the default. Most of the time this doesn't matter
much, but I can imagine use cases where storing the precise creation time in
the database would be important.

If we let the database handle the defaults, there are some advantages. The
performance is likely to be marginally better. More of the logic is encoded
in the database, rather than the application layer. This means that if the ORM
is bypassed, there is greater consistency. The downside of letting the database
handle defaults is you don't have as much control, and you lose the previously
mentioned use case of pre-populating defaults in a form.

### Static vs dynamic defaults

Static defaults are very easy for an ORM to handle. Let's say you're building
a game, and need a 'player' table. The 'score' column will have a default of 0,
which will be the same for each player.

Dynamic defaults on the other hand are challenging. In Django, you're able to
provide a function as a default. In some respects this is good, because it
gives the programmer a lot of control. But on the other hand, allowing code
of potentially unlimited complexity as a default can cause issues. For example,
if the default code triggers other database queries. Or if it pulls in a lot of
dependencies.

The last point is most pertinent with migrations. The golden rule of migrations
is we want them to be self contained, and decoupled as much as possible from
the wider application code. This is because we want someone who runs the
migrations in the future to get the exact same results, even if the wider
application code has subsequently changed.

Some dynamic defaults are tricky to avoid - for example timestamps and UUIDs.
But since they're fairly universal, we can handle these use cases without
the user needing to write custom code.

## 2. New column defaults

If you're adding a new column, which is not nullable, you need to set default
values for all existing rows.

The easiest solution is just to set the ``DEFAULT`` on the ``ADD COLUMN``
statement. For example:

```sql
ALTER TABLE person ADD COLUMN name VARCHAR DEFAULT '';
```

This will then backfill any existing rows with the default value.
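
The backfill is easy to see with a quick experiment. The sketch below uses an
in-memory SQLite database via Python's `sqlite3` module, purely because it
needs no setup - it's a stand-in for Postgres. `NOT NULL` has been added to the
column definition, since a non-nullable column is what we're after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO person (id) VALUES (?)", [(1,), (2,)])

# Add a non-nullable column to a table which already contains rows.
conn.execute("ALTER TABLE person ADD COLUMN name VARCHAR NOT NULL DEFAULT ''")

# The existing rows now have the default value.
rows = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
```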

If you were only using the ORM to populate defaults, you'd have to add the
column as nullable, use the ORM to populate the new default values, and
then switch it back to being non-nullable.

Alternatively, what Django does is it adds the default value to the ``DEFAULT``
clause on the ``ADD COLUMN`` statement, but then immediately removes it in the
same transaction once the backfill is complete. Like this:

```sql
BEGIN;
ALTER TABLE person ADD COLUMN name VARCHAR DEFAULT '';
ALTER TABLE person ALTER COLUMN name DROP DEFAULT;
COMMIT;
```

## How Piccolo handles defaults

Piccolo takes a hybrid approach.

Piccolo populates the default clause when adding a column to the database, so
if you were to bypass the ORM, the defaults will still be populated.

But when using the ORM to insert new rows, the defaults are populated in Python.
This solves the aforementioned use case of populating forms with default
values. In the future it might be possible to disable this behavior, and to
only ever use the database defaults.

## Piccolo default values

These are the sorts of defaults which Piccolo allows:

 1. Static values - e.g. `1`, `'a'`, `datetime.datetime(year=2020, month=1, day=1)`
 1. Predefined dynamic defaults

### 1. Static values

When defining a Piccolo column, it will check if the default is a static value,
and if it's of the correct type then it's allowed.

### 2. Predefined dynamic defaults

Some column types come with an associated ``Enum``, which covers common dynamic
defaults. For example ``TimestampDefault.now``, which under the hood will call
``datetime.datetime.now``. And ``UUIDDefault.uuid4``, which under the hood
will call ``uuid.uuid4``.

By using an ``Enum``, it's easier to serialise the defaults in migrations.

It also means we can translate the default into SQL if necessary. So
``TimestampDefault.now`` maps to ``current_timestamp`` in Postgres.
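
To illustrate why an ``Enum`` helps, here's a simplified sketch. The
``TimestampDefaultSketch`` class is hypothetical - it isn't Piccolo's actual
implementation - but it shows how a single member can be serialised by name,
evaluated in Python, and translated into SQL.

```python
import datetime
from enum import Enum


class TimestampDefaultSketch(Enum):
    # The value is the SQL which the default translates to.
    now = "current_timestamp"

    def python(self):
        """Work out the default value in Python."""
        if self is TimestampDefaultSketch.now:
            return datetime.datetime.now()
        raise NotImplementedError(self.name)


default = TimestampDefaultSketch.now

# In a migration file, only the member's name needs to be written out ...
serialised = default.name
# ... and it can be looked up again later, without importing any
# application code.
restored = TimestampDefaultSketch[serialised]
```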

### What if I have more complex use cases?

If you need very complex default values, these are best handled in the
application code. So for example, an endpoint can detect if a value wasn't
provided, and can add it.

## Conclusions

Hopefully that's been a useful insight into how Piccolo handles defaults.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Python package versioning]]></title>
            <link>https://piccolo-orm.com/blog/python-package-versioning/</link>
            <guid>https://piccolo-orm.com/blog/python-package-versioning/</guid>
            <pubDate>Mon, 18 May 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
When releasing software on the [Python Package Index](http://pypi.org/) (PyPI), you typically use [semantic versioning](https://semver.org/) e.g. 1.2.1 (major.minor.patch).

If a user installs your library, they'll usually version pin it in their requirements.txt file, so in this case `some_package==1.2.1`. The advantage of doing this is when a colleague clones your project, or you deploy to production, you know the software dependencies are exactly correct.
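
Version numbers are compared part by part, as numbers, rather than as strings. Here's a simplified sketch (`parse_version` is a hypothetical helper which only understands plain major.minor.patch versions, whereas real tools such as pip follow fuller rules covering pre-releases and so on):

```python
def parse_version(version):
    """Convert 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


# Comparing tuples of integers gives the expected ordering ...
assert parse_version("0.10.7") > parse_version("0.9.12")

# ... whereas comparing the raw strings doesn't.
assert not ("0.10.7" > "0.9.12")
```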

With Piccolo, which consists of a set of related, interdependent projects, specifying the exact version poses some challenges.

The main Piccolo packages as of May 2020 are:

 * piccolo (the main ORM)
 * piccolo_admin
 * piccolo_api

Both piccolo_admin and piccolo_api have piccolo as a dependency.

Every time a new piccolo package is released, ideally we will also release new versions of piccolo_api and piccolo_admin. That means that whoever installs piccolo_admin or piccolo_api gets the latest and greatest version of piccolo too. But there are practical limitations to this. Manually releasing packages is time consuming. Even if it's completely automated using something like [Github Actions](https://github.com/features/actions), you will end up with a lot of releases which are just updating dependencies.

Another challenge I've encountered is that all three projects started at different times, and their versions have no relation to each other. For example, the latest versions as of May 2020 are:

 * piccolo - 0.10.7
 * piccolo_api - 0.7.4
 * piccolo_admin - 0.6.4

It's tempting to synchronise their versions, so they all share the same major and minor version.

 * piccolo - 0.10.7
 * piccolo_api - 0.10.4
 * piccolo_admin - 0.10.4

But this is somewhat limiting, as you can only manipulate the bug fix version.

## What about merging it into a single package?

Software developers have been influenced by the Unix philosophy, where simple components are combined to perform complex tasks. With Unix though, the components aren't dependent on each other. In fact, they don't even know that the output is being piped into the input of another command.

The Unix philosophy isn't always the best approach though. By merging all of the packages together, you remove the problem of having to synchronise version numbers.

I think there's a reasonable middle ground, where a package isn't broken down needlessly into small chunks, and on the other hand, isn't too large.

What's too large? In my experience, projects which are too large are the ones you're worried to release. Where one change can potentially have a large number of side effects.

In terms of libraries, I don't think file size is a particularly big concern for most people anymore.

## Loosening dependency versions

Another solution is to loosen dependency versions. So rather than piccolo_api requiring version 0.10.7 of piccolo, instead you can specify version 0.10.*. [Pip](https://pypi.org/project/pip/) handles this fine. You can also do something like `piccolo>=0.10.2,<0.11`.

This is a reasonable solution. It does create some ambiguity though, which could result in bugs. It's unlikely you'll run unit tests for your project with every dependency version in the range. It's important that the developer is disciplined with their package versioning, so the major version is incremented if there are any backwards incompatible changes.
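
To illustrate what a version range means, here's a simplified sketch of the comparison. It assumes plain numeric versions such as 0.10.7 - real installers follow the full rules for Python version specifiers, which also cover pre-releases and other cases ignored here. `parse_version` and `in_range` are hypothetical helpers.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.10.7' into (0, 10, 7), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, minimum: str, below: str) -> bool:
    """Check that minimum <= version < below."""
    return (
        parse_version(minimum)
        <= parse_version(version)
        < parse_version(below)
    )


if __name__ == "__main__":
    print(in_range("0.10.7", minimum="0.10.2", below="0.11"))
    print(in_range("0.11.0", minimum="0.10.2", below="0.11"))
```

Comparing tuples of integers matters, because comparing the raw strings would put 0.9.0 after 0.10.0.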

## Sanity check

I think it's really important for a library author to have a sanity check in place, so they know that the latest version of their library can be installed, and works as expected. For the piccolo admin, I deploy a [demo site](http://demo1.piccolo-orm.com/), so I can check it all still works at a high level.

## Conclusions

For Piccolo, I've decided to loosen the dependency requirements. I'd also like to get to version 1.0 very soon.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Build a Python CLI quickly with targ]]></title>
            <link>https://piccolo-orm.com/blog/build-a-python-cli-quickly-with-targ/</link>
            <guid>https://piccolo-orm.com/blog/build-a-python-cli-quickly-with-targ/</guid>
            <pubDate>Wed, 22 Apr 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Building command line tools is a daily occurrence for many developers, and is also an accessible way for junior programmers to build useful tools, without having to create a GUI.

I wanted to make a library which made building a command line tool as painless as possible. There are also some advanced features I needed from a CLI library for Piccolo, and so [targ](https://github.com/piccolo-orm/targ) was born.

Targ creates a CLI just using type annotations and docstrings, so you can turn your existing functions into a CLI with very little effort.

```python
# main.py
from targ import CLI


def add(a: int, b: int):
    """
    Add the two numbers.

    :param a:
        The first number.
    :param b:
        The second number.
    """
    print(a + b)


if __name__ == "__main__":
    cli = CLI()
    cli.register(add)
    cli.run()
```

And from the command line:

```bash
>>> python main.py add 1 1
2
```
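
To give a feel for how type annotations can drive a CLI, here's a minimal sketch which converts string arguments using a function's annotations. It only illustrates the general idea, and isn't how targ itself is implemented - `call_with_strings` is a hypothetical helper.

```python
import inspect


def call_with_strings(func, raw_args):
    """Convert string arguments using the function's type annotations."""
    parameters = inspect.signature(func).parameters.values()
    converted = [
        parameter.annotation(raw)
        for parameter, raw in zip(parameters, raw_args)
    ]
    return func(*converted)


def add(a: int, b: int):
    return a + b


if __name__ == "__main__":
    # Command line arguments always arrive as strings.
    print(call_with_strings(add, ["1", "1"]))
```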

To get documentation:

```bash
>>> python main.py add --help

add
===
Add the two numbers.

Usage
-----
add a b

Args
----
a
The first number.

b
The second number.
```

I encourage you to give it a try.

 * [Github](https://github.com/piccolo-orm/targ)
 * [Read the docs](https://targ.readthedocs.io/en/latest/index.html)

More advanced features are coming soon.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Auto migrations]]></title>
            <link>https://piccolo-orm.com/blog/auto-migrations/</link>
            <guid>https://piccolo-orm.com/blog/auto-migrations/</guid>
            <pubDate>Sun, 15 Mar 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Continuing from the previous [post about database migrations](/blog/database-migrations/), this post will look at how auto migrations are implemented in Piccolo.

## Defining an app

Auto migrations exist in the context of an app. An app is simply a Python package, which contains a Python file at its root containing an `AppConfig` instance. This Python file is called `piccolo_app.py` by convention.

Here's an example of the folder structure, where 'blog' is our app:

```
piccolo_conf.py
/blog
    __init__.py
    piccolo_app.py
    tables.py
    /piccolo_migrations

```

You can create new apps very easily using the following command:

```bash
piccolo app new my_app_name
```

The contents of the `piccolo_app.py` file look like this:

```python
# piccolo_app.py
"""
Import all of the Tables subclasses in your app here, and register them with
the APP_CONFIG.
"""
import os

from piccolo.conf.apps import AppConfig
from .tables import (
    Author,
    Post,
    Category,
    CategoryToPost
)


CURRENT_DIRECTORY = os.path.dirname(os.path.abspath(__file__))


APP_CONFIG = AppConfig(
    app_name='blog',
    migrations_folder_path=os.path.join(CURRENT_DIRECTORY, 'piccolo_migrations'),
    table_classes=[Author, Post, Category, CategoryToPost],
    migration_dependencies=[],
    commands=[]
)
```

The important thing to realise is we explicitly import and register any Table
classes which belong to this app.

The reason we do this is:

 * To make sure all of the necessary tables have been imported.
 * To reduce the amount of metaclass magic we'd otherwise have to do, which would make the system less flexible.

You'll also notice that `AppConfig` has a `migration_dependencies` argument. This is a list of import paths for other Piccolo apps whose migrations you need to run before the current app.

```python
['my_other_app.piccolo_app']
```
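
Running migrations in dependency order is a topological sort. The sketch below uses the standard library's `graphlib` module (Python 3.9+) to show the idea. The app names are made up, and this isn't necessarily how Piccolo resolves the order internally.

```python
from graphlib import TopologicalSorter

# Hypothetical apps, each mapped to the apps whose migrations must run first.
dependencies = {
    "blog.piccolo_app": ["my_other_app.piccolo_app"],
    "my_other_app.piccolo_app": [],
}


def migration_order(dependencies):
    """Return the app import paths, with dependencies listed first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(migration_order(dependencies))
```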

## Registering the app with piccolo_conf

Make sure you register your apps in the piccolo_conf.py `AppRegistry`.

```python
# piccolo_conf.py
from piccolo.engine.postgres import PostgresEngine
from piccolo.conf.apps import AppRegistry


DB = PostgresEngine(config={
    'database': 'headless_blog_demo'
})


APP_REGISTRY = AppRegistry(apps=['blog.piccolo_app'])

```

## Creating our first migration

Now we have the basic machinery in place, we'll ask Piccolo to create a migration for us.

From the root of our project:

```
piccolo migrations new blog --auto
```

Piccolo will get all of the Table classes from the app's `AppConfig`, so it can build a picture of the required schema.

It will then look at the existing migrations for the app so it can build up a snapshot of the existing schema.

Piccolo then compares the current schema as defined in your AppConfig, to the snapshot, and will generate the necessary alter statements.

These alter statements are then written to a new Python file in the `piccolo_migrations` folder, which is given a timestamp as an identifier.

After you have run this migration, the database schema should now match the desired state.

```
piccolo migrations forwards blog
```

## Migration file contents

In theory, you shouldn't have to worry much about the contents of the migration files. Just create and run them, and Piccolo will do the rest.

However, it's important for understanding how the underlying migration machinery works, so let's take a look at a simple migration file:

```python
from piccolo.migrations.auto import MigrationManager


ID = "2020-03-21T15:05:43"


async def forwards():
    manager = MigrationManager()
    manager.add_table("Author", tablename="author")
    manager.add_column(
        table_class_name="Author",
        column_name="name",
        column_class_name="Varchar",
        params={
            "length": 255,
            "default": "",
            "null": False,
            "primary": False,
            "key": False,
            "unique": False,
            "index": False,
        },
    )

    return manager

```

It all hinges on a class called `MigrationManager`. It's how we register any changes we want to make to the schema. Also, note that the `forwards` function needs to return it.

The way that Piccolo applies the required schema changes is to run the returned `MigrationManager`.

But `MigrationManager` serves another important purpose - if we give a sequence of `MigrationManager` instances to a `SchemaSnapshot`, it can add them up to build a complete picture of the schema.

The next time the user creates a new migration, Piccolo uses a `SchemaDiffer` to work out the differences between the snapshot, and the current Table classes, and generates a new `MigrationManager` instance, which it writes to a migration file.
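
Here's a heavily simplified sketch of the snapshot and diff steps, using plain dictionaries instead of `MigrationManager`, `SchemaSnapshot` and `SchemaDiffer`. The data structures and function names are assumptions made for illustration, and it only handles added tables and columns.

```python
def build_snapshot(migrations):
    """Add up a sequence of migrations into {table: set of columns}."""
    snapshot = {}
    for migration in migrations:
        for table in migration.get("add_tables", []):
            snapshot.setdefault(table, set())
        for table, column in migration.get("add_columns", []):
            snapshot[table].add(column)
    return snapshot


def diff(snapshot, current):
    """Work out what a new migration would need to add."""
    return {
        "add_tables": sorted(set(current) - set(snapshot)),
        "add_columns": sorted(
            (table, column)
            for table, columns in current.items()
            for column in columns - snapshot.get(table, set())
        ),
    }


if __name__ == "__main__":
    existing = [{"add_tables": ["Author"], "add_columns": [("Author", "name")]}]
    current = {"Author": {"name", "email"}, "Post": {"title"}}
    print(diff(build_snapshot(existing), current))
```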

Here's a visualisation of how auto migrations work internally:

<a href="#" class="lightbox">
<img src="https://piccolo-orm.com/images/blog/migration_graphic.png" class="medium" alt="migration graphic" />
</a>
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Database migrations]]></title>
            <link>https://piccolo-orm.com/blog/database-migrations/</link>
            <guid>https://piccolo-orm.com/blog/database-migrations/</guid>
            <pubDate>Mon, 02 Mar 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
In my experience, one of the most important features for a database library is migrations.

In a team environment, migrations allow developers to share schema changes through source control, which means other developers can easily apply the changes locally, without breaking their flow.

When it's time to deploy to production, migrations provide a simple mechanism for bringing the database up to date.

If using Docker, you can even run the migration command automatically when restarting your application's container (by using a custom entrypoint script), which ensures your code and database are in sync.

Different frameworks implement migrations in different ways, which is what we'll look at now.

## Django style

Django has a robust migration framework. You define your schema in a models.py file. By running `manage.py makemigrations`, Django inspects your models.py for any changes, and creates a corresponding migration file if required. You run the migrations using `manage.py migrate`.

What's great is how automated it is - for most common use cases, as long as you know those two management commands, you don't have to think much more about migrations.

I've personally worked on dozens of Django projects, and have rarely encountered any issues.

## Hand written SQL

Some developers use a standalone migration framework like [Flyway](https://flywaydb.org/). You create migration files manually, and they contain pure SQL. This gives the developer the ultimate control, but is more time consuming than something like Django. The advantage is Flyway isn't tied to any particular framework, and as it uses plain SQL it's portable, even if you were to completely change your web framework and programming language. If you want to write pure SQL migrations using Django, you can do so using [data migrations](https://docs.djangoproject.com/en/3.0/topics/migrations/#data-migrations).

Is it worth the productivity trade-off of having to hand code each migration? For some large enterprise teams it might be worth it, but for simple use cases it's probably overkill if your language or framework of choice provides something more automated.

## Idempotent / diffing / compare / state

There are some tools which take a very different approach. There doesn't seem to be a universal name for them yet, but they work on similar principles. They compare a database with a reference schema, and automatically work out the required DDL statements to make them match.

There are a few examples, with varying support for different databases:

 * [Skeema](https://github.com/skeema/skeema)
 * [sqldef](https://github.com/k0kubun/sqldef/)
 * [migra](https://github.com/djrobstep/migra)

In theory, your database could be at any starting state, and it can be migrated to the desired state.
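
As a small sketch of the state based idea, the function below inspects a live table and works out which columns are missing compared to the desired state. It uses SQLite from the standard library purely so the example is self-contained - the tools listed above target other databases, and handle far more than added columns. `missing_column_statements` is a hypothetical helper.

```python
import sqlite3


def missing_column_statements(connection, table, desired_columns):
    """Compare a live table to the desired columns, and return the DDL."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    existing = {row[1] for row in rows}  # row[1] is the column name
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {column_type}"
        for name, column_type in desired_columns.items()
        if name not in existing
    ]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
    desired = {"id": "INTEGER", "name": "TEXT"}
    for statement in missing_column_statements(connection, "author", desired):
        print(statement)
        connection.execute(statement)
```

Running it a second time against the same database produces no statements, which is what makes this style idempotent.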

## Which is best?

The best is some kind of hybrid.

A pure state based solution will sort out the schema for you, but sometimes you want to change the data too. Migrations have an advantage here.

I have most experience with the Django style migrations, but will explore the state based migrations more in the near future.

## Resources

Here's a good video comparing different migration approaches:

 * http://dlmconsultants.com/model-vs-mig/

And a mega thread about people's favourite migration systems on Hacker News:

 * https://news.ycombinator.com/item?id=19880334
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Piccolo transactions]]></title>
            <link>https://piccolo-orm.com/blog/piccolo-transactions/</link>
            <guid>https://piccolo-orm.com/blog/piccolo-transactions/</guid>
            <pubDate>Wed, 26 Feb 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Transactions are an essential feature of any database library, but in an async world they can be quite tricky.

## Solution 1 - Atomic

This is the original solution offered by Piccolo.

```python

import asyncio

from piccolo.columns import Varchar, ForeignKey
from piccolo.tables import Table


class Employer(Table):
    name = Varchar(length=100)


class Person(Table):
    name = Varchar(length=100)
    employer = ForeignKey(Employer)


async def main():
    # Each table class has a reference to the engine, in this case Person._meta.db
    transaction = Person._meta.db.atomic()
    transaction.add(Employer.create_table())
    transaction.add(Person.create_table())
    await transaction.run()


if __name__ == '__main__':
    asyncio.run(main())

```

It's useful if you just want to fire off a bunch of queries, but not if you want the results of one query to influence a subsequent query (e.g. fetching a value in one query, and inserting it in a subsequent query).

This is still kept in Piccolo though, despite the limitations, as it's useful if you want to dynamically build a transaction - you can pass it around, and can keep on adding queries to it, until you're ready to run it.
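
The pattern of building up a transaction and running it later can be sketched with the standard library. This version is synchronous and uses SQLite to stay self-contained, so it's only an illustration of the idea - `AtomicSketch` is a hypothetical class, not Piccolo's implementation.

```python
import sqlite3


class AtomicSketch:
    """Collects queries, then runs them all in a single transaction."""

    def __init__(self, connection):
        self.connection = connection
        self.queries = []

    def add(self, *queries):
        self.queries.extend(queries)

    def run(self):
        # Using the connection as a context manager commits on success,
        # and rolls back if any of the queries raises an exception.
        with self.connection:
            for query in self.queries:
                self.connection.execute(query)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE person (name TEXT UNIQUE)")

    transaction = AtomicSketch(connection)
    transaction.add("INSERT INTO person VALUES ('Bob')")
    transaction.add("INSERT INTO person VALUES ('Alice')")
    transaction.run()
```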

## Solution 2 - Pass the transaction into each run method

What if we used a context manager instead for creating / closing the transaction, and passed it into each query?

```python
async def main():
    async with Person._meta.db.transaction() as transaction:
        await Employer.create_table().run(transaction=transaction)
        await Person.create_table().run(transaction=transaction)

```

On the surface, this seems like a good solution - it's very explicit, and seems to do everything we want.

The downside is you have to keep on passing around the current transaction. Imagine that we wanted to get the data from some other functions:

```python

async def fetch_employers(transaction):
    return await Employer.select().run(transaction=transaction)


async def fetch_people(transaction):
    return await Person.select().run(transaction=transaction)


async def main():
    async with Person._meta.db.transaction() as transaction:
        await fetch_employers(transaction)
        await fetch_people(transaction)

```

Soon enough, a lot of your code needs to be aware of transactions.

## Solution 3 - Contextvars

In Python 3.7, the contextvars module was added. This allows variables to be scoped to the current task.

```python

async def main():
    async with Person._meta.db.transaction():
        await fetch_employers()
        await fetch_people()

```

In the context manager we're assigning the connection with the transaction to the current context.

```python
from contextvars import ContextVar

connection = ContextVar('connection', default=None)


# This is similar to what Piccolo does:
class Transaction():

    async def __aenter__(self):
        self.connection = await get_connection()
        self.transaction = await self.connection.get_transaction()
        self.token = connection.set(self.connection)
        await self.transaction.start()

    async def __aexit__(self, exception_type, exception, traceback):
        if exception:
            await self.transaction.rollback()
        else:
            await self.transaction.commit()

        await self.connection.close()

        # This restores the previous value in the current context:
        connection.reset(self.token)

```

This solution saves us from passing the transaction around explicitly.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exception handling in asyncio]]></title>
            <link>https://piccolo-orm.com/blog/exception-handling-in-asyncio/</link>
            <guid>https://piccolo-orm.com/blog/exception-handling-in-asyncio/</guid>
            <pubDate>Sun, 23 Feb 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
In most situations, exception handling in asyncio is as you'd expect in your typical Python application.

```python
import asyncio


async def bad():
    raise Exception()


def main():
    try:
        asyncio.run(bad())
    except Exception:
        print("Handled exception")


>>> main()
Handled exception
```

However, there are some situations where things get interesting.

## asyncio.gather

You can use [asyncio.gather](/blog/asyncio-gather/) to launch several coroutines, which are then executed concurrently.

```python
import asyncio


async def hello():
    # To simulate a network call, or other async work:
    await asyncio.sleep(1)
    print('hello')


async def main():
    await asyncio.gather(
        hello(),
        hello(),
        hello()
    )

>>> asyncio.run(main())
hello
hello
hello
```

What happens if one of the coroutines raises an exception? The default behavior is for the first exception raised by any of the coroutines to be propagated to the call site of asyncio.gather. The other coroutines continue to run.

If more than one of the coroutines raises an exception, you won't be aware of it. If you need to run some clean up code to handle an exception (for example, rolling back a transaction), then you could potentially miss it if a different coroutine raises an exception first.

You might also wonder when the exception is handled - is it as soon as it's raised, or only when all of the coroutines have completed?
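
A small experiment answers that question. In this sketch (the coroutine names and delays are arbitrary), the exception reaches the caller as soon as it's raised, and the other coroutine carries on running afterwards.

```python
import asyncio


async def slow(events):
    await asyncio.sleep(0.2)
    events.append("slow finished")


async def bad():
    await asyncio.sleep(0.05)
    raise ValueError()


async def main():
    events = []
    try:
        await asyncio.gather(slow(events), bad())
    except ValueError:
        # We get here as soon as bad() raises, before slow() has finished.
        events.append("handled ValueError")

    # slow() wasn't cancelled, so it completes if we give it enough time.
    await asyncio.sleep(0.3)
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```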

Fortunately, asyncio.gather has an option called **return_exceptions**, which returns the exceptions instead of raising them.

```python
import asyncio


async def good():
    return 'OK'


async def bad():
    raise ValueError()


async def main():
    responses = await asyncio.gather(
        bad(),
        good(),
        bad(),
        return_exceptions=True
    )

    print(responses)
    # >>> [ValueError(), 'OK', ValueError()]

```

We are now aware of every exception which happened. But as a programmer, what do we do with a list of values and exceptions? It feels quite alien.
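
Without any extra libraries, one option is to split the list yourself. This is a minimal sketch, and `split_responses` is a hypothetical helper.

```python
def split_responses(responses):
    """Separate the return values from the exceptions."""
    successes = [i for i in responses if not isinstance(i, BaseException)]
    exceptions = [i for i in responses if isinstance(i, BaseException)]
    return successes, exceptions


if __name__ == "__main__":
    successes, exceptions = split_responses([ValueError(), "OK", ValueError()])
    print(successes)
    print(len(exceptions))
```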

To solve this problem, I created a library called [asyncio_tools](https://github.com/piccolo-orm/asyncio_tools), which wraps `gather` to make it more user friendly.

```python
import asyncio_tools


async def good():
    return 'OK'


async def bad():
    raise ValueError()


async def main():
    response = await asyncio_tools.gather(
        bad(),
        good(),
        bad(),
    )

    # We can easily get just the successful results
    print(response.successes)
    # >>> ['OK']

    # And the exceptions.
    print(response.exceptions)
    # >>> [ValueError(), ValueError()]

    # We can easily check if we got a certain type of exception
    if ValueError in response.exception_types:
        print('Received a ValueError exception')

    # We can combine all of the exceptions into a 'CompoundException':
    exception = response.compound_exception()
    if exception:
        raise exception

```

If we raise a `CompoundException`, it allows us to return information about several
exceptions. When we catch such an exception, we can do the following:

```python

async def main():
    try:
        await some_coroutine()
    except asyncio_tools.CompoundException as exception:
        print(exception)
        # >>> 'CompoundException, 2 errors [ValueError, ValueError]'

        if ValueError in exception.exception_types:
            print('Caught a ValueError')

```

This makes handling exceptions in concurrent code easier - I encourage you to check it out.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Python contextvars]]></title>
            <link>https://piccolo-orm.com/blog/python-contextvars/</link>
            <guid>https://piccolo-orm.com/blog/python-contextvars/</guid>
            <pubDate>Sat, 22 Feb 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
The [contextvars module](https://docs.python.org/3/library/contextvars.html) was [added in Python 3.7](https://www.python.org/dev/peps/pep-0567/) to solve issues with [thread local data](https://docs.python.org/3/library/threading.html#thread-local-data) in [asyncio](https://docs.python.org/3/library/asyncio.html) programs.

An example of thread local data is when a web server handles each request in a separate thread. You can store information about the current request in a way which doesn't bleed out to other threads, and it saves you from passing the request information to every function or method which requires it. If you've used [Flask](https://palletsprojects.com/p/flask/) before, this is [basically what it does](https://stackoverflow.com/questions/25887910/what-does-thread-local-objects-mean-in-flask).

With asyncio programs, thread local data is no longer enough. Each thread can be executing several tasks concurrently, and it would be good to scope variables to tasks and not just the thread as a whole. An example is something like a database transaction, which you don't necessarily want to pass around to every function which needs it.

Firstly, it's important to understand what a task is. When you write an asyncio program, you write a bunch of coroutines using `async def`.

```python
import asyncio


async def get_name():
    # To simulate a network call
    await asyncio.sleep(1)
    return 'Bob'


if __name__ == '__main__':
    asyncio.run(get_name())

```

You can ask asyncio to run a coroutine using any of the following:

 * asyncio.run
 * asyncio.gather
 * asyncio.create_task

In each case, asyncio wraps the coroutine in a task. A coroutine is basically a function which can be suspended - a task adds some useful machinery around it, like being able to cancel it, and add callbacks.

The important thing to understand is your coroutine will always be running inside some task.
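
You can check this with `asyncio.current_task`. In the sketch below (the task name "child" is arbitrary), a coroutine which is awaited directly runs inside the caller's task, whereas `asyncio.create_task` gives it a task of its own.

```python
import asyncio


async def get_task_name():
    # asyncio.current_task returns the task which is running this coroutine.
    return asyncio.current_task().get_name()


async def main():
    # Awaiting a coroutine directly runs it inside the current task.
    own_name = asyncio.current_task().get_name()
    awaited_name = await get_task_name()

    # create_task wraps the coroutine in a new task instead.
    child = asyncio.create_task(get_task_name(), name="child")
    child_name = await child

    return own_name == awaited_name, child_name


if __name__ == "__main__":
    print(asyncio.run(main()))
```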

<figure>
    <a href="#" class="lightbox">
        <img src="https://piccolo-orm.com/images/blog/asyncio_contextvars.png" class="medium" alt="asyncio contextvars" />
    </a>
    <figcaption>An example asyncio program</figcaption>
</figure>

In the above diagram, you can see an example asyncio program.

 * The entry point is a coroutine which is run using `asyncio.run`, which wraps it in a task.
 * Whenever a new task is created, a snapshot of the parent context is taken, and this applies to the new task. Any subsequent changes to the parent context don't apply to the child task.
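
The snapshot behaviour can be demonstrated with a short sketch. The variable name and values are arbitrary.

```python
import asyncio
from contextvars import ContextVar

name = ContextVar("name", default="unset")


async def child(results):
    # The child sees the value from when the task was created.
    results.append(name.get())
    # This change only applies to the child's own context.
    name.set("changed by child")


async def main():
    results = []
    name.set("parent value")
    task = asyncio.create_task(child(results))  # the snapshot is taken here
    name.set("changed after task creation")
    await task
    results.append(name.get())
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The child reads "parent value", and the parent still reads "changed after task creation" once the child has finished, so changes don't flow in either direction.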

Even though it might seem like lots of things are going on at once in an asyncio program, in reality it's just hopping between different tasks, which have their own context.

We can use context managers to manipulate the context in the task - it won't bleed out to the other existing tasks, as they took a snapshot of the context when they were created.

Here's an example:

 ```python

import asyncio
from contextvars import ContextVar

from my_library import get_connection


# If we don't give it a default, then it raises a LookupError if we try and
# access the value using connection.get(), without having first set a value
# using connection.set(some_value).
connection = ContextVar('connection', default=None)


# This is similar to what Piccolo does:
class Transaction():

    async def __aenter__(self):
        self.connection = await get_connection()
        self.transaction = await self.connection.get_transaction()
        self.token = connection.set(self.connection)
        await self.transaction.start()

    async def __aexit__(self, exception_type, exception, traceback):
        if exception:
            await self.transaction.rollback()
        else:
            await self.transaction.commit()

        await self.connection.close()

        # This restores the previous value in the current context:
        connection.reset(self.token)


async def run_in_transaction(sql):
    # We don't have to pass the connection explicitly - we can get it from
    # the context.
    _connection = connection.get()
    if _connection:
        return await _connection.run(sql)


async def main():
    async with Transaction():
        await run_in_transaction('select * from foo')


if __name__ == '__main__':
    asyncio.run(main())

 ```

## Resources

 * [A good article on contextvars](https://www.pythoninsight.com/2019/03/context-variables/)
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Postgres concurrency]]></title>
            <link>https://piccolo-orm.com/blog/postgres-concurrency/</link>
            <guid>https://piccolo-orm.com/blog/postgres-concurrency/</guid>
            <pubDate>Sun, 16 Feb 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
When using an async ORM, it's good to understand how the underlying database handles concurrency.

When Piccolo makes a query, it gets a connection from the database adapter. The Postgres server spawns a new process to handle each connection it receives. If you open up a bunch of connections to a database server, and enter `ps aux | grep postgres` on the command line, then you'll see the connection processes.

A database server is configured to only support a certain number of connections at a time. If this limit is exceeded then you'll get an error. An async web app will typically open far more database connections than a synchronous one, so it's important to use a connection pool.

In the case of Piccolo, if you configure a connection pool, then it will wait for a connection to become available in the pool, rather than making more and more connections, which will eventually cause an error.
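
The waiting behaviour can be sketched with an `asyncio.Queue` holding a fixed number of connections. `PoolSketch` is a hypothetical class for illustration, with strings standing in for real connections - it isn't how Piccolo or its database adapter implements pooling.

```python
import asyncio


class PoolSketch:
    """A fixed number of connections, handed out via a queue."""

    def __init__(self, size):
        self.size = size
        self.queue = None

    async def start(self):
        self.queue = asyncio.Queue()
        for number in range(self.size):
            self.queue.put_nowait(f"connection {number}")

    async def run_query(self, stats):
        connection = await self.queue.get()  # waits if the pool is empty
        stats["in_use"] += 1
        stats["peak"] = max(stats["peak"], stats["in_use"])
        try:
            await asyncio.sleep(0.01)  # stands in for the real query
        finally:
            stats["in_use"] -= 1
            self.queue.put_nowait(connection)


async def main():
    pool = PoolSketch(size=5)
    await pool.start()
    stats = {"in_use": 0, "peak": 0}
    await asyncio.gather(*[pool.run_query(stats) for _ in range(50)])
    return stats["peak"]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Even though 50 queries are launched at once, no more than 5 connections are ever in use.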

What's really interesting is how Postgres is able to process all of those connections in a performant way. Postgres uses a concurrency model called MVCC (Multiversion Concurrency Control). It's a deep subject, but in a nutshell Postgres maintains several versions of a row internally.

Each row version has an xmin and an xmax value. Each connection is assigned an incrementing transaction ID (txid), and can see rows where xmin <= txid <= xmax. This means that other connections can insert, modify, and delete rows without affecting the other connections, as the new row versions are assigned higher xmin values.
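
The visibility rule described above can be written as a short sketch. It's a deliberate simplification - real Postgres visibility checks also take into account whether the transactions involved have committed. The row data and the `visible_rows` helper are made up for illustration.

```python
import math

# Two versions of the same row - the name was changed by transaction 105.
row_versions = [
    {"name": "Bob", "xmin": 100, "xmax": 104},
    {"name": "Robert", "xmin": 105, "xmax": math.inf},
]


def visible_rows(row_versions, txid):
    """Return the row versions a transaction with this ID can see."""
    return [
        row for row in row_versions
        if row["xmin"] <= txid <= row["xmax"]
    ]


if __name__ == "__main__":
    print(visible_rows(row_versions, txid=102))
    print(visible_rows(row_versions, txid=110))
```

A transaction with ID 102 sees the old version, while one with ID 110 sees the new version, and neither has to wait for the other.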

This reduces the amount of locking required during concurrent access, which improves throughput. The downside is the database needs to be vacuumed periodically to remove old versions of rows, whose xmin and xmax values are both lower than the current txid values.

In terms of tuning your Postgres server for maximum performance:

 * CPU core count - will increase the number of connection processes which can execute in parallel.
 * RAM - increasing the Postgres server's shared_buffers setting will help reduce disk reads (around 25% of total system memory is recommended).
 * Fast disk - will help prevent IO bottlenecks.

Avoid ramping up the max connection limit too high. Above a certain number it becomes counterproductive, as the connections themselves consume RAM, which could otherwise be used by Postgres itself for running queries. The default is 100, which should be sufficient for most needs.

## Resources

 * [Good overview of Postgres connections](https://brandur.org/postgres-connections)
 * [Official Postgres docs](https://www.postgresql.org/docs/current/tutorial-arch.html)
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[asyncio.gather]]></title>
            <link>https://piccolo-orm.com/blog/asyncio-gather/</link>
            <guid>https://piccolo-orm.com/blog/asyncio-gather/</guid>
            <pubDate>Sat, 15 Feb 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
When it comes to learning the [asyncio](https://docs.python.org/3/library/asyncio.html) library in Python, there are two important functions to be aware of. The first is `run`, which is a simple way to run a coroutine, and the second is `gather`.

`gather` lets you fire off a bunch of coroutines simultaneously, and the current context will resume once all of the coroutines have completed. The return value is a list of responses from each coroutine.
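
One detail worth knowing is that the responses are in the same order as the coroutines you passed in, regardless of which one finishes first. Here's a minimal sketch, where the `double` coroutine and its delays are arbitrary:

```python
import asyncio


async def double(number, delay):
    await asyncio.sleep(delay)
    return number * 2


async def main():
    # The second coroutine finishes first, but the results keep the order
    # in which the coroutines were passed in.
    return await asyncio.gather(double(1, 0.02), double(2, 0.01))


if __name__ == "__main__":
    print(asyncio.run(main()))
```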

Here's a real life example, where a bunch of API endpoints are accessed concurrently, as a means of load testing a server:

```python
import asyncio

import httpx


ids = [1, 10, 12, 15, 20, 100]


async def main():
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await asyncio.gather(
            *[
                client.get(
                    f"https://foo.com/api/{_id}/",
                ) for _id in ids
            ]
        )

    # Check all of the requests were successful:
    assert {i.status_code for i in response} == {200}


if __name__ == "__main__":
    asyncio.run(main())
```

We are using the [httpx](https://github.com/encode/httpx) library to make network requests.

One thing to be aware of is you can potentially open up a lot of connections using `gather`. If you open too many, this can result in errors as operating systems will only let you open a certain number of sockets at a time, so don't go too crazy!
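
One common way to keep this under control is an `asyncio.Semaphore`, which limits how many coroutines can be inside a block at once. In this sketch, `asyncio.sleep` stands in for the network request, and the limit of 10 is an arbitrary choice.

```python
import asyncio


async def fetch(_id, semaphore, stats):
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stands in for the real request
        stats["active"] -= 1
        return _id


async def main():
    semaphore = asyncio.Semaphore(10)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *[fetch(_id, semaphore, stats) for _id in range(100)]
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(main())
    print(len(results), peak)
```

All 100 coroutines are still launched with a single `gather`, but only 10 are making a request at any moment.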

Likewise, when connecting to a database, only a certain number of connections can be open at a time. It's important to use a connection pool to avoid errors.

Here's an example using [Piccolo](http://piccolo-orm.com/):

```python
import asyncio

from piccolo.engine.postgres import PostgresEngine
from piccolo.columns import Varchar
from piccolo.tables import Table


DB = PostgresEngine({
    'host': 'localhost',
    'database': 'my_app',
    'user': 'postgres',
    'password': ''
})


class Person(Table, db=DB):
    name = Varchar()


async def main():
    await DB.start_connection_pool()

    # This is a contrived example - imagine each of these are different
    # queries:
    await asyncio.gather(*[Person.select().run() for _ in range(500)])

    await DB.close_connection_pool()


if __name__ == "__main__":
    asyncio.run(main())
```

With Piccolo, if we make sure a connection pool is open then we're fine - if all connections are being used, the coroutine will wait until one becomes available.

As you can see, `gather` is super powerful. It lets us concisely request several resources concurrently, which is a common occurrence in web apps.

## asyncio_tools

If you want to take your use of `asyncio.gather` to the next level, check out [asyncio_tools](https://github.com/piccolo-orm/asyncio_tools).

## Resources

-   [Official docs](https://docs.python.org/3/library/asyncio-task.html#asyncio.gather)
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Netlify vs Self Hosting]]></title>
            <link>https://piccolo-orm.com/blog/netlify-vs-self-hosting/</link>
            <guid>https://piccolo-orm.com/blog/netlify-vs-self-hosting/</guid>
            <pubDate>Sat, 15 Feb 2020 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
This is a little off topic from the usual [Python articles](..), but I recently migrated this website to [Netlify](https://www.netlify.com/) for hosting.

Before that, it was hosted on my own server inside a [Docker](https://www.docker.com/) container. The deploy process was:

 * Build the Docker image
 * Push the image to a registry
 * SSH onto the server, pull the new image, and recreate the container

That workflow isn't too bad if you're deploying once a week or so. But it's pretty tedious if you do a deployment, realise there's a typo, and then have to go through it all over again.

You might wonder why it needed to be in a Docker container. It's because the rest of my infrastructure requires it. However, the same downsides apply if you're just manually building and deploying static files to your own webserver.

There are many services out there for hosting static websites. Netlify does have a very slick solution though.

When you push a commit to GitHub, Netlify is notified via webhooks, which then triggers a build and automatic deployment. In most situations no manual work is required by the developer once this workflow is set up. The website is deployed to a [CDN](https://en.wikipedia.org/wiki/Content_delivery_network), which results in fast load times throughout the world. The final cherry on top is that SSL is automatically configured too.

So with all these advantages, would you ever host a static website yourself?

In a post-[GDPR](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation) world, tools like Google Analytics aren't an effective measure for website engagement, as users should be allowed to opt out of tracking. The only true measure of engagement is web server logs. Whilst this isn't a perfect solution, because some traffic isn't recorded due to caching, and bots add a lot of noise, it's still valuable to have. By using a static host, you are losing this granularity. Netlify offers its own analytics solution as a paid add-on.

Talking about paid add-ons, if you really went to town with Netlify, then the costs do start to add up. For a more ambitious site with a large team, weigh up the costs carefully vs just hosting it via Nginx on your own server. This isn't to suggest that Netlify is overpriced, it's just the nature of services like this. The same is true with AWS.

Also, be wary of putting a square peg in a round hole. For example, if your app has lots of dynamic content, authentication, and backend services, you're probably best off hosting that yourself, or at least looking for some other alternative.

Where services like Netlify really shine is with open source. Being able to accept a pull request into master, and then have those changes instantly go live, is such a nice workflow and time saver.

These Git driven workflows are super powerful. Sure, you can configure your own Continuous Deployment (CD) pipeline using something like [Circle CI](https://circleci.com/) or [Travis](https://travis-ci.com/), but services like Netlify, which effectively offer CD as a service, are sure to grow more popular.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Namespacing Python attributes]]></title>
            <link>https://piccolo-orm.com/blog/namespacing-python-attributes/</link>
            <guid>https://piccolo-orm.com/blog/namespacing-python-attributes/</guid>
            <pubDate>Tue, 10 Sep 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
When you have large classes, it can be useful to namespace some of the attributes.

For Piccolo, there are a few large classes, and the one which triggered this thought process was Column.

In Column, there's a bunch of different attributes which we'd like to differentiate:

 * Private vs public
 * Library vs user created

By namespacing our attributes, it makes the intention of the software clearer, and also helps prevent name collisions, which could result in unexpected behaviour.

## _ prefix (private)

The convention in Python is to prefix private attributes with an underscore.

It doesn't actually stop a user from accessing or overriding the attribute, like in some other languages, but it's still helpful.

## __ prefix (mangled)

You might consider prefixing an attribute with two underscores to designate it as ultra private, but this does something special in Python - name mangling. It allows you to prevent attributes being overridden by subclasses. Here's an example:

```python
class Person():
    __name = 'Bob'


class Employee(Person):
    __name = 'Security Guard'


>>> employee = Employee()
>>> employee._Person__name
'Bob'
>>> employee._Employee__name
'Security Guard'
```

Python automatically modifies the attribute name to contain the name of the class. This prevents collisions.

It's useful for libraries which provide base classes which are meant to be subclassed by users.

It's very cool, but you don't see it used often, most likely because it's not widely known about.

Note, it only works if the attribute name has at most one trailing underscore, to avoid confusion with magic methods (see below).
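Here's a small sketch showing which names get mangled, depending on how many trailing underscores they have (the `__nickname` attributes are made up for the example):

```python
class Person:
    __nickname = "Bob"    # No trailing underscores - mangled
    __nickname_ = "Bob"   # One trailing underscore - mangled
    __nickname__ = "Bob"  # Two trailing underscores - left alone


attributes = vars(Person)
print("_Person__nickname" in attributes)   # True
print("_Person__nickname_" in attributes)  # True
print("__nickname__" in attributes)        # True
```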

## Magic methods

Magic methods (also called dunder methods) are attributes which have a double underscore prefix and suffix. The one most people know is `__init__` in classes.

It allows Python to implement functionality transparently, without adding additional syntax.

For example, calling an object just calls its `__call__` method. Instantiating an object just calls its `__init__` method.
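To make that concrete, here's a tiny sketch with a made up `Greeter` class. Calling the instance with brackets and calling `__call__` directly do the same thing here:

```python
class Greeter:
    def __init__(self, greeting: str):
        self.greeting = greeting

    def __call__(self, name: str) -> str:
        return f"{self.greeting}, {name}!"


greeter = Greeter("Hello")

# Both of these print the same thing:
print(greeter("Bob"))           # Hello, Bob!
print(greeter.__call__("Bob"))  # Hello, Bob!
```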

It's tempting for library authors to use this dunder syntax for their own variables but it's not recommended. The magic methods are how Python implements some important functionality, and what if they add a new magic method in the future, which clashes with your own variable? It's unlikely, but best to be safe. Consider the dunder namespace to be just for the Python runtime.

## Nested classes

In some libraries you'll see this:

```python
# Representing a database table.
class Table():

    class Meta():
        tablename = 'awesome_table'
```

If we'd done this instead:

```python
class Table():
    tablename = 'awesome_table'
```

It means the user can't subclass `Table`, and define their own tablename variable without breaking the library in some way.

```python
# We can now do this, if we wanted a tablename column on our table:
class MyTable(Table):
    tablename = CharField()

    class Meta():
        tablename = 'awesome_table'
```

But what's actually going on when we define classes inside classes? There's actually nothing weird about this in Python - it's just like declaring any other attribute.

We access them as you'd expect - `MyTable.Meta.tablename` or `MyTable().Meta.tablename`.

One advantage of this approach is that, as it's generally just used in libraries, it's unlikely a user would want to define a `Meta` attribute of their own, which avoids a naming collision.

A disadvantage is the inner class can't access attributes in the outer class (for example, inside a method). In theory you can do some metaclass magic to bind a reference to the outer class in the inner class, but this is seriously advanced / dubious jank. Also, nested classes can seem a bit strange to users at first, who might be confused by it.
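Here's a quick sketch of that limitation. The `prefix` attribute and `full_name` method are hypothetical, and only exist to show that the inner class has no implicit link to the outer one:

```python
class Table:
    prefix = "app_"

    class Meta:
        tablename = "awesome_table"

        @classmethod
        def full_name(cls) -> str:
            # `prefix` lives on Table, not Meta, so this raises AttributeError.
            return cls.prefix + cls.tablename


try:
    Table.Meta.full_name()
except AttributeError as exception:
    print(f"Failed: {exception}")

# The outer class has to be referenced explicitly instead:
print(Table.prefix + Table.Meta.tablename)  # app_awesome_table
```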

### Inheritance

We can use inheritance on the nested class, which is quite interesting:

```python
class Table():
    class Meta():
        foo = 1


class MyTable(Table):

    class Meta(Table.Meta):
        bar = 2

MyTable().Meta.foo
>>> 1
MyTable().Meta.bar
>>> 2
```

## Dictionaries

Alternatively, we can just use a dictionary:

```python
class MyTable(Table):
    tablename = CharField()

    meta = {
        'tablename': 'awesome_table'
    }
```

Which works fine if you just want some simple values. Classes allow you to namespace methods too, so are generally preferable.

## Naming conventions

Finally, we can just prefix our attributes with an identifier, in this case `piccolo_`:

```python
class MyTable(Table):
    piccolo_tablename = 'awesome_table'

```

## Conclusions

We've looked at a few different solutions for namespacing attributes. In an ideal world, our classes are simple enough to not need any of these techniques, but in larger libraries like Piccolo it can lead to more understandable and robust code.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding JWT and Sessions]]></title>
            <link>https://piccolo-orm.com/blog/understanding-jwt-and-sessions/</link>
            <guid>https://piccolo-orm.com/blog/understanding-jwt-and-sessions/</guid>
            <pubDate>Sat, 10 Aug 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
When it comes to authenticating your web app, there are two main choices - session auth and token auth.

## JWT (JSON Web Token)

There are many different kinds of token auth, but JWT is a popular format.

When a user logs in, we provide them with a JSON payload, which can contain any information we want, such as a user_id, permissions, favourite colour of car etc. This payload is signed using a secret key, so when the user presents this token to us (for example, in an API request), we can verify it hasn't been tampered with.
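To give a feel for how the signing works, here's a simplified sketch using only the standard library. It assumes the HS256 algorithm (HMAC with SHA-256), and `SECRET_KEY`, `create_token` and `is_valid` are made up names. It doesn't check expiry or any other claims, so in a real app you'd want to use a well tested library such as PyJWT instead.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # Hypothetical - keep the real one secret.


def b64url(data: bytes) -> bytes:
    # JWTs use URL-safe base64 with the padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def sign(message: bytes) -> bytes:
    return b64url(hmac.new(SECRET_KEY, message, hashlib.sha256).digest())


def create_token(payload: dict) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    message = b".".join(
        b64url(json.dumps(part).encode()) for part in (header, payload)
    )
    return (message + b"." + sign(message)).decode()


def is_valid(token: str) -> bool:
    message, _, signature = token.encode().rpartition(b".")
    # compare_digest avoids leaking information via timing differences.
    return hmac.compare_digest(signature, sign(message))


token = create_token({"user_id": 1})
print(is_valid(token))        # True
print(is_valid(token + "x"))  # False
```

Anyone can read the payload, as it's only base64 encoded, but without the secret key they can't change it and produce a matching signature.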

## JWT Pros

### Stateless

This is often given as an advantage of JWT. In theory, since all the information is embedded within the token itself, there's no need to do database calls, like with session auth, to work out who the user is.

In reality though, most robust JWT systems still require some database calls. For example, we might implement a blacklist for tokens, and often that blacklist will be stored in a database. And even though we get the user ID from the token, we might still want to get extra information about that user, such as whether their account has been deactivated.

### Cross domain

Once you have a token, you can access an API from anywhere. This is great for mobile apps, and single page applications.

### Mobile and Machine-to-Machine friendly

Tokens are slightly easier to use for mobile phone apps, embedded systems, and micro services.

Most networking libraries do support cookies though, even those on native apps. At the end of the day, cookies are still just HTTP headers.

## JWT Cons

### Size

Tokens are larger than session IDs, so they consume more network bandwidth.

### Javascript tampering

We can't use local storage for storing our token, in case malicious Javascript gets on our webpage.

There are many ways this can happen:

 * Cross Site Scripting
 * Javascript from a CDN has been compromised
 * Malicious browser extensions

It's trivial for any malicious Javascript to read a token from local storage, and send it to a third party server, where a hacker now has complete control of your account until the token expires.

Malicious Javascript is never good. The attacker can bind listeners to password fields, login forms, payment forms etc. However, to compromise a login form, the malicious Javascript has to be injected on the login page. With tokens, a hacker can compromise any part of your site (e.g. a blog comments section), and they can get your token from local storage.

So should we use token auth in web apps? We can still store it in a cookie, but that means a web app running on `app.foo.com` can't access `api.foo.com`. To get around this limitation, you can set up a proxy API running on `app.foo.com`, which forwards requests to `api.foo.com`.

### Stale data

Even though you can store lots of data in a JWT, it doesn't mean you should. If you include 'permissions' in your token, those permissions won't expire until the token does. Also, without some kind of blacklist, you can't revoke someone's token until it expires.

## Sessions

Session auth is the stalwart of web authentication. When a user logs in, the web server sets an HTTP-Only cookie containing a session ID. Whenever a request is made to the domain which set the cookie, the cookie is sent in the header.

The web server stores a reference (usually in a database) for the session ID, mapping it to the user it belongs to, an expiry date, and any other arbitrary information you want to store for the duration of the session. Each time a request is made, the session expiry is usually updated in the database, so the user only has to log in again after say 20 minutes of inactivity, rather than just 20 minutes from when they last logged on.
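Here's a rough sketch of that bookkeeping. A dictionary stands in for the database, and the function names and the 20 minute lifetime are just assumptions for the example:

```python
import secrets
import time

SESSION_LIFETIME = 20 * 60  # 20 minutes of inactivity, in seconds.

# Stand-in for a database table: session ID -> session data.
sessions = {}


def create_session(user_id: int) -> str:
    session_id = secrets.token_urlsafe(32)
    sessions[session_id] = {
        "user_id": user_id,
        "expires_at": time.time() + SESSION_LIFETIME,
    }
    return session_id


def get_user_id(session_id: str):
    """Return the user ID for a valid session, otherwise None."""
    session = sessions.get(session_id)
    if session is None:
        return None
    if session["expires_at"] < time.time():
        # Expired - the user has to log in again.
        del sessions[session_id]
        return None
    # Each request pushes the expiry back (a sliding expiry).
    session["expires_at"] = time.time() + SESSION_LIFETIME
    return session["user_id"]


session_id = create_session(user_id=1)
print(get_user_id(session_id))  # 1
```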

## Session Pros

### Well understood

Sessions are well understood. Most of the large web frameworks use them (e.g. Django).

### Immune to Javascript tampering

Web browsers have built in security around HTTP-Only cookies, which makes them secure. If some malicious Javascript was present on your web page it could still cause some serious damage (e.g. making AJAX calls on your behalf), but it can't extract your session token. As soon as the user closes the webpage, the attack will stop.

## Session Cons

### CSRF

This is the big problem with session auth. Since the cookies are sent with every web request to the matching domain, a malicious third party website can make requests on a user's behalf. However, it's a well understood problem and all major web frameworks have some kind of CSRF mitigation in place.

### Same domain only

The same domain constraint can be challenging. You can work around it by proxying requests via the backend server though.

## Conclusions

So which should you use? Tokens are generally preferable on mobile and embedded systems, because they're so easy to work with - call a JSON endpoint, and get a token, then just add that to the header of all future API calls.

With cookies, you have to log in, extract the session cookie from the header, configure a cookie jar, and add the cookie to it. It's still not too bad - but some libraries make this harder than others, whilst adding tokens to headers is universally pretty simple. Most endpoints which are protected by session cookies are also protected by CSRF tokens, which also need to be taken into account.

For web applications, session auth makes a lot more sense, mostly for security reasons. Storing tokens in local storage is risky, and if you put the token inside a cookie, you may as well just use session auth. Browser vendors might work out some way of securely storing tokens in the future, but I don't know what this would look like - especially as cookies effectively solve this problem already.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Cross Site Request Forgery]]></title>
            <link>https://piccolo-orm.com/blog/cross-site-request-forgery/</link>
            <guid>https://piccolo-orm.com/blog/cross-site-request-forgery/</guid>
            <pubDate>Sat, 10 Aug 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Cross Site Request Forgery (CSRF) is a well known security vulnerability which all developers must be aware of.

It's when a website uses cookies for authentication, and a logged in user clicks a malicious link on a third party website which makes a request which modifies the user's data in some way.

As an example, someone is logged into a banking website, `banking.com`. On a third party website `evil.com`, there's a link to `banking.com/account/transfer/`. When clicked, the browser will send along the user's session cookie for `banking.com`, and without any additional security measures the request would succeed.

## How to implement CSRF protection

There are a few simple ways to protect against CSRF. Here are two examples.

### How Django does it

Django sets a CSRF cookie, which contains a random token. Within your HTML template, any forms which change state (i.e. POST/PUT/DELETE) need to add this token as a hidden field.

```html
<form action="/articles/" method="post">
    <!-- Adds a hidden field containing the CSRF token: -->
    {% csrf_token %}

    <!-- Adds the rest of the form fields: -->
    {{ form.as_p }}

    <input type="submit" value="Submit" />
</form>
```

When the form is submitted, the website also sends back the CSRF token contained in the cookie. Django checks that the token in the cookie matches the hidden field value in the form.

This protects against CSRF because a malicious website is unable to read/set cookies on another domain. So a malicious link on `evil.com` wouldn't know what value to set the hidden form field to.
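The check itself boils down to comparing two values. Here's a simplified sketch of the idea - it isn't Django's actual implementation (which does more than this), and the function names are made up:

```python
import hmac
import secrets


def generate_csrf_token() -> str:
    # Set as a cookie, and rendered into the form as a hidden field.
    return secrets.token_urlsafe(32)


def is_csrf_valid(cookie_token, form_token) -> bool:
    if not cookie_token or not form_token:
        return False
    # A constant time comparison, to avoid leaking the token via timing.
    return hmac.compare_digest(cookie_token.encode(), form_token.encode())


token = generate_csrf_token()
print(is_csrf_valid(token, token))    # True
print(is_csrf_valid(token, "guess"))  # False
```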

If you have a Single Page Application, then Javascript is used to add the CSRF token as a header to all AJAX calls instead.

## CSRF protection and mobile apps

CSRF is only a problem with browsers. However, if you have an API which is used by mobile apps as well, you need to work around it.

CSRF protection usually involves more than just checking for the presence of a token - it also looks at the referer header. You could try spoofing all of this in the mobile app, so it sets all the appropriate HTTP headers manually.

It's generally just best to bypass CSRF checks for non-web apps, as it doesn't really serve a purpose. In Django's case, this is done by adding a `csrf_exempt` decorator to your view. This becomes a bit messy though if we need to maintain two separate views - one which enforces CSRF validation, and one that doesn't.

One limitation of Django is we can only apply middleware globally to all views. With ASGI, and frameworks like Starlette, we can apply middleware to a subset of views.

With JWT based authentication, we can add a claim such as ``mobile``, which allows an app to access views without a CSRF token.

Consider this pseudo code:

```python
views = [view1, view2, view3]

app = Router({
    '/mobile': CheckJWTClaimMiddleware(views),
    '/web': CSRFMiddleware(views),
})
```

This allows us to support mobile and web apps.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Should I use Python properties?]]></title>
            <link>https://piccolo-orm.com/blog/should-i-use-python-properties/</link>
            <guid>https://piccolo-orm.com/blog/should-i-use-python-properties/</guid>
            <pubDate>Thu, 08 Aug 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Python properties have been surprisingly divisive amongst developers I've worked with.

In simple use cases, they're great.

```python
class User():
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self):
        return f'{self.first_name} {self.last_name}'


>>> User('Shirley', 'Jones').full_name
Shirley Jones
```

The problem is they can cause confusion.

```python
class User():
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @property
    def get_full_name(self):
        return f'{self.first_name} {self.last_name}'


# Feels weird:
>>> User('Shirley', 'Jones').get_full_name
Shirley Jones
```

Just by changing the method name, it feels unnatural for this to be a property. Calling `my_user.get_full_name` feels like it should have brackets after it, because it sounds like a function. So naming is definitely important when using properties.

Also, properties work great if you're confident you won't need to add any arguments in the future.

Imagine we wanted to modify `get_full_name` so it had an `include_title` argument.

If we had implemented it as a property, we'd break everyone's code, because now it has to be called as a function to work properly:

```python
class User():
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def get_full_name(self, include_title=False):
        fullname = f'{self.first_name} {self.last_name}'
        if include_title:
            fullname = f'Madam {fullname}'

        return fullname


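# We broke our existing code - this now returns a bound method, not a string:
>>> User('Shirley', 'Jones').get_full_name
<bound method User.get_full_name of <__main__.User object at 0x...>>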
```

This might not matter much in small projects, but if you're a library author you don't want to introduce a breaking change just by adding an argument to a property.

In API design, properties can be overused too. If you're designing a fluent interface, you don't want to add a cognitive load to a programmer by making them consider 'is this a property or a method?'.

Take this example:

```python
class Select(Query):

    def where(self, query) -> Select:
        # do stuff
        return self

    @property
    def first(self) -> Select:
        # do stuff
        return self

    def run(self):
        return 'some data'

```

To use this API:

```python
select = Select().where(some_query).first.run()

```

Rather than having to remember that `first` is a property, it's cleaner to have them all as plain methods.

```python
select = Select().where(some_query).first().run()

```

Sure, it takes a couple more key strokes, but sometimes consistency is king.

And lastly, perhaps the main way properties can be abused is if a really heavy piece of computation, or a long network request, is done to generate the response. A developer could unexpectedly cripple their app's performance by calling an innocent looking property too many times.
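As a contrived sketch of the problem (the `Report` class is made up, and the sleep stands in for something genuinely slow), each access to the property repeats the work:

```python
import time


class Report:
    def __init__(self):
        self.calculations = 0

    @property
    def total(self) -> int:
        # Stand-in for an expensive calculation or network request.
        time.sleep(0.01)
        self.calculations += 1
        return 42


report = Report()

# This looks like cheap attribute access, but the work is repeated each time:
totals = [report.total for _ in range(5)]
print(report.calculations)  # 5
```

If `total` was a plain method instead, the brackets would at least hint that some work is being done.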

So in conclusion, properties can be great - but consider if you really need them, and if so keep them simple.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Improving tab completion in Python libraries]]></title>
            <link>https://piccolo-orm.com/blog/improving-tab-completion-in-python-libraries/</link>
            <guid>https://piccolo-orm.com/blog/improving-tab-completion-in-python-libraries/</guid>
            <pubDate>Mon, 22 Jul 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
One of the main design goals for [Piccolo](https://github.com/piccolo-orm/piccolo) is to support tab completion as fully as possible.

Tab completion helps developers write code faster, with fewer errors. This is particularly useful with ORMs, where typos could create some unexpected SQL queries.

There are two tools which I rely on heavily each day, and they are [IPython](https://github.com/ipython/ipython) and [VS Code](https://code.visualstudio.com/). Both of them support tab completion, and use the [Jedi](https://github.com/davidhalter/jedi) library under the hood.

The main use cases I want to support with tab completion are:

 1. Being able to see all available methods on a table, for example: `MyTable.select()`, `MyTable.select().first()`, `MyTable.delete()`, and many more.
 1. Being able to navigate through foreign key relationships, for example: `Band.manager.name`, where `manager` is a foreign key to a Manager table, and `name` is a column.

Jedi is very powerful, but it can't perform miracles. If we write our code intelligently, we can get the best possible tab completion experience for the user.

## Add type hints

Jedi understands all sorts of type hints. As well as the native type hints introduced in [PEP 484](https://www.python.org/dev/peps/pep-0484/), Jedi can also understand type hints within docstrings.

I use native type hints throughout Piccolo. The most important type hint, for the purposes of tab completion, is the return type of functions and methods.

```python
def my_function() -> Select:
    # lots of code
    return Select()
```

The reason this is so useful is that a tool like Jedi can easily infer the return type, without having to work it out from the actual function body. It's really important for methods which are part of a fluent API. It allows tab completion in situations like this:

```python
# We can continue using tab completion even after a method call:
Band.select().where(Band.name == 'Radiohead').first().run_sync()
```

## Mixins can be problematic

Piccolo originally consisted of a bunch of Query subclasses like `Select`, `Insert`, `Delete` etc. Shared functionality like 'where' clauses were implemented via mixins.

```python
# Some early Piccolo pseudo-code
class WhereMixin():

    def where(self, values):
        # do some stuff
        return self


class Select(Query, WhereMixin):
    pass
```

You'll see here that mixins are problematic - since the `WhereMixin` can be used anywhere, the return type of the `where` method could be anything. This is clearly a big problem for tab completion.

The way around this is to not use Mixins, and use composition instead.

```python
# Some early Piccolo pseudo-code
class WhereDelegate():

    def where(self, values):
        # do some stuff
        return


class Select(Query):

    def __init__(self):
        self.where_delegate = WhereDelegate()

    def where(self, values) -> Select:
        self.where_delegate.where(values)
        return self
```

Now we're able to specify a concrete return type.

## Decorators can be deceiving

If decorators aren't implemented correctly, they can mask the signature of the function being decorated.

Take this example:

```python
def my_decorator(func):
    def wrapper():
        print('I am wrapped')
        func()
    return wrapper


@my_decorator
def hello_world() -> str:
    return 'hello world'


hello_world()
>>> I am wrapped

hello_world.__name__
>>> 'wrapper'
hello_world.__annotations__
>>> {}
```

In the example above, the annotations and original function name have been lost. It's effectively giving false information to any introspection tools, like Jedi. You can fix this though:

```python
from functools import wraps


def my_decorator(func):
    @wraps(func)
    def wrapper():
        print('I am wrapped')
        func()
    return wrapper


@my_decorator
def hello_world() -> str:
    return 'hello world'


hello_world.__annotations__
>>> {'return': str}
hello_world.__name__
>>> 'hello_world'
```

If tab completion is a high priority, keep decorators simple and make sure you use `wraps`. The `wraps` function copies some important attributes from the wrapped function to the wrapper (including `__name__`, `__annotations__`, and `__doc__`).

Making decorators accurately reflect the wrapped function is a surprisingly deep subject. This is a great [article](http://blog.dscpl.com.au/2014/01/how-you-implemented-your-python.html) on the subject, which is part of an entire series of [articles](https://github.com/GrahamDumpleton/wrapt/tree/develop/blog).

## Some setattr magic

This is something particular to Piccolo, and not every project will require it.

When you enter say `Band.manager` (where `manager` is a foreign key), it would be nice to be able to keep on using tab completion to see the columns on the `Manager` table. And likewise, if the `Manager` table contains any foreign keys, to be able to follow them using tab completion as well. With other ORMs, you would express this using a string. For example, in Django it would be a string like `'manager__name'`. This is fine, but when you have large, complex models, it's nice to have tab completion.

The way Piccolo achieves this is when you call `Band.manager`, the constructor creates an attribute on the object for each column in the table the foreign key points to. So for the `name` column, a `name` attribute is created on the object - allowing you to do `Band.manager.name`.
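Here's a heavily simplified sketch of the idea. These classes are hypothetical stand-ins rather than Piccolo's real implementation - the `references` argument, the `columns` list and the `_column_name` attribute are all assumptions made for the example:

```python
class Column:
    def __init__(self, column_name: str):
        self._column_name = column_name


class ForeignKey(Column):
    def __init__(self, column_name: str, references: type):
        super().__init__(column_name)
        # Create an attribute for each column on the referenced table, so
        # something like `Band.manager.name` works.
        for column in references.columns:
            setattr(self, column._column_name, column)


class Manager:
    # Hypothetical - a real ORM would work out the columns automatically.
    columns = [Column("name"), Column("age")]


class Band:
    manager = ForeignKey("manager", references=Manager)


print(Band.manager.name._column_name)  # name
```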

## Conclusions

Tab completion is a powerful tool for developers, and with a bit of thought we can create libraries which leverage it to its fullest.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[ORM design challenges]]></title>
            <link>https://piccolo-orm.com/blog/orm-design-challenges/</link>
            <guid>https://piccolo-orm.com/blog/orm-design-challenges/</guid>
            <pubDate>Wed, 23 Jan 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Building ORMs isn't the easiest thing in the world. Based on past experience working on similar projects, I knew it wasn't impossible though.

Fundamentally an ORM is just a mechanism for converting Python objects into SQL strings, sending them to a database adapter, and converting the response back into Python objects. However, there are some subtleties which are challenging.

## Designing a nice API

This is perhaps the most important consideration when designing an ORM. Making something as user friendly and powerful as possible.

## SQL injection prevention

When generating SQL strings, the ORM needs to be careful not to include raw user input within the string - instead it should be parameterised.

```sql
-- This is OK:
SELECT * FROM users WHERE username = $1

-- If username = "1; DROP TABLE users", and the query wasn't parameterised:
SELECT * FROM users WHERE username = 1; DROP TABLE users
```

This sounds simple enough, but is quite challenging.
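As an illustration of what parameterisation looks like from Python, here's a small sketch using the built in `sqlite3` module as a stand-in for a real Postgres adapter (note it uses `?` placeholders rather than `$1`). The table and values are made up:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (username TEXT)")
connection.execute("INSERT INTO users VALUES ('bob')")

username = "bob'; DROP TABLE users; --"

# The value is passed separately from the SQL string, so the database
# driver treats it purely as data:
rows = connection.execute(
    "SELECT * FROM users WHERE username = ?", (username,)
).fetchall()
print(rows)  # []

# The table is still intact:
count = connection.execute("SELECT COUNT(*) FROM users").fetchone()
print(count)  # (1,)
```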

## Joins

There are two options for joins - either let the user specify joins explicitly, or do it for them automatically.

In Piccolo, joins are done automatically.

```python
Band.select(Band.manager_1.name).run_sync()
```

In order to get the name of `manager_1`, a join is required. There are other situations which require joins. For example:

```python
Band.select().where(Band.manager_1.name == 'Guido').run_sync()
```

Piccolo has to manage the joins under the hood to make this happen.

## Large selects

Queries such as this:

```python
Band.select().run_sync()
```

Which fetch all rows from a table, could return thousands or millions of rows. The ORM needs to handle this under the hood using cursors - fetching data in chunks.
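To show the general idea of fetching in chunks, here's a sketch using the built in `sqlite3` module. It's only a stand-in - it isn't how Piccolo does it, and `fetch_in_chunks` and the chunk size are made up for the example:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE band (name TEXT)")
connection.executemany(
    "INSERT INTO band VALUES (?)",
    [(f"band_{i}",) for i in range(1000)],
)


def fetch_in_chunks(cursor: sqlite3.Cursor, chunk_size: int = 100):
    # Yield lists of rows, so only one chunk is held in memory at a time.
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


cursor = connection.execute("SELECT name FROM band")
chunk_sizes = [len(chunk) for chunk in fetch_in_chunks(cursor)]
print(chunk_sizes)  # Ten chunks of 100 rows
```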

## Documentation and testing

Believe it or not, documentation is also a big challenge. ORMs are fairly large projects, with a broad API. Documenting all of the features and subtleties in an easy to understand way is time consuming. The same is true for tests - which need to be extensive.

## Avoiding complexity explosion

Keeping the codebase maintainable is a challenge. Many existing ORMs are almost completely impenetrable for newcomers who want to deep dive into the code base.

## Conclusions

None of these challenges are insurmountable, but I thought it would be an interesting read for others, to help explain some of the challenges of ORM design.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Plugins for Python Projects]]></title>
            <link>https://piccolo-orm.com/blog/plugins-for-python-projects/</link>
            <guid>https://piccolo-orm.com/blog/plugins-for-python-projects/</guid>
            <pubDate>Tue, 22 Jan 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
While writing Piccolo I started thinking about how to make it extensible.

There are various approaches to the problem. I'll outline the pros and cons below, using existing projects as examples.

## Django

[Django](https://www.djangoproject.com/) is batteries included. The core Django package contains a bunch of extensions already - just take a look in the [contrib folder](https://github.com/django/django/tree/066f26fe8b98609726f7962c21de7233afb4ff7e/django/contrib). The advantage of this is you know that all of the extensions bundled with the app will work correctly (when extensions are distributed separately you can potentially get incompatible versions).

Django also allows you to extend it in various ways by installing third party extensions. Enabling these extensions happens in a central settings.py file, which configures everything about your Django project.
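As a sketch of what this looks like, here's a fragment of a settings.py file. `INSTALLED_APPS` is the real Django setting, but the particular selection of apps is just an example, and `blog` is a hypothetical app name.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    # Extensions bundled with Django itself:
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    # A third party extension, installed separately with pip:
    "rest_framework",
    # Your own app (hypothetical name):
    "blog",
]
```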

## Flask

[Flask](http://flask.pocoo.org/) just provides the core scaffolding for a web app - the routing and views layer.

A Flask project consists of a main app object. You can register extensions with this app object.

## Pytest

[Pytest](https://docs.pytest.org/en/latest/) is a testing framework, and is really interesting from an extensions perspective. It leverages a feature of [setuptools](https://setuptools.readthedocs.io/en/latest/) which I wasn't familiar with until really digging into the subject.

Setuptools is a library used for creating Python packages. The metadata for a Python package is contained in a setup.py file at the root of the package. In the setup.py file you can specify entrypoints.

Here's a [good example from the Pytest documentation](https://docs.pytest.org/en/latest/writing_plugins.html#making-your-plugin-installable-by-others).

```python
# sample ./setup.py file
from setuptools import setup

setup(
    name="myproject",
    packages=["myproject"],
    # the following makes a plugin available to pytest
    entry_points={"pytest11": ["name_of_plugin = myproject.pluginmodule"]},
    # custom PyPI classifier for pytest plugins
    classifiers=["Framework :: Pytest"],
)
```

Pytest then uses a little known module bundled within setuptools called pkg_resources. It allows a Python project to discover other packages which were installed in the same environment, and which use a certain entrypoint identifier.

The advantage of this is you no longer need a configuration file (i.e. settings.py in Django), or have to manually register extensions with a central app object (like in Flask) - the package discovers plugins automatically.

The disadvantage is all extensions within the environment will automatically be used. This is fine when a user is disciplined - always using virtualenvs, and only installing what they need. But for users who install everything in the global environment, things soon get messy.
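The discovery step described above can be sketched as follows. Note that this uses `importlib.metadata` from the standard library (Python 3.8+) rather than pkg_resources, and `discover_plugins` is a made-up helper name, not something Pytest provides.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Return {plugin name: module path} for every installed package
    which registered an entry point under the given group."""
    found = entry_points()
    if hasattr(found, "select"):
        # Python 3.10 and later
        selected = found.select(group=group)
    else:
        # Python 3.8 and 3.9 return a dict keyed by group name
        selected = found.get(group, [])
    return {entry_point.name: entry_point.value for entry_point in selected}


# "pytest11" is the group used in the setup.py example above.
plugins = discover_plugins("pytest11")
```

What ends up in `plugins` depends entirely on what's installed in the current environment, which is exactly the trade-off described above.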

## Non-Python example - Vue JS

One of my favourite projects in any language is [Vue JS](https://vuejs.org/), a UI framework for the web.

The main reason for this is its approach to extensibility. The main Vue JS package contains the core functionality. It has a well defined plugin system, so people can extend it as they want. But they offer official packages for the most common requirements - Vuex (data management) and Vue Router (routing).

This means that people who are new to the project aren't trawling through hundreds of extensions of mixed quality trying to find something that works - a problem which is particularly acute in Javascript, with its overwhelming number of packages. The official extensions provide sensible defaults which function well. Users can then swap in their own alternatives as they discover more about their project's particular requirements.

The cherry on top is another official project called [Vue CLI](https://cli.vuejs.org/). It's a tool which guides a user through setting up a project, presenting the various extensions which can be installed, and scaffolds a project in which they work together.

## What's ideal?

I think if Django started again from scratch, it would have been broken down into multiple projects. Many of the Django components are heavily interdependent, which makes it hard to swap out say the ORM for an alternative.

Having smaller components also makes them feel more manageable from a maintenance perspective. They are also more modular, meaning they can be reused by other projects.

For Piccolo, the plan is:

 * Move functionality into extensions where appropriate
 * Provide a low friction way of adding extensions (like in Pytest - just pip install)
 * Provide a CLI tool which orchestrates the various extensions (like Vue)
 * Configure extensions using an optional configuration file

## What sorts of extension are envisaged for Piccolo?

The scope for extensions is large.

 * A web based admin interface
 * Migrations
 * Auth
 * ASGI middleware
 * Auto REST APIs

And hopefully many more.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why Python type annotations are awesome]]></title>
            <link>https://piccolo-orm.com/blog/why-python-type-annotations-are-awesome/</link>
            <guid>https://piccolo-orm.com/blog/why-python-type-annotations-are-awesome/</guid>
            <pubDate>Tue, 15 Jan 2019 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
For me, the two stand out features of Python 3 are [asyncio](https://docs.python.org/3/library/asyncio.html) and [type annotations](https://docs.python.org/3/library/typing.html).

I think both were essential for keeping Python competitive as a language for building backend systems.

In this article I'll talk about why I think type annotations in particular are awesome, and why Piccolo uses them so heavily.

## Documentation

The first reason to use type annotations is to document your code. Taking this very simple example:

```python
def email_user(user):
    # some code
```

`user` could be any number of different things - an integer, a User object, a string ... the list goes on. It adds cognitive load for someone new to a project, who has to work out what's going on.

Before type annotations, this was solved using specially formatted docstrings.

```python
def email_user(user):
    """
    :type user: User
    """
    # some code
```

This is better, but having the type annotation in a string is limiting. Using the Python 3 approach:

```python
def email_user(user: User):
    # some code
```

The advantage here is that with code editors like Visual Studio Code you can now `Command + click` on the User annotation, and it'll take you to the definition of User in your project, which is a great usability improvement over doing a manual search.

Declaring your type annotations here makes them available in an `__annotations__` attribute, which you can access using `typing.get_type_hints`.

```python
from typing import get_type_hints

def email_user(user: User):
    # some code

get_type_hints(email_user)
>>> {'user': User}
```

This makes the annotations easier to access than parsing a docstring, and allows for some interesting applications.
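One such application is checking argument types at runtime. Here's a minimal sketch - `check_types` is a made-up helper (it isn't part of the standard library or Piccolo), and it only understands plain classes like `str`, not things like `Optional[int]`.

```python
import functools
import inspect
from typing import get_type_hints


def check_types(func):
    """Raise a TypeError if an argument doesn't match its annotation."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Skip anything which isn't a plain class (e.g. Optional[int]).
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def say_hello(name: str) -> str:
    return f"hello {name}"
```

Calling `say_hello('bob')` works as normal, whereas `say_hello(1)` raises a `TypeError` before the function body runs.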

## Mypy

[Mypy](http://mypy-lang.org/) uses the type annotations to analyse your code for errors.

```python
def say_hello(name: str):
    print(name)


say_hello(1)  # Error!
```

Visual Studio Code supports it out of the box. Combined with a linter like Flake8, your editing experience is super charged - catching most coding errors you're likely to encounter.

Having type checks provides you with an extra level of confidence that your code is working as expected. This is especially useful when refactoring large projects.

## Progressive enhancement

One criticism you sometimes hear is: why not just use a statically typed language?

What's nice about Mypy (and also its companion in the Javascript world - Typescript), is you can add type annotations incrementally. Creating a quick and dirty prototype? Leave the annotations out for now.

A library can use type annotations (like Piccolo), and the user doesn't need to care - they can use Python as they always have. But the library author has that extra level of confidence that their code works as expected.

## Advanced examples

To finish off, here are some examples of the interesting things you can do with type annotations in Python.

```python
# This import (available from Python 3.7) postpones the evaluation of
# annotations, which is what makes the forward references below work.
from __future__ import annotations

import typing as t  # Importing it as an alias makes it less verbose

# If you want to use a type defined in another file, and
# are only importing it for use as a type annotation, you
# can do this:
if t.TYPE_CHECKING:
    from animals import Budgie


class Dog:
    name = 'Rover'

    # Forward references are allowed i.e. the return type can be
    # the current class being defined.
    def return_friend(self) -> Dog:
        return Dog()


# You can assign type annotations to variables
# (Cat and Hamster are assumed to be defined like Dog):
Pet = t.Union[Dog, Cat, Hamster]


# pet can be a Dog, Cat, or Hamster
def say_name(pet: Pet):
    print(pet.name)


# license_number can be None or an int
def create_driver(name: str, license_number: t.Optional[int] = None):
    print(f'Creating {name} with license {license_number}')


# type annotations can also be used on variables
budgies: t.List[Budgie] = []

```

As you can see, the typing module is already very powerful - give it a go!
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Introduction to ASGI]]></title>
            <link>https://piccolo-orm.com/blog/introduction-to-asgi/</link>
            <guid>https://piccolo-orm.com/blog/introduction-to-asgi/</guid>
            <pubDate>Fri, 07 Dec 2018 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
***NOTE: UPDATED FOR ASGI 3.0***

In order to make full use of Piccolo in a web application, you'll need to use it with an async routing framework.

There are a few options, including [Sanic](https://github.com/huge-success/sanic), [Quart](https://gitlab.com/pgjones/quart), and [Starlette](https://github.com/encode/starlette).

As the number of frameworks grows, there's an increasing need for some level of standardisation, allowing different components to work together in clearly defined ways.

With synchronous frameworks, a community standard called [WSGI](https://www.python.org/dev/peps/pep-3333/) specifies how an application talks to a web server. This meant you could combine any WSGI web framework, with any WSGI web server, using any WSGI middleware.
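For reference, here's roughly what a minimal WSGI application looks like. The `environ` / `start_response` signature comes from the WSGI spec, but the fake `start_response` and the pared-down `environ` dict are just stand-ins for what a real server would provide.

```python
def wsgi_app(environ, start_response):
    # environ is a dict describing the request (path, headers etc).
    start_response("200 OK", [("Content-Type", "text/plain")])
    # The returned iterable of bytes is the body of the one response.
    return [b"hello world"]


# Playing the role of the web server for a single request:
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


body = wsgi_app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, start_response)
```

The app is called once per request, and what it returns is the response.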

WSGI doesn't work for async frameworks though, because it ties a single request to a single response. For async applications, such as web sockets, a single request can result in multiple responses over time.

To solve this problem, [ASGI](https://asgi.readthedocs.io/en/latest/) (Asynchronous Server Gateway Interface) was proposed.

## ASGI App

An ASGI application is just a callable, which accepts three arguments - `scope`, `receive`, and `send`.

```python
class ASGIApp():

    async def __call__(self, scope, receive, send):
        message = await receive()
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": []
        })
        await send({
            'type': 'http.response.body',
            'body': bytes('hello world', 'utf-8')
        })

app = ASGIApp()
```

The scope argument tells the ASGI app about the connection. For an HTTP connection, this will include things like headers, the path, query parameters etc.

The receive and send arguments are how the ASGI app receives/sends data.

Declaring your ASGI app as a class allows you to configure it using a constructor. If you don't need to configure your app, you can declare it as a function instead.

```python
async def app(scope, receive, send):
    message = await receive()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": []
    })
    await send({
        'type': 'http.response.body',
        'body': bytes('hello world', 'utf-8')
    })
```
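To see the messages flowing, you can call an ASGI app directly without a server. The sketch below plays the role of the server for one request. `call_app` is a made-up helper, and the scope is heavily pared down compared to what a real server would send.

```python
import asyncio


async def app(scope, receive, send):
    message = await receive()
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"hello world"})


async def call_app(asgi_app, path="/"):
    """Play the role of the server for a single HTTP request."""
    sent = []

    async def receive():
        # An empty request body, delivered in one message.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await asgi_app(scope, receive, send)
    return sent


messages = asyncio.run(call_app(app))
```

`messages` ends up containing the two dictionaries the app passed to `send`, in order. This is also a handy way of unit testing an ASGI app.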

## ASGI Middleware

Middleware modifies the scope passed to ASGI apps, or can do things like return a 403 error if no auth token is provided.

```python
class ASGIMiddleware():
    def __init__(self, asgi_app):
        self.asgi_app = asgi_app

    async def __call__(self, scope, receive, send):
        # We have to copy the scope before modifying it to prevent changes
        # from leaking upstream:
        new_scope = dict(scope)
        new_scope['some_param'] = True
        await self.asgi_app(new_scope, receive, send)

# Wrap an instance of the app, not the class itself:
app = ASGIMiddleware(ASGIApp())

```
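The 403 case mentioned above could look something like this. It's only a sketch - `RequireAuthMiddleware` is a made-up name, and it just checks that an authorization header is present, without validating the token. `get_status` is a small helper which stands in for the server.

```python
import asyncio


class RequireAuthMiddleware:
    def __init__(self, asgi_app):
        self.asgi_app = asgi_app

    async def __call__(self, scope, receive, send):
        # ASGI headers are a list of (name, value) byte string pairs,
        # with lower-cased names.
        headers = dict(scope.get("headers", []))
        if scope["type"] == "http" and b"authorization" not in headers:
            await send({
                "type": "http.response.start",
                "status": 403,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"Forbidden"})
            return  # The wrapped app is never called.
        await self.asgi_app(scope, receive, send)


async def hello_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"hello world"})


async def get_status(asgi_app, headers):
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await asgi_app({"type": "http", "headers": headers}, receive, send)
    return sent[0]["status"]


app = RequireAuthMiddleware(hello_app)
anonymous_status = asyncio.run(get_status(app, []))
authorised_status = asyncio.run(
    get_status(app, [(b"authorization", b"Bearer abc")])
)
```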

## ASGI all the way down

What's interesting about an ASGI application is every component of that app is also ASGI. Routing is ASGI, middleware is ASGI, views are ASGI. Want to embed another ASGI app, built with a totally different framework, within your ASGI app? No problem.

With WSGI, frameworks often didn't achieve this level of modularity / composability. For example, Django views and middleware aren't WSGI - only the top level app is.
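To make the composability point concrete, here's a sketch of a router which is itself an ASGI app, and which hands each request to another ASGI app. `PrefixRouter`, `make_app` and `get_body` are made-up names for this example - real frameworks like Starlette have far more capable routing.

```python
import asyncio


def make_app(text):
    async def asgi_app(scope, receive, send):
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": text.encode("utf-8")})
    return asgi_app


class PrefixRouter:
    """An ASGI app which hands each request to another ASGI app."""

    def __init__(self, routes, default):
        self.routes = routes  # {path prefix: ASGI app}
        self.default = default

    async def __call__(self, scope, receive, send):
        for prefix, asgi_app in self.routes.items():
            if scope["path"].startswith(prefix):
                await asgi_app(scope, receive, send)
                return
        await self.default(scope, receive, send)


app = PrefixRouter(
    routes={"/blog": make_app("blog"), "/shop": make_app("shop")},
    default=make_app("home"),
)


async def get_body(asgi_app, path):
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await asgi_app({"type": "http", "path": path}, receive, send)
    return sent[1]["body"]


blog_body = asyncio.run(get_body(app, "/blog/posts"))
home_body = asyncio.run(get_body(app, "/"))
```

The apps being routed to could come from entirely different frameworks - the router only cares that they accept `scope`, `receive` and `send`.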

## ASGI servers

There are already three great ASGI servers - [Uvicorn](https://github.com/encode/uvicorn), [Hypercorn](https://gitlab.com/pgjones/hypercorn), and [Daphne](https://github.com/django/daphne).

Any of them will do fine. In my own testing, I got marginally better performance out of Hypercorn, though this could change over time.

Hypercorn makes a great development server, because it can automatically reload the server when it detects changes to your application (in the same way the Django dev server does).

```bash
hypercorn --uvloop --reload --bind localhost:8000 views:app
```

## ASGI frameworks

Quart and Starlette already support ASGI.

Sanic can now run under an ASGI server.

Django Channels is another ASGI framework, which brings asynchronous capabilities (web sockets, HTTP2) to Django. The author of Django Channels, Andrew Godwin, was also the author of the ASGI spec.

### Which one should I use?

Quart seeks to be compatible with Flask, a popular WSGI framework. If this is important to you, then it's a sensible choice. The API will be familiar, meaning you don't have to relearn concepts, and many Flask extensions will also still work.

Django Channels is perfect if you want to add some async to a Django project.

Starlette is my current favourite for new projects which don't require Django or Flask interoperability. It also feels the most like a pure ASGI framework. Every component is ASGI, so it delivers on the promise of composability and modularity that I find so appealing. It can be used as a framework in its own right, or you can use it as a source of building blocks, and build your own framework on top of it ([Responder](https://github.com/kennethreitz/responder) is one example).

## Conclusions

ASGI is an important pillar in the world of async Python. I'll show some examples in the future incorporating Piccolo with an ASGI framework.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[asyncio vs gevent]]></title>
            <link>https://piccolo-orm.com/blog/asyncio-vs-gevent/</link>
            <guid>https://piccolo-orm.com/blog/asyncio-vs-gevent/</guid>
            <pubDate>Thu, 29 Nov 2018 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
An alternative to asyncio is [gevent](http://www.gevent.org/) (and a similar library called [eventlet](http://eventlet.net/)).

Gevent also uses an event loop, but it's hidden from the user. Your code is run in greenlets, which are similar to threads but are scheduled by Python and not the operating system. The Python socket library is patched, so whenever your program is blocked on a network request it'll automatically switch to another greenlet, and run that instead.

The main benefit of gevent is you can make a traditional synchronous program work asynchronously with little effort. For example, a [Django](https://djangoproject.com) app can be run using [Gunicorn](http://docs.gunicorn.org/en/stable/), a popular WSGI server, which [supports gevent out of the box](http://docs.gunicorn.org/en/stable/settings.html). If your application was previously IO bound, you can expect to see increased throughput.

However, some care is required. Some libraries don't play nicely with the patched socket library, so testing is advised before pushing to production.

It also depends on your preference for implicit vs explicit code. With asyncio, you'll have a bunch of async / await statements, but it makes it clearer when context switches are happening, which does align with the [Zen of Python](https://www.python.org/dev/peps/pep-0020/) - 'Explicit is better than implicit'.
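Here's a small sketch of what 'explicit' means in practice with asyncio. The `worker` coroutine is made up for the example, and `asyncio.sleep(0)` is just standing in for some real IO.

```python
import asyncio


async def worker(name, log):
    for step in range(2):
        log.append(f"{name} step {step}")
        # Control can only pass to another coroutine at an await.
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


log = asyncio.run(main())
```

The two workers take turns, and the only places where a switch can happen are marked with `await`. With gevent the equivalent switch points are hidden inside the patched socket library.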

A lot of languages have either adopted async/await (C#, Javascript, Kotlin), or are about to (Swift). By comparison, very few languages support the implicit concurrency model of gevent. This doesn't mean gevent is wrong, but async/await is certainly part of the modern zeitgeist of language design.

In conclusion, both asyncio and gevent are great options for IO bound applications. Part of the strength of Python is it has multiple solutions to various problems. Which to use depends on your personal preferences, and whether you're starting on a brand new project or not.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Is async worthwhile?]]></title>
            <link>https://piccolo-orm.com/blog/is-async-worthwhile/</link>
            <guid>https://piccolo-orm.com/blog/is-async-worthwhile/</guid>
            <pubDate>Sat, 03 Nov 2018 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
One of the main motivations for building Piccolo was the lack of options for an asyncio ORM.

But is async worthwhile? This is what this article will explore.

## What is asyncio?

asyncio is a library added in Python 3.4, to provide an event loop implementation in the standard library.

Prior to this, each framework that implemented non-blocking IO via an event loop (Twisted and Tornado being by far the most well known) had its own event loop implementation, limiting interoperability.

## Does my website need asyncio?

Most small websites can deliver a perfectly acceptable user experience without using async.

However, for building websites or APIs at scale, there are real benefits to using non-blocking IO.

Asyncio will help improve the throughput of a Python application. This means that a given server can handle more traffic, which can result in real cost savings.

## Is asyncio all about speed?

Non-blocking IO won't make your website faster when under small load. For example, when only dealing with one sequential request at a time.

However, it does improve throughput, so under high load a user's request will be queued for less time, and they'll receive a response faster.
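A rough way of seeing the throughput effect is to overlap several waits. In this sketch `fake_query` is a stand-in for a database call (it just sleeps), and the exact timings will vary from machine to machine.

```python
import asyncio
import time


async def fake_query():
    # Stand-in for waiting on a database or another service.
    await asyncio.sleep(0.05)
    return "row"


async def sequential(count):
    # Each query has to finish before the next one starts.
    return [await fake_query() for _ in range(count)]


async def concurrent(count):
    # All of the queries wait at the same time.
    return await asyncio.gather(*(fake_query() for _ in range(count)))


def timed(coroutine):
    start = time.perf_counter()
    result = asyncio.run(coroutine)
    return result, time.perf_counter() - start


sequential_rows, sequential_time = timed(sequential(10))
concurrent_rows, concurrent_time = timed(concurrent(10))
```

Both versions return the same rows, but the concurrent version should finish in a fraction of the time, because the waiting overlaps. No individual query got any faster.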

An interesting side effect of asyncio is it got library authors thinking about performance. By using efficient HTTP parsing, and Cython-ising slow parts, many asyncio libraries are actually faster than synchronous alternatives, but this isn't due to asyncio itself.

## How much time does Python spend waiting on a database?

Even a simple database operation takes in the order of milliseconds (10^-3) to execute. Even though databases are highly optimised, they still have to parse and execute the SQL, which involves interacting with the disk.

In addition, there is also network lag when talking to a remote database, and overheads such as authentication and encryption.

Python isn't a fast language, but basic Python operations take in the order of microseconds (10^-6).

So there is time for Python to do meaningful work when waiting for a database response. The question becomes how much?

This is dependent on the overhead that asyncio imposes. If the asyncio event loop, and associated Python code required to schedule coroutines, is slow then it'll defeat the purpose.

Libraries such as uvloop are important in this regard, since they offer a faster event loop implementation, which is still compatible with asyncio.

## Conclusions

The performance benefits of asyncio are real. The asyncio ecosystem is maturing fast, and for websites where performance is critical, asyncio makes a lot of sense.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why is an event loop useful?]]></title>
            <link>https://piccolo-orm.com/blog/why-is-an-event-loop-useful/</link>
            <guid>https://piccolo-orm.com/blog/why-is-an-event-loop-useful/</guid>
            <pubDate>Wed, 10 Oct 2018 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
Traditionally, each unit of work which needs to operate concurrently would be assigned to a separate process or thread. Threads and processes are operating system constructs, and are expensive to create. It's up to the operating system when it schedules them to run, not the program. If a program requires thousands of threads, the constant switching between them can result in poor system performance.

An alternative is to use an event loop, which operates in a single thread. Each task which needs to operate concurrently is registered with the event loop. When one task blocks, it yields control back to the event loop, which will resume another task.

One of the better known programs using an event loop is Nginx, which was originally a proxy, but is now a general purpose web server. By using an event loop it was able to provide breakthrough levels of performance when it first appeared on the scene - being able to serve thousands of web requests concurrently. This contrasted with traditional server architectures at the time, as typified by the Apache web server, which created a thread or process per connection.

In order for an event loop to work, you need to be able to suspend tasks while they're blocked on IO. In Python, this is possible due to generators. Generators have existed in Python for a long time, and conveniently are functions which can be suspended.

```python
def counter():
    i = 0
    while True:
        yield i
        i += 1

_counter = counter()
_counter.__next__()
>>> 0
_counter.__next__()
>>> 1
_counter.__next__()
>>> 2

```

In early versions of asyncio, generators were used directly. Now the async and await keywords are used instead, but the underlying mechanisms are the same.
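To show how generators can be turned into a scheduler, here's a toy round-robin 'event loop'. It's a sketch of the general idea only - asyncio itself is far more involved (it waits on real IO, for a start), and `task` and `run_all` are made-up names.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # Hand control back to the loop.


def run_all(tasks):
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)  # Resume the task until its next yield.
        except StopIteration:
            continue  # The task has finished, so drop it.
        queue.append(current)  # Otherwise, put it to the back of the queue.


log = []
run_all([task("a", 2, log), task("b", 1, log)])
```

Each task runs until it yields, at which point the loop resumes the next one - all within a single thread.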

As well as performance advantages, an event loop also provides some nice abstractions which make life easier for developers. In the case of asyncio, you don't have to worry about sockets - they're abstracted away. Likewise, you don't have to worry about how a task gets scheduled - the event loop takes care of that too.

One of my favourite features that asyncio provides is the gather function:

```python
import asyncio

async def hello(name):
    # This would usually involve some IO - to a db or something.
    print(f'hello {name}')

async def hello_everyone():
    await asyncio.gather(
        hello('bob'),
        hello('sally'),
        hello('fred')
    )
    print("welcome!")

asyncio.run(hello_everyone())
>>> hello bob
>>> hello sally
>>> hello fred
>>> welcome!

```

asyncio.gather makes it very easy to wait until a bunch of tasks have all finished. It's an example of the sorts of nice features which can be built on top of the event loop abstraction.

And last but not least, event loops make a lot of sense in Python due to the Global Interpreter Lock (GIL), which limits the effectiveness of multi-threaded programs. This makes event loops, which provide concurrency using a single thread, more attractive.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Should I use Python instead of Golang or Node?]]></title>
            <link>https://piccolo-orm.com/blog/should-i-use-python-instead-of-golang-or-node/</link>
            <guid>https://piccolo-orm.com/blog/should-i-use-python-instead-of-golang-or-node/</guid>
            <pubDate>Fri, 05 Oct 2018 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
In recent years the go-to solution for high performance web services has been Node JS or Golang.

Both of them have strong concurrency support. In the case of Node, it used to be callbacks, and is now promises with async / await, on top of a single threaded event loop. With Golang it's goroutines, which are automatically scheduled onto threads by the Golang runtime.

With asyncio, Python now has a compelling concurrency story in the standard library, which makes it worthy of consideration for new web projects.

The comparisons are based on my own experiences.

## Node pros and cons

### + Quick prototyping

With Node, you can build a web service very quickly using Express.

### + Performance

The performance of Node JS is remarkable, due to the effort that has gone into the V8 Javascript engine. It uses JIT compilation, which means that any 'hot paths' are compiled to machine code. Hot paths are chunks of code which are run repeatedly with the same arguments, meaning the engine can infer the correct types and compile them.

In my own testing, I can get about 3000 requests per second out of a simple Express service, which is staggering for an interpreted language.

### - Typing support

When building large Javascript projects, it's getting increasingly common to use a language like Typescript, which provides typing support and transpiles to Javascript.

Typing support helps make code more understandable and testable.

With Python, there's now standard library support via the typing module. This makes it really easy to implement.

With Typescript, constantly having to transpile to Javascript soon gets tedious (unless you run it through ts-node). Everything requires a build step. Have unit tests which include Typescript code? You need to transpile them first before running.

### - Testing

There's no standard testing framework built into Node JS, meaning you have to rely on third party alternatives. Libraries such as Jasmine and Karma are good, but it still feels slightly inferior and harder work than in Python.

### - Library overload

The greatest strength and weakness of Javascript is the abundance of libraries. The options can be bewildering when you're getting started.

## Golang pros and cons

### + Performance

Golang is a compiled language, so as expected it provides greater performance than either Node or Python.

In my own testing, I can get about 4000 requests per second out of a simple Golang web service, built using Gin.

### + Typing support

Golang has built-in support for type annotations.

### - Limited language

When coming from a more feature rich language, Golang can feel both liberating and incredibly constraining.

Golang compiles very quickly. It feels like compromises were made to enable this, namely limiting the number of features in the language.

In Golang, there are no classes - only structs. It would be nice to have the choice, as in Swift, where the programmer can use either. Classes make more sense for some use cases, and structs for others.

The limitations of the language also make it trickier for library authors. In Python and Swift you can overload operators and such, creating syntax which feels more natural.

One common complaint about Golang is it has no support for generics. This is likely to change soon though.

### - No nested folders

When writing a package, all of the files and tests are in a single folder, without subfolders.

Some people might like this, but I find it limiting and quite annoying.

### - Package management

Package management is an ongoing saga.

## Python Asyncio Pros and Cons

### - Performance

Relative to Node and Golang, you will get worse performance with Python.

However, the differences aren't as large when using asyncio. Uvicorn + Starlette, or Sanic, can get you to over 1000 requests per second. With synchronous Python frameworks like Flask and Django, you'll get around 300 requests per second.

### + Error tracking

One place where Python absolutely shines is in error handling. In Golang, errors are just strings. With Python, you can throw exceptions, and get a nice stack trace. It makes debugging Python web services very simple, especially when used in conjunction with a logging service like Sentry. It's easy to take this stuff for granted, but it's a huge win.
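As a small illustration, here's a sketch of capturing a stack trace when something goes wrong. `load_user` and `handle_request` are made-up functions, and in a real service the report would be sent to your logging service rather than returned.

```python
import traceback


def load_user(user_id):
    raise LookupError(f"user {user_id} not found")


def handle_request(user_id):
    try:
        return load_user(user_id)
    except LookupError:
        # The formatted traceback includes the exception type, the message,
        # and every function call which led to the error.
        return traceback.format_exc()


report = handle_request(42)
```

The report shows not just what went wrong, but the chain of function calls which led there.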

### + Typing support

Python 3 added the typing module. This allows you to add type annotations to your Python code. It doesn't result in higher performance code, but it does make large projects easier to maintain. In conjunction with mypy you get many of the benefits of a statically typed language.

### + Everything you love about Python

There's a lot to love about Python. List comprehensions, exploring ideas in the interpreter, ease of learning for new programmers etc.

## Conclusions

The aim of this article isn't to suggest that Python is always the best solution.

However, with recent changes to the language, it is more competitive than ever. The performance gap has been lessened, and what you lose in performance you gain in terms of usability and programmer productivity.

In particular, if you're someone who already knows Python, take another look before jumping ship.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why choose Piccolo?]]></title>
            <link>https://piccolo-orm.com/blog/why-choose-piccolo/</link>
            <guid>https://piccolo-orm.com/blog/why-choose-piccolo/</guid>
            <pubDate>Mon, 01 Oct 2018 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
This is a quick overview of the most popular Python ORMs, along with their strengths and weaknesses, and why you might want to choose Piccolo for your project.

## SQLAlchemy

Advantages:

 * The swiss army knife of ORMs - supports a lot of database features.
 * Automatic migrations
 * Standalone

Downsides:

 * Steep learning curve

## Django

Advantages:

 * Automatic migrations
 * Easy to learn the basics, but surprising depth
 * Integration with testing framework
 * Admin integration

Downsides:

 * Tricky to use it standalone.
 * Some unintuitive syntax, such as group by.

## Peewee

Advantages:

 * Simple
 * Standalone

Downsides:

 * Limited migration support (not automatic)

## Piccolo

The main reason you'd pick Piccolo is if you need asyncio support. The Django ORM is the only one which [might support this](https://www.aeracode.org/2018/06/04/django-async-roadmap/) in the future, but traditionally it has been hard to use the Django ORM in a standalone project.

Piccolo prioritises ease of use over supporting a large number of databases and features. It attempts to cover 90% of queries you're likely to do on a database, and encourages you to drop down to SQL when required.

The syntax attempts to be as close to SQL as possible. This lessens the learning curve for people with SQL experience, and means they won't have to learn a bunch of new abstractions on top of something they're already familiar with.

An ORM by itself isn't sufficient, so the following batteries are included:

* Migration support
* A user model
* Test runner

Piccolo also supports the [asyncpg](https://github.com/MagicStack/asyncpg) database driver, which is exceptionally fast.
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Reasons to use an ORM]]></title>
            <link>https://piccolo-orm.com/blog/reasons-to-use-an-orm/</link>
            <guid>https://piccolo-orm.com/blog/reasons-to-use-an-orm/</guid>
            <pubDate>Mon, 01 Oct 2018 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[
## Benefits of an ORM

### Convenience

A good ORM should make a developer's life easier. It should take care of the tedious things, like escaping values. An ORM can also have a more compact syntax than SQL. This is most obvious with joins.

```python
query = Band.select(Band.name, Band.genre.name)

print(query)
>>> SELECT name, genre.name FROM band JOIN genre ON band.genre = genre.id

```
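On the subject of escaping values, here's a sketch of what has to happen somewhere if you're not using an ORM. It uses the standard library's `sqlite3` module, and the `band` table is made up for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE band (name TEXT)")

# The apostrophe would break a query built with naive string formatting.
awkward_name = "Guns N' Roses"

# The ? placeholders leave the escaping to the database driver.
connection.execute("INSERT INTO band (name) VALUES (?)", (awkward_name,))
rows = connection.execute(
    "SELECT name FROM band WHERE name = ?", (awkward_name,)
).fetchall()
```

It's not hard, but you have to remember to do it for every value in every query. An ORM does this for you.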

### Batteries included

One of the most useful things an ORM comes bundled with is migrations. Handling migrations manually can be tedious and error prone.

ORMs often include other tools and features which make a developer's life easier. Examples include test runners, data fixtures etc.

### Passing around partial queries

With Piccolo, you can pass around queries, and keep on chaining methods onto it.

```python
query = Band.select(Band.name)

if rock:
    query = query.where(Band.genre == 'rock')

results = await query.run()

```

Doing this with raw SQL strings quickly becomes unmanageable.
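To give a feel for why chaining works well, here's a toy query builder. It's a sketch of the general technique only, not how Piccolo is implemented - `Select`, `popular` and the column names are all made up for the example.

```python
class Select:
    """A toy, immutable SELECT builder."""

    def __init__(self, table, columns, conditions=()):
        self.table = table
        self.columns = columns
        self.conditions = conditions

    def where(self, clause, value):
        # Return a new query, so partial queries can be shared safely.
        return Select(
            self.table, self.columns, self.conditions + ((clause, value),)
        )

    def build(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(
                clause for clause, _ in self.conditions
            )
        return sql, [value for _, value in self.conditions]


def popular(query):
    # Any function can add to a query without knowing how it started.
    return query.where("popularity > ?", 1000)


base_query = Select("band", ["name"])
rock_query = popular(base_query.where("genre = ?", "rock"))
sql, params = rock_query.build()
```

The SQL string is only assembled once, at the very end, so there's no need to work out whether a `WHERE` or an `AND` is required each time a condition is added.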

## Downsides of an ORM

ORMs aren't without their issues.

### Performance

When you use an ORM, there is inevitable extra overhead in generating the SQL.

You'll sometimes hear people complain that an ORM generates inefficient SQL. This is usually only for very complex queries.

There's nothing wrong with writing raw SQL, but Piccolo means you don't have to write it 90% of the time.

Piccolo also makes it easy to see the SQL being executed - just print any query.

### Can be tedious to learn

A lot of ORMs have their own terminology which doesn't match closely to SQL.

Over time, learning an ORM can feel tedious - you know SQL, but you're having to re-learn concepts over and over for each ORM you use.

### Some database features aren't available

The Piccolo ORM covers the most common interactions an app will have to make with a database.

In cases where it's not possible, you can just drop down into raw SQL.

Trying to encapsulate every possible database feature within an ORM is very challenging, and can lead to an unmanageable code base.
]]></content:encoded>
        </item>
    </channel>
</rss>