[placeholder]

Building On Solid Ground: Getting Postgres Foundations Right With pgbedrock

Check out the source code!

Every day here at Squarespace, data scientists are building customer behavior models, financial analysts are evaluating company performance, and product managers are investigating the results of new feature launches. Critical to the success of each of those groups is the Data Engineering team, which processes and makes accessible to the organization all business-critical data.

Most of this data is stored within a large, vertically scaled Postgres instance. As Squarespace has grown, more and more users throughout the company have begun accessing data from this Postgres instance. And as the number of users grew our team found it harder and harder to manage the diverse access requirements that each group requires: marketers shouldn't be able to look at highly confidential financial numbers, analysts shouldn't be able to modify programmatically processed data sets, and service accounts should be scoped to only what they actually need access to. However, we were struggling to uphold those guarantees. We would regularly get asked to grant access to a table only to be bewildered how the user didn't already have access. And when a new user had to be created (e.g. Rachel joins Marketing), the typical approach was to find a similar user (Jon in Marketing) and try our best to copy the same privileges over.

Our ad hoc approach wasn't working. We couldn't be sure who had access to what or even how to fix it. Several data engineers remarked that it felt like death by a thousand papercuts as every other day would involve manually tweaking user privileges.

The Underlying Causes

We quickly realized that although Postgres permissions were more than flexible enough to handle all of our needs, they were also so complex that we couldn't understand the system anymore.

Part of this complexity stems from Postgres's privileges themselves. First, privilege types can be inconsistent across objects. For example, read-level access to a schema is called USAGE but for a table it is SELECT (and to access data in a table you need both!). In addition, there are often subtleties in what a given privilege means. As an example, I have yet to find someone who remembers off the top of their head the difference between granting SELECT vs. USAGE for a sequence. And in a similar vein, some of these subtleties are so unexpected that they become gotchas. Default privileges are a great example of this. It is very common for someone to expect that running GRANT SELECT ON ALL TABLES IN SCHEMA foo TO jdoe will apply to tables created in the future; in truth a default privilege grant is required. But even when people do know about default privileges, often they don't realize that default privileges are granted only to objects created by the user that ran the default privilege grant. All of this is to say: privileges in Postgres can be confusing.

In addition to complexity with Postgres's privileges themselves, the way that configuration is distributed throughout the database exacerbates the confusion. To see user configuration one needs to look in pg_authid. For membership information check pg_auth_members. Existing privileges are in each object's relevant catalog table (for example, the pg_class.relacl column for tables), yet default privileges are in pg_default_acl. The result of this is that sometimes it is unclear where to find information needed to debug an issue. But even if one knows where to find that information, the information is so distributed that as the system grows it becomes nearly impossible to maintain a high-level lay of the land. As a result, when things don't work as expected it can feel like trying to find a needle in a haystack.

Lastly, because Postgres is used to manage its own configuration, a superuser can make changes live without any peer review. This leads to changes that no one is aware of, that may be incorrect, and that can cause staging and production instances to get out of sync.

Our Goals in Addressing This

We knew that if we wanted to continue scaling our use of Postgres we would need to address the above issues. Specifically, we set out to build a tool that would accomplish the following goals:

  1. Co-locate all configuration within one file so that we can easily get a "lay of the land."
  2. Simplify permission complexity down to read vs. write per object as most users (including engineers) aren't familiar enough with Postgres's permissioning to avoid the gotchas outlined above. In addition, despite our large, relatively complex use case we had no need for a greater distinction than read vs. write access on objects. We assume that the same holds true for many other Postgres use cases as well.
  3. Assert that our configuration matches reality by having our tool compare our configuration file against our database and, if there are mismatches, make changes to the database to bring the two in line.

Implementation

Once we had nailed down the high-level goals for this project, we set to work. After verifying that no existing tool met our needs, we began work to create our own. We named it pgbedrock, as this tool would manage the foundations of a database: the roles, role memberships, schema existence and ownership, and object privileges. For the format of the configuration file itself we chose YAML due to how easy it is for humans to read and write as well as the ease of parsing it programmatically. As most of our team is quite comfortable with Python, we chose to use Python3 for the application itself, leveraging a minimal set of libraries to meet our needs: psycopg2 for interacting with Postgres itself, click for building the command-line interface that the tool would provide, and cerberus for validating that a provided configuration file is properly defined.

Within the configuration file, each role definition looks something like this, with every field being optional:

jdoe:
    can_login: yes
    is_superuser: no
    member_of:
        - analyst
    attributes:
        - PASSWORD "{{ env['JDOE_PASSWORD'] }}"
    owns:
        schemas:
            - finance_reports
    privileges:
        schemas:
            read:
                - finance
                - marketing
            write:
                - reports
        tables:
            read:
                - finance.*
                - marketing.ad_spend
                - marketing.impressions
            write:
                - reports.*
        sequences:
            write:
                - reports.*

When pgbedrock is run, it makes sure that:

  • The role jdoe exists
  • jdoe can log in
  • jdoe's password is the same as what is in the $JDOE_PASSWORD environment variable
  • All other role attributes for jdoe are the Postgres defaults (as defined by pg_authid).
  • jdoe is a member of the analyst role
  • jdoe is a member of no other roles
  • jdoe owns the finance_reports schema
  • jdoe has read-level schema access (in Postgres terms: USAGE) for the finance and marketing schemas
  • jdoe has write-level schema access (CREATE) for the reports schema
  • jdoe has read-level access (SELECT) to all tables in the finance schema and to the marketing.ad_spend and marketing.impressions tables
  • jdoe has read-level access to all future tables in the finance schema (i.e. a default privilege)
  • jdoe has write-level access (SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, and TRIGGER) to all tables in the reports schema
  • jdoe has write-level access to all future tables in the reports schema (i.e. a default privilege)
  • jdoe has write-level access (SELECT, USAGE, UPDATE) to all sequences in the reports schema
  • jdoe has write-level access to all future sequences in the reports schema (i.e. a default privilege)

With a configuration file fully built out, we can then configure our database to match this with one command:

pgbedrock configure /path/to/config.yml \
    -h myhost.mynetwork.net \
    -p 5432 \
    -d mydatabase \
    -U mysuperuser \
    --prompt \
    --check

Alternatively, we publish a docker image that can be used instead so users do not have to fiddle with setting up a Python environment.

Lastly, to enable other Postgres teams to begin using this tool, we also offer a way to programmatically generate a configuration file:

pgbedrock generate \
    -h myhost.mynetwork.net \
    -p 5432 \
    -d mydatabase \
    -U mysuperuser \
    --prompt > /path/to/config.yml

pgbedrock makes some simplifications (for example, simplifying access down to read-level vs. write-level), so the generated configuration may not perfectly match your database's settings. Running pgbedrock configure on the new configuration file using check mode will show any places where discrepancies exist.

Impact

After extensive testing of our application using pytest and docker, we were ready to put it into use with our production Postgres instance. The first run of pgbedrock (in check mode, so that no real changes would occur) showed 33,000 lines of proposed SQL changes. While at first this seems shocking, it quickly made sense: previously every member of a group typically granted every other member of that group access to their objects. This meant that if we had 10 analysts and 100 objects, then we would have 10x10x100 = 10,000 individual privilege grants. Pgbedrock allowed us to instead efficiently grant all these permissions at the group level (in this case, granted to the "analyst" role itself). On our first run, pgbedrock removed all these individual-level privileges (which were often inconsistently applied to users, causing a portion of the "death by a thousand papercuts" mentioned above). The result of this one set of changes alone had a significant impact for us: adding a new analyst is now just a matter of copying an easy-to-follow 5-line definition:

jdoe:
    can_login: yes
    has_personal_schema: yes
    member_of:
        - analyst

Moreover, we can now feel confident that all analysts have the same permissions. That first run also uncovered a variety of roles that had privileges they should not, including service accounts with write-level access to objects they had no reason to access.

Finally, pgbedrock has fundamentally changed our database administration process. Previously, making a change involved putting in a ticket asking an engineer with superuser access to do something. The person with the superuser access then connected to the database and manually made whatever changes they thought were necessary, changes which were often incomplete or inconsistently applied. Now, any user can make sense of our configuration file and put in a pull request with the changes they want. Those changes get reviewed by our engineers and upon merging pgbedrock makes the changes live through a continuous integration process. This makes it easier and faster for our users to get their problems fixed and frees up our engineers to focus on more interesting work. Additionally, if someone with superuser access makes changes directly within our Postgres instance and does not add those changes to our configuration file, then those changes will be undone the next time pgbedrock runs. This encourages our engineers to follow the same process as other users: commit the proposed changes, get peer review, and merge to master to make changes live.

Conclusion

Pgbedrock has improved the administration, privilege auditing, and end-user experience of our Postgres database. By investing in a long-term solution to this issue, we have freed up engineers to work on more interesting problems, improved our auditability, and enabled more self-service at the same time.

Try it out! You can install pgbedrock through pip, download the docker image from quay.io/squarespace/pgbedrock, or check out the source code at github.com/Squarespace/pgbedrock!

If you're interested in solving data- and database-related problems like this, our Engineering team is hiring!

Continuous Localization at Squarespace