# Working with Related Objects Using ORM


In this chapter, we will cover another essential ORM concept, which is the interaction with mapped objects that reference other objects.

relationship() defines the relationship between two mapped objects and is also known as self-referencing.

For simplicity, we will omit Column mappings and other directives, and explain relationship() in a shortened form.


from sqlalchemy.orm import relationship


class User(Base):
    __tablename__ = 'user_account'

    # ... Column mappings

    addresses = relationship("Address", back_populates="user")


class Address(Base):
    __tablename__ = 'address'

    # ... Column mappings

    user = relationship("User", back_populates="addresses")

In the structure shown, the User object has a variable addresses, and the Address object has a variable user.

Both are created as relationship objects, but these aren't actual database columns but are set up to allow easy access in the code.

In other words, it facilitates easy navigation from a User object to an Address object.

Additionally, the back_populates parameter in the relationship declaration allows for the reverse situation, i.e., navigating from an Address object to a User object.

In relational Database terms, it naturally sets a 1 : N relationship as an N : 1 relationship.

In the next section, we will see what role the relationship() object's instances play and how they function.



When a new User object is created, the .addresses collection appears as a List object.

>>> u1 = User(name='pkrabs', fullname='Pearl Krabs')    
>>> u1.addresses
[]

You can add an Address object using list.append().

>>> a1 = Address(email_address="pear1.krabs@gmail.com")
>>> u1.addresses.append(a1)

# The u1.addresses collection now includes the new Address object.
>>> u1.addresses
[Address(id=None, email_address='pearl.krabs@gmail.com')]

If an Address object is associated with the User.addresses collection, another action occurs in the variable u1. The User.addresses and Address.user relationship is synchronized, allowing you to move: - From a User object to an Address, and - Back from an Address object to a User.

>>> a1.user
User(id=None, name='pkrabs', fullname='Pearl Krabs')

This is the result of synchronization using relationship.back_populates between the two relationship() objects.

The relationship() parameter can be complementarily assigned/list modified to another variable. Creating another Address object and assigning it to the Address.user property makes it part of the User.addresses collection.

>>> a2 = Address(email_address="pearl@aol.com", user=u1)
>>> u1.addresses
[Address(id=None, email_address='pearl.krabs@gmail.com'), Address(id=None, email_address='pearl@aol.com')]

We actually used the variable u1 as a keyword argument for user as if it were a property declared in the object (Address). It's equivalent to assigning the property afterward.

# equivalent effect as a2 = Address(user=u1)
>>> a2.user = u1

# Cascading Objects in the Session


We now have two related User and Address objects in a bidirectional structure in memory, but as mentioned earlier in Inserting Rows with ORM , these objects are in a transient state in the Session until they are associated with it.

We need to see when using Session.add(), and applying the method to the User object, that the related Address objects are also added.

>>> session.add(u1)
>>> u1 in session
True
>>> a1 in session
True
>>> a2 in session 
True

The three objects are now in a pending state, which means no INSERT operations have been executed yet. The three objects have not been assigned primary keys, and the a1 and a2 objects have a column (user_id) reference property. This is because the objects are not yet actually connected to a real database.

>>> print(u1.id)
None
>>> print(a1.user_id)
None

Let's save it to the database.

>>> session.commit()

If we translate the implemented code into SQL queries, it would look like this.

INSERT INTO user_account (name, fullname) VALUES (?, ?)
[...] ('pkrabs', 'Pearl Krabs')
INSERT INTO address (email_address, user_id) VALUES (?, ?)
[...] ('pearl.krabs@gmail.com', 6)
INSERT INTO address (email_address, user_id) VALUES (?, ?)
[...] ('pearl@aol.com', 6)
COMMIT

Using session, you can automate SQL INSERT, UPDATE, DELETE statements.

Finally, executing Session.commit() ensures all steps are called in the correct order, and the primary key of address.user_id is applied in the user_account.


# Loading Relationships


After calling Session.commit(), you can see the primary key created for the u1 object.

>>> u1.id
6

The above code is equivalent to executing the following query.

BEGIN (implicit)
SELECT user_account.id AS user_account_id, user_account.name AS user_account_name,
user_account.fullname AS user_account_fullname
FROM user_account
WHERE user_account.id = ?
[...] (6,)

You can also see that ids are now present in the objects linked to u1.addresses.

To retrieve these objects, we can observe the lazy load approach.

lazy loading : This is a method where a SELECT statement is executed to fetch information only when someone tries to access that information. In other words, it retrieves the necessary information as needed.

>>> u1.addresses
[Address(id=4, email_address='pearl.krabs@gmail.com'), Address(id=5, email_address='pearl@aol.com')]
SELECT address.id AS address_id, address.email_address AS address_email_address,
address.user_id AS address_user_id
FROM address
WHERE ? = address.user_id
[...] (6,)

SQLAlchemy ORM’s default for collections and related properties is lazy loading. This means once a collection has been relationshipped, as long as the data exists in memory, it remains accessible.

>>> u1.addresses
[Address(id=4, email_address='pearl.krabs@gmail.com'), Address(id=5, email_address='pearl@aol.com')]

Although lazy loading can be costly without explicit steps for optimization, it is optimized at least not to perform redundant operations.

You can also see the a1 and a2 objects in the u1.addresses collection.

>>> a1
Address(id=4, email_address='pearl.krabs@gmail.com')
>>> a2
Address(id=5, email_address='pearl@aol.com')

We will provide a further introduction to the concept of relationship in the latter part of this section.


# Using relationship in Queries


This section introduces several ways in which relationship() helps automate SQL query construction.


# JOIN tables using relationship()

In Specifying the FROM and JOIN Clauses and WHERE Clauses sections, we used Select.join() and Select.join_from() methods to construct SQL JOINs. These methods infer the ON clause based on whether there's a ForeignKeyConstraint object linking the two tables or provide specific SQL Expression syntax representing the ON clause.

relationship() objects can be used to set the ON clause for joins. A relationship() corresponding object can be passed as a single argument to Select.join(), serving as both the right join and the ON clause.

>>> print(
...     select(Address.email_address).
...     select_from(User).
...     join(User.addresses)
... )

The above code is equivalent to executing the following query.

SELECT address.email_address
FROM user_account JOIN address ON user_account.id = address.user_id

If relationship() is not specified in Select.join() or Select.join_from(), no ON clause is used. This means it functions due to the ForeignKeyConstraint between the two mapped table objects, not because of the relationship() object of User and Address.

>>> print(
...    select(Address.email_address).
...    join_from(User, Address)
... )

The above code is equivalent to executing the following query.

SELECT address.email_address
FROM user_account JOIN address ON user_account.id = address.user_id

# Joining Using Aliases(aliased)

When configuring SQL JOINs using relationship(), it's suitable to use [PropComparator.of_type()] with aliased() cases. However, relationship() is used to configure the same joins as described in [ORM Entity Aliases].

You can directly use aliased() in a join with relationship().

>>> from sqlalchemy.orm import aliased
>>> address_alias_1 = aliased(Address)
>>> address_alias_2 = aliased(Address)
>>> print(
...     select(User).
...     join_from(User, address_alias_1).
...     where(address_alias_1.email_address == 'patrick@aol.com').
...     join_from(User, address_alias_2).
...     where(address_alias_2.email_address == 'patrick@gmail.com')
... )

The above code is equivalent to executing the following query.

SELECT user_account.id, user_account.name, user_account.fullname
FROM user_account
JOIN address AS address_1 ON user_account.id = address_1.user_id
JOIN address AS address_2 ON user_account.id = address_2.user_id
WHERE address_1.email_address = :email_address_1
AND address_2.email_address = :email_address_2

You can use join clause in aliased() object using relationship().

>>> user_alias_1 = aliased(User)
>>> print(
...     select(user_alias_1.name).
...     join(user_alias_1.addresses)
... )

The above code is equivalent to executing the following query.

SELECT user_account_1.name
FROM user_account AS user_account_1
JOIN address ON user_account_1.id = address.user_id

# Expanding ON Conditions

You can add conditions to the ON clause created by relation(). This feature is useful not only for quickly limiting the scope of a specific join for a related path but also for use cases like loader strategy configuration introduced in the last section. PropComparator.and_() method allows a series of SQL expressions to be positionally combined in the JOIN's ON clause via AND. For example, to limit the ON criteria to specific email addresses using User and Address, you would do this.

>>> stmt = (
...   select(User.fullname).
...   join(User.addresses.and_(Address.email_address == 'pearl.krabs@gmail.com'))
... )

>>> session.execute(stmt).all()
[('Pearl Krabs',)]

The above code is equivalent to executing the following query.

SELECT user_account.fullname
FROM user_account
JOIN address ON user_account.id = address.user_id AND address.email_address = ?
[...] ('pearl.krabs@gmail.com',)

# EXISTS has() , and()

In the EXISTS Subqueries section, the SQL EXISTS keyword was introduced along with the Scalar Subqueries, Correlated Queries section. relationship() provides some help in commonly creating subqueries for relationships.


For a 1:N (one-to-many) relationship like User.addresses, you can use PropComparator.any() to create a subquery for the address table rejoining the user_account table. This method allows optional WHERE criteria to limit the rows matching the subquery.

>>> stmt = (
...   select(User.fullname).
...   where(User.addresses.any(Address.email_address == 'pearl.krabs@gmail.com'))
... )

>>> session.execute(stmt).all()
[('Pearl Krabs',)]

The above code is equivalent to executing the following query.

SELECT user_account.fullname
FROM user_account
WHERE EXISTS (SELECT 1
FROM address
WHERE user_account.id = address.user_id AND address.email_address = ?)
[...] ('pearl.krabs@gmail.com',)

Conversely, to find objects without related data, use ~User.addresses.any() to search for User objects.

>>> stmt = (
...   select(User.fullname).
...   where(~User.addresses.any())
... )

>>> session.execute(stmt).all()
[('Patrick McStar',), ('Squidward Tentacles',), ('Eugene H. Krabs',)]

The above code is equivalent to executing the following query.

SELECT user_account.fullname
FROM user_account
WHERE NOT (EXISTS (SELECT 1
FROM address
WHERE user_account.id = address.user_id))
[...] ()

PropComparator.has() works similarly to PropComparator.any() but is used for N:1 (Many-to-one) relationships. For instance, to find all Address objects belonging to "pearl", you would use this method.

>>> stmt = (
...   select(Address.email_address).
...   where(Address.user.has(User.name=="pkrabs"))
... )

>>> session.execute(stmt).all()
[('pearl.krabs@gmail.com',), ('pearl@aol.com',)]

The above code is equivalent to executing the following query.

SELECT address.email_address
FROM address
WHERE EXISTS (SELECT 1
FROM user_account
WHERE user_account.id = address.user_id AND user_account.name = ?)
[...] ('pkrabs',)

# Relationship Operators

Several types of SQL creation helpers come with relationship():

  • N : 1 (Many-to-one) comparison You can select rows where the foreign key of the target entity matches the primary key value of a specified object instance in an N:1 relationship.
>>> print(select(Address).where(Address.user == u1))

The above code is equivalent to executing the following query.

SELECT address.id, address.email_address, address.user_id
FROM address
WHERE :param_1 = address.user_id
  • NOT N : 1 (Many-to-one) comparison You can use the not equal (!=) operator.
>>> print(select(Address).where(Address.user != u1))

The above code is equivalent to executing the following query.

SELECT address.id, address.email_address, address.user_id
FROM address
WHERE address.user_id != :user_id_1 OR address.user_id IS NULL
  • You can check if an object is included in a 1:N (one-to-many) collection.
>>> print(select(User).where(User.addresses.contains(a1)))

The above code is equivalent to executing the following query.

SELECT user_account.id, user_account.name, user_account.fullname
FROM user_account
WHERE user_account.id = :param_1
  • You can check if an object in a 1:N relationship is part of a specific parent item. with_parent() creates a comparison that returns rows referencing the given parent item, equivalent to using the == operator."
>>> from sqlalchemy.orm import with_parent
>>> print(select(Address).where(with_parent(u1, User.addresses)))

The above code is equivalent to executing the following query.

SELECT address.id, address.email_address, address.user_id
FROM address
WHERE :param_1 = address.user_id

# Types of Relationship Loading


In the Loading Relationships section, we introduced the concept that when working with mapped object instances and accessing mapped attributes using relationship(), objects that should be in this collection are loaded, and if the collection is not filled, lazy load occurs.

Lazy loading is one of the most famous ORM patterns and also the most controversial. If dozens of ORM objects in memory each refer to a few unloaded properties, the routine manipulation of objects can implicitly release many problems (N+1 Problem), which can accumulate. Such implicit queries may not work at all when attempting database transformations that are no longer viable or when using alternative concurrency patterns like asynchronous.

What is a N + 1 Problem? It's a problem where you fetch N records with one query, but to get the desired data, you end up performing a secondary query for each of these N records.

Lazy loading is a very popular and useful pattern when it is compatible with the concurrency approach in use and does not cause other problems. For this reason, SQLAlchemy's ORM focuses on features that allow you to permit and optimize these load behaviors.

Above all, the first step to effectively using ORM's lazy loading is to test the Application and check the SQL. If inappropriate loads occur for objects detached from the Session, the use of Types of Relationship Loading should be reviewed.

You can mark objects to be associated with a SELECT statement using the Select.options() method.

for user_obj in session.execute(
    select(User).options(selectinload(User.addresses))
).scalars():
    user_obj.addresses  # access addresses collection already loaded

You can also configure it as a default for relationship() using relationship.lazy.

from sqlalchemy.orm import relationship
class User(Base):
    __tablename__ = 'user_account'

    addresses = relationship("Address", back_populates="user", lazy="selectin")

cf. Two Techniques of Relationship Loading


# Select IN loading Method

The most useful loading option in recent SQLAlchemy versions is selectinload(). This option solves the most common form of the "N+1 Problem" problem, which is an issue with sets of objects referencing related collections. It typically uses a SELECT form that can be sent out for the related table without introducing JOINs or subqueries and only queries for parent objects whose collections are not loaded.

The following example shows the Address objects related to a User object being loaded with selectinload(). During the Session.execute() call, two SELECT statements are generated in the database, with the second fetching the related Address objects.

>>> from sqlalchemy.orm import selectinload
>>> stmt = (
...   select(User).options(selectinload(User.addresses)).order_by(User.id)
... )
>>> for row in session.execute(stmt):
...     print(f"{row.User.name}  ({', '.join(a.email_address for a in row.User.addresses)})")
spongebob  (spongebob@sqlalchemy.org)
sandy  (sandy@sqlalchemy.org, sandy@squirrelpower.org)
patrick  ()
squidward  ()
ehkrabs  ()
pkrabs  (pearl.krabs@gmail.com, pearl@aol.com)

The above code is equivalent to executing the following query.

SELECT user_account.id, user_account.name, user_account.fullname
FROM user_account ORDER BY user_account.id
[...] ()
SELECT address.user_id AS address_user_id, address.id AS address_id,
address.email_address AS address_email_address
FROM address
WHERE address.user_id IN (?, ?, ?, ?, ?, ?)
[...] (1, 2, 3, 4, 5, 6)

# Joined Loading Method

Joined Loading, the oldest in SQLAlchemy, is a type of eager loading, also known as joined eager loading. It is best suited for loading objects in "N:1 relationships", as it performs a SELECT JOIN of the tables specified in relationship(), fetching all table data at once.

For example, where an Address object has a connected user, an INNER JOIN can be used rather than an OUTER JOIN.

>>> from sqlalchemy.orm import joinedload
>>> stmt = (
...   select(Address).options(joinedload(Address.user, innerjoin=True)).order_by(Address.id)
... )
>>> for row in session.execute(stmt):
...     print(f"{row.Address.email_address} {row.Address.user.name}")

spongebob@sqlalchemy.org spongebob
sandy@sqlalchemy.org sandy
sandy@squirrelpower.org sandy
pearl.krabs@gmail.com pkrabs
pearl@aol.com pkrabs

The above code is equivalent to executing the following query.

SELECT address.id, address.email_address, address.user_id, user_account_1.id AS id_1,
user_account_1.name, user_account_1.fullname
FROM address
JOIN user_account AS user_account_1 ON user_account_1.id = address.user_id
ORDER BY address.id
[...] ()

joinedload() is also used for 1: N collections but should be evaluated case-by-case compared to other options like selectinload() due to its nested collections and larger collections.

It's important to note that the WHERE and ORDER BY criteria of the SELECT query do not target the table rendered by joinload(). In the SQL query above, you can see an anonymous alias applied to the user_account table, which cannot directly address. This concept is further explained in the [Zen of joined Eager Loading] section.

The ON clause by joinedload() can be directly influenced using the method described previously in Expanding ON Conditions.

cf.

In general cases, "N+1 problem" is much less prevalent, so many-to-one eager loading is often unnecessary.

When many objects all reference the same related object (e.g., many Address objects referencing the same User), a single SQL for the User object is emitted using ordinary lazy loading.

The lazy loading routine queries the related object by the current primary key without emitting SQL if possible.


# Explicit Join + Eager Load Method

A common use case uses the contains_eager() option, which is very similar to joinedload() except it assumes you have set up the JOIN directly and instead marks additional columns in the COLUMNS clause that should be loaded into each object's related properties.

>>> from sqlalchemy.orm import contains_eager

>>> stmt = (
...   select(Address).
...   join(Address.user).
...   where(User.name == 'pkrabs').
...   options(contains_eager(Address.user)).order_by(Address.id)
... )

>>> for row in session.execute(stmt):
...     print(f"{row.Address.email_address} {row.Address.user.name}")

pearl.krabs@gmail.com pkrabs
pearl@aol.com pkrabs

The above code is equivalent to executing the following query.

SELECT user_account.id, user_account.name, user_account.fullname,
address.id AS id_1, address.email_address, address.user_id
FROM address JOIN user_account ON user_account.id = address.user_id
WHERE user_account.name = ? ORDER BY address.id
[...] ('pkrabs',)

For instance, we filtered user_account.name and loaded it into the returned Address.user property. A separate application of joinedload() would have unnecessarily created a twice-joined SQL query.

>>> stmt = (
...   select(Address).
...   join(Address.user).
...   where(User.name == 'pkrabs').
...   options(joinedload(Address.user)).order_by(Address.id)
... )
>>> print(stmt)  # SELECT has a JOIN and LEFT OUTER JOIN unnecessarily

The above code is equivalent to executing the following query.

SELECT address.id, address.email_address, address.user_id,
user_account_1.id AS id_1, user_account_1.name, user_account_1.fullname
FROM address JOIN user_account ON user_account.id = address.user_id
LEFT OUTER JOIN user_account AS user_account_1 ON user_account_1.id = address.user_id
WHERE user_account.name = :name_1 ORDER BY address.id

cf.

Two Techniques of Relationship Loading
Zen of joined Eager Loading


# Setting Loader Paths

The PropComparator.and_() method is actually generally usable for most loader options.

For example, if you want to reload usernames and email addresses from the sqlalchemy.org domain, you can limit the conditions with PropComparator.and_() applied to the arguments passed to selectinload().

>>> from sqlalchemy.orm import selectinload
>>> stmt = (
...   select(User).
...   options(
...       selectinload(
...           User.addresses.and_(
...             ~Address.email_address.endswith("sqlalchemy.org")
...           )
...       )
...   ).
...   order_by(User.id).
...   execution_options(populate_existing=True)
... )

>>> for row in session.execute(stmt):
...     print(f"{row.User.name}  ({', '.join(a.email_address for a in row.User.addresses)})")

spongebob  ()
sandy  (sandy@squirrelpower.org)
patrick  ()
squidward  ()
ehkrabs  ()
pkrabs  (pearl.krabs@gmail.com, pearl@aol.com)

The above code is equivalent to executing the following query.

SELECT user_account.id, user_account.name, user_account.fullname
FROM user_account ORDER BY user_account.id
[...] ()
SELECT address.user_id AS address_user_id, address.id AS address_id,
address.email_address AS address_email_address
FROM address
WHERE address.user_id IN (?, ?, ?, ?, ?, ?)
AND (address.email_address NOT LIKE '%' || ?)
[...] (1, 2, 3, 4, 5, 6, 'sqlalchemy.org')

It's crucial to note the addition of the .execution_options(populate_existing=True) option above. When fetching rows, this option indicates that loader options must replace the existing collections' contents in already loaded objects.

Since we are iterating with a Session object, the objects being loaded here are the same Python instances as those initially maintained at the start of this tutorial's ORM section.


# Raise Loading Method

The raiseload() option is commonly used to completely block the occurrence of the "N+1 problem" by instead causing errors rather than slow loading.

There are two variants: blocking all "load" operations that include works that need SQL (lazy load) and those that only reference the current Session (raiseload.sql_only option).

class User(Base):
    __tablename__ = 'user_account'

    # ... Column mappings

    addresses = relationship("Address", back_populates="user", lazy="raise_on_sql")


class Address(Base):
    __tablename__ = 'address'

    # ... Column mappings

    user = relationship("User", back_populates="addresses", lazy="raise_on_sql")

Using such mappings blocks the application from 'lazy loading', requiring you to specify loader strategies for specific queries.

u1 = s.execute(select(User)).scalars().first()
u1.addresses
sqlalchemy.exc.InvalidRequestError: 'User.addresses' is not available due to lazy='raise_on_sql'

The exception indicates that the collection must be loaded first.

u1 = s.execute(select(User).options(selectinload(User.addresses))).scalars().first()

The lazy="raise_on_sql" option is also wisely attempted for N:1 relationships.

Above, although the Address.user property was not loaded into Address, "raiseload" does not cause an error because the corresponding User object is in the same Session.

cf.

Preventing unwanted lazy loading with raiseload
Preventing lazy loading in relationship