# Working with Related Objects Using ORM
In this chapter, we will cover another essential ORM concept, which is the interaction with mapped objects that reference other objects.
relationship()
defines the relationship between two mapped objects and is also known as self-referencing.
For simplicity, we will omit Column
mappings and other directives, and explain relationship()
in a shortened form.
from sqlalchemy.orm import relationship
class User(Base):
__tablename__ = 'user_account'
# ... Column mappings
addresses = relationship("Address", back_populates="user")
class Address(Base):
__tablename__ = 'address'
# ... Column mappings
user = relationship("User", back_populates="addresses")
In the structure shown, the User
object has a variable addresses
, and the Address
object has a variable user
.
Both are created as relationship objects, but these aren't actual database columns but are set up to allow easy access in the code.
In other words, it facilitates easy navigation from a User
object to an Address
object.
Additionally, the back_populates
parameter in the relationship
declaration allows for the reverse situation, i.e., navigating from an Address
object to a User
object.
In relational Database terms, it naturally sets a 1 : N relationship as an N : 1 relationship.
In the next section, we will see what role the relationship()
object's instances play and how they function.
# Using Related Objects
When a new User
object is created, the .addresses
collection appears as a List
object.
>>> u1 = User(name='pkrabs', fullname='Pearl Krabs')
>>> u1.addresses
[]
You can add an Address
object using list.append()
.
>>> a1 = Address(email_address="pear1.krabs@gmail.com")
>>> u1.addresses.append(a1)
# The u1.addresses collection now includes the new Address object.
>>> u1.addresses
[Address(id=None, email_address='pearl.krabs@gmail.com')]
If an Address
object is associated with the User.addresses
collection, another action occurs in the variable u1
. The User.addresses and Address.user relationship is synchronized, allowing you to move:
- From a User
object to an Address
, and
- Back from an Address
object to a User
.
>>> a1.user
User(id=None, name='pkrabs', fullname='Pearl Krabs')
This is the result of synchronization using relationship.back_populates
between the two relationship()
objects.
The relationship()
parameter can be complementarily assigned/list modified to another variable. Creating another Address
object and assigning it to the Address.user
property makes it part of the User.addresses
collection.
>>> a2 = Address(email_address="pearl@aol.com", user=u1)
>>> u1.addresses
[Address(id=None, email_address='pearl.krabs@gmail.com'), Address(id=None, email_address='pearl@aol.com')]
We actually used the variable u1
as a keyword argument for user
as if it were a property declared in the object (Address
). It's equivalent to assigning the property afterward.
# equivalent effect as a2 = Address(user=u1)
>>> a2.user = u1
# Cascading Objects in the Session
We now have two related User
and Address
objects in a bidirectional structure in memory, but as mentioned earlier in Inserting Rows with ORM , these objects are in a transient state in the Session
until they are associated with it.
We need to see when using Session.add()
, and applying the method to the User
object, that the related Address
objects are also added.
>>> session.add(u1)
>>> u1 in session
True
>>> a1 in session
True
>>> a2 in session
True
The three objects are now in a pending state, which means no INSERT
operations have been executed yet. The three objects have not been assigned primary keys, and the a1
and a2
objects have a column (user_id
) reference property. This is because the objects are not yet actually connected to a real database.
>>> print(u1.id)
None
>>> print(a1.user_id)
None
Let's save it to the database.
>>> session.commit()
If we translate the implemented code into SQL queries, it would look like this.
INSERT INTO user_account (name, fullname) VALUES (?, ?)
[...] ('pkrabs', 'Pearl Krabs')
INSERT INTO address (email_address, user_id) VALUES (?, ?)
[...] ('pearl.krabs@gmail.com', 6)
INSERT INTO address (email_address, user_id) VALUES (?, ?)
[...] ('pearl@aol.com', 6)
COMMIT
Using session, you can automate SQL INSERT
, UPDATE
, DELETE
statements.
Finally, executing Session.commit()
ensures all steps are called in the correct order, and the primary key of address.user_id
is applied in the user_account
.
# Loading Relationships
After calling Session.commit()
, you can see the primary key created for the u1
object.
>>> u1.id
6
The above code is equivalent to executing the following query.
BEGIN (implicit)
SELECT user_account.id AS user_account_id, user_account.name AS user_account_name,
user_account.fullname AS user_account_fullname
FROM user_account
WHERE user_account.id = ?
[...] (6,)
You can also see that id
s are now present in the objects linked to u1.addresses
.
To retrieve these objects, we can observe the lazy load approach.
lazy loading : This is a method where a SELECT statement is executed to fetch information only when someone tries to access that information. In other words, it retrieves the necessary information as needed.
>>> u1.addresses
[Address(id=4, email_address='pearl.krabs@gmail.com'), Address(id=5, email_address='pearl@aol.com')]
SELECT address.id AS address_id, address.email_address AS address_email_address,
address.user_id AS address_user_id
FROM address
WHERE ? = address.user_id
[...] (6,)
SQLAlchemy ORM’s default for collections and related properties is lazy loading. This means once a collection has been relationshipped, as long as the data exists in memory, it remains accessible.
>>> u1.addresses
[Address(id=4, email_address='pearl.krabs@gmail.com'), Address(id=5, email_address='pearl@aol.com')]
Although lazy loading can be costly without explicit steps for optimization, it is optimized at least not to perform redundant operations.
You can also see the a1
and a2
objects in the u1.addresses
collection.
>>> a1
Address(id=4, email_address='pearl.krabs@gmail.com')
>>> a2
Address(id=5, email_address='pearl@aol.com')
We will provide a further introduction to the concept of relationship
in the latter part of this section.
# Using relationship
in Queries
This section introduces several ways in which relationship()
helps automate SQL query construction.
# JOIN tables using relationship()
In Specifying the FROM and JOIN Clauses and WHERE Clauses sections, we used Select.join()
and Select.join_from()
methods to construct SQL JOINs. These methods infer the ON clause based on whether there's a ForeignKeyConstraint
object linking the two tables or provide specific SQL Expression syntax representing the ON
clause.
relationship()
objects can be used to set the ON
clause for joins.
A relationship()
corresponding object can be passed as a single argument to Select.join()
, serving as both the right join and the ON clause.
>>> print(
... select(Address.email_address).
... select_from(User).
... join(User.addresses)
... )
The above code is equivalent to executing the following query.
SELECT address.email_address
FROM user_account JOIN address ON user_account.id = address.user_id
If relationship()
is not specified in Select.join()
or Select.join_from()
, no ON clause is used. This means it functions due to the ForeignKeyConstraint
between the two mapped table objects, not because of the relationship()
object of User
and Address
.
>>> print(
... select(Address.email_address).
... join_from(User, Address)
... )
The above code is equivalent to executing the following query.
SELECT address.email_address
FROM user_account JOIN address ON user_account.id = address.user_id
# Joining Using Aliases(aliased
)
When configuring SQL JOINs using relationship()
, it's suitable to use [PropComparator.of_type()] with aliased()
cases. However, relationship()
is used to configure the same joins as described in [ORM Entity Aliases
].
You can directly use aliased()
in a join with relationship()
.
>>> from sqlalchemy.orm import aliased
>>> address_alias_1 = aliased(Address)
>>> address_alias_2 = aliased(Address)
>>> print(
... select(User).
... join_from(User, address_alias_1).
... where(address_alias_1.email_address == 'patrick@aol.com').
... join_from(User, address_alias_2).
... where(address_alias_2.email_address == 'patrick@gmail.com')
... )
The above code is equivalent to executing the following query.
SELECT user_account.id, user_account.name, user_account.fullname
FROM user_account
JOIN address AS address_1 ON user_account.id = address_1.user_id
JOIN address AS address_2 ON user_account.id = address_2.user_id
WHERE address_1.email_address = :email_address_1
AND address_2.email_address = :email_address_2
You can use join clause in aliased()
object using relationship()
.
>>> user_alias_1 = aliased(User)
>>> print(
... select(user_alias_1.name).
... join(user_alias_1.addresses)
... )
The above code is equivalent to executing the following query.
SELECT user_account_1.name
FROM user_account AS user_account_1
JOIN address ON user_account_1.id = address.user_id
# Expanding ON Conditions
You can add conditions to the ON clause created by relation()
. This feature is useful not only for quickly limiting the scope of a specific join for a related path but also for use cases like loader strategy configuration introduced in the last section.
PropComparator.and_()
method allows a series of SQL expressions to be positionally combined in the JOIN's ON
clause via AND
.
For example, to limit the ON criteria to specific email addresses using User
and Address
, you would do this.
>>> stmt = (
... select(User.fullname).
... join(User.addresses.and_(Address.email_address == 'pearl.krabs@gmail.com'))
... )
>>> session.execute(stmt).all()
[('Pearl Krabs',)]
The above code is equivalent to executing the following query.
SELECT user_account.fullname
FROM user_account
JOIN address ON user_account.id = address.user_id AND address.email_address = ?
[...] ('pearl.krabs@gmail.com',)
# EXISTS has()
, and()
In the EXISTS Subqueries section, the SQL EXISTS keyword was introduced along with the Scalar Subqueries, Correlated Queries section.
relationship()
provides some help in commonly creating subqueries for relationships.
For a 1:N (one-to-many) relationship like User.addresses
, you can use PropComparator.any()
to create a subquery for the address table rejoining the user_account
table. This method allows optional WHERE criteria to limit the rows matching the subquery.
>>> stmt = (
... select(User.fullname).
... where(User.addresses.any(Address.email_address == 'pearl.krabs@gmail.com'))
... )
>>> session.execute(stmt).all()
[('Pearl Krabs',)]
The above code is equivalent to executing the following query.
SELECT user_account.fullname
FROM user_account
WHERE EXISTS (SELECT 1
FROM address
WHERE user_account.id = address.user_id AND address.email_address = ?)
[...] ('pearl.krabs@gmail.com',)
Conversely, to find objects without related data, use ~User.addresses.any()
to search for User
objects.
>>> stmt = (
... select(User.fullname).
... where(~User.addresses.any())
... )
>>> session.execute(stmt).all()
[('Patrick McStar',), ('Squidward Tentacles',), ('Eugene H. Krabs',)]
The above code is equivalent to executing the following query.
SELECT user_account.fullname
FROM user_account
WHERE NOT (EXISTS (SELECT 1
FROM address
WHERE user_account.id = address.user_id))
[...] ()
PropComparator.has()
works similarly to PropComparator.any()
but is used for N:1 (Many-to-one) relationships.
For instance, to find all Address
objects belonging to "pearl", you would use this method.
>>> stmt = (
... select(Address.email_address).
... where(Address.user.has(User.name=="pkrabs"))
... )
>>> session.execute(stmt).all()
[('pearl.krabs@gmail.com',), ('pearl@aol.com',)]
The above code is equivalent to executing the following query.
SELECT address.email_address
FROM address
WHERE EXISTS (SELECT 1
FROM user_account
WHERE user_account.id = address.user_id AND user_account.name = ?)
[...] ('pkrabs',)
# Relationship Operators
Several types of SQL creation helpers come with relationship()
:
- N : 1 (Many-to-one) comparison You can select rows where the foreign key of the target entity matches the primary key value of a specified object instance in an N:1 relationship.
>>> print(select(Address).where(Address.user == u1))
The above code is equivalent to executing the following query.
SELECT address.id, address.email_address, address.user_id
FROM address
WHERE :param_1 = address.user_id
- NOT N : 1 (Many-to-one) comparison You can use the not equal (!=) operator.
>>> print(select(Address).where(Address.user != u1))
The above code is equivalent to executing the following query.
SELECT address.id, address.email_address, address.user_id
FROM address
WHERE address.user_id != :user_id_1 OR address.user_id IS NULL
- You can check if an object is included in a 1:N (one-to-many) collection.
>>> print(select(User).where(User.addresses.contains(a1)))
The above code is equivalent to executing the following query.
SELECT user_account.id, user_account.name, user_account.fullname
FROM user_account
WHERE user_account.id = :param_1
- You can check if an object in a 1:N relationship is part of a specific parent item.
with_parent()
creates a comparison that returns rows referencing the given parent item, equivalent to using the==
operator."
>>> from sqlalchemy.orm import with_parent
>>> print(select(Address).where(with_parent(u1, User.addresses)))
The above code is equivalent to executing the following query.
SELECT address.id, address.email_address, address.user_id
FROM address
WHERE :param_1 = address.user_id
# Types of Relationship Loading
In the Loading Relationships section, we introduced the concept that when working with mapped object instances and accessing mapped attributes using relationship()
, objects that should be in this collection are loaded, and if the collection is not filled, lazy load occurs.
Lazy loading is one of the most famous ORM patterns and also the most controversial. If dozens of ORM objects in memory each refer to a few unloaded properties, the routine manipulation of objects can implicitly release many problems (N+1 Problem
), which can accumulate. Such implicit queries may not work at all when attempting database transformations that are no longer viable or when using alternative concurrency patterns like asynchronous.
What is a
N + 1 Problem
? It's a problem where you fetch N records with one query, but to get the desired data, you end up performing a secondary query for each of these N records.
Lazy loading is a very popular and useful pattern when it is compatible with the concurrency approach in use and does not cause other problems. For this reason, SQLAlchemy's ORM focuses on features that allow you to permit and optimize these load behaviors.
Above all, the first step to effectively using ORM's lazy loading is to test the Application and check the SQL.
If inappropriate loads occur for objects detached from the Session
, the use of Types of Relationship Loading
should be reviewed.
You can mark objects to be associated with a SELECT statement using the Select.options()
method.
for user_obj in session.execute(
select(User).options(selectinload(User.addresses))
).scalars():
user_obj.addresses # access addresses collection already loaded
You can also configure it as a default for relationship()
using relationship.lazy
.
from sqlalchemy.orm import relationship
class User(Base):
__tablename__ = 'user_account'
addresses = relationship("Address", back_populates="user", lazy="selectin")
cf. Two Techniques of Relationship Loading
Configuring Loader Strategies at Mapping Time
- Details about
relationship()
configurationRelationship Loading with Loader Options
- Details about the loader
# Select IN loading Method
The most useful loading option in recent SQLAlchemy versions is selectinload()
. This option solves the most common form of the "N+1 Problem" problem, which is an issue with sets of objects referencing related collections. It typically uses a SELECT form that can be sent out for the related table without introducing JOINs or subqueries and only queries for parent objects whose collections are not loaded.
The following example shows the Address objects related to a User
object being loaded with selectinload()
. During the Session.execute()
call, two SELECT statements are generated in the database, with the second fetching the related Address
objects.
>>> from sqlalchemy.orm import selectinload
>>> stmt = (
... select(User).options(selectinload(User.addresses)).order_by(User.id)
... )
>>> for row in session.execute(stmt):
... print(f"{row.User.name} ({', '.join(a.email_address for a in row.User.addresses)})")
spongebob (spongebob@sqlalchemy.org)
sandy (sandy@sqlalchemy.org, sandy@squirrelpower.org)
patrick ()
squidward ()
ehkrabs ()
pkrabs (pearl.krabs@gmail.com, pearl@aol.com)
The above code is equivalent to executing the following query.
SELECT user_account.id, user_account.name, user_account.fullname
FROM user_account ORDER BY user_account.id
[...] ()
SELECT address.user_id AS address_user_id, address.id AS address_id,
address.email_address AS address_email_address
FROM address
WHERE address.user_id IN (?, ?, ?, ?, ?, ?)
[...] (1, 2, 3, 4, 5, 6)
# Joined Loading Method
Joined Loading, the oldest in SQLAlchemy, is a type of eager loading, also known as joined eager loading
. It is best suited for loading objects in "N:1 relationships", as it performs a SELECT JOIN of the tables specified in relationship()
, fetching all table data at once.
For example, where an Address
object has a connected user, an INNER JOIN can be used rather than an OUTER JOIN.
>>> from sqlalchemy.orm import joinedload
>>> stmt = (
... select(Address).options(joinedload(Address.user, innerjoin=True)).order_by(Address.id)
... )
>>> for row in session.execute(stmt):
... print(f"{row.Address.email_address} {row.Address.user.name}")
spongebob@sqlalchemy.org spongebob
sandy@sqlalchemy.org sandy
sandy@squirrelpower.org sandy
pearl.krabs@gmail.com pkrabs
pearl@aol.com pkrabs
The above code is equivalent to executing the following query.
SELECT address.id, address.email_address, address.user_id, user_account_1.id AS id_1,
user_account_1.name, user_account_1.fullname
FROM address
JOIN user_account AS user_account_1 ON user_account_1.id = address.user_id
ORDER BY address.id
[...] ()
joinedload()
is also used for 1: N collections but should be evaluated case-by-case compared to other options like selectinload()
due to its nested collections and larger collections.
It's important to note that the WHERE and ORDER BY criteria of the SELECT query do not target the table rendered by joinload()
. In the SQL query above, you can see an anonymous alias applied to the user_account
table, which cannot directly address. This concept is further explained in the [Zen of joined Eager Loading] section.
The ON clause by joinedload()
can be directly influenced using the method described previously in Expanding ON Conditions
.
cf.
In general cases, "N+1 problem" is much less prevalent, so many-to-one eager loading is often unnecessary.
When many objects all reference the same related object (e.g., many
Address
objects referencing the sameUser
), a single SQL for theUser
object is emitted using ordinary lazy loading.The lazy loading routine queries the related object by the current primary key without emitting SQL if possible.
# Explicit Join + Eager Load Method
A common use case uses the contains_eager()
option, which is very similar to joinedload()
except it assumes you have set up the JOIN directly and instead marks additional columns in the COLUMNS clause that should be loaded into each object's related properties.
>>> from sqlalchemy.orm import contains_eager
>>> stmt = (
... select(Address).
... join(Address.user).
... where(User.name == 'pkrabs').
... options(contains_eager(Address.user)).order_by(Address.id)
... )
>>> for row in session.execute(stmt):
... print(f"{row.Address.email_address} {row.Address.user.name}")
pearl.krabs@gmail.com pkrabs
pearl@aol.com pkrabs
The above code is equivalent to executing the following query.
SELECT user_account.id, user_account.name, user_account.fullname,
address.id AS id_1, address.email_address, address.user_id
FROM address JOIN user_account ON user_account.id = address.user_id
WHERE user_account.name = ? ORDER BY address.id
[...] ('pkrabs',)
For instance, we filtered user_account.name
and loaded it into the returned Address.user
property. A separate application of joinedload()
would have unnecessarily created a twice-joined SQL query.
>>> stmt = (
... select(Address).
... join(Address.user).
... where(User.name == 'pkrabs').
... options(joinedload(Address.user)).order_by(Address.id)
... )
>>> print(stmt) # SELECT has a JOIN and LEFT OUTER JOIN unnecessarily
The above code is equivalent to executing the following query.
SELECT address.id, address.email_address, address.user_id,
user_account_1.id AS id_1, user_account_1.name, user_account_1.fullname
FROM address JOIN user_account ON user_account.id = address.user_id
LEFT OUTER JOIN user_account AS user_account_1 ON user_account_1.id = address.user_id
WHERE user_account.name = :name_1 ORDER BY address.id
cf.
Two Techniques of Relationship Loading
Zen of joined Eager Loading
- Details about this loading method
Routing Explicit Joins/Statements into Eagerly Loaded Collections
- using
contains_eager()
# Setting Loader Paths
The PropComparator.and_()
method is actually generally usable for most loader options.
For example, if you want to reload usernames and email addresses from the sqlalchemy.org
domain, you can limit the conditions with PropComparator.and_()
applied to the arguments passed to selectinload()
.
>>> from sqlalchemy.orm import selectinload
>>> stmt = (
... select(User).
... options(
... selectinload(
... User.addresses.and_(
... ~Address.email_address.endswith("sqlalchemy.org")
... )
... )
... ).
... order_by(User.id).
... execution_options(populate_existing=True)
... )
>>> for row in session.execute(stmt):
... print(f"{row.User.name} ({', '.join(a.email_address for a in row.User.addresses)})")
spongebob ()
sandy (sandy@squirrelpower.org)
patrick ()
squidward ()
ehkrabs ()
pkrabs (pearl.krabs@gmail.com, pearl@aol.com)
The above code is equivalent to executing the following query.
SELECT user_account.id, user_account.name, user_account.fullname
FROM user_account ORDER BY user_account.id
[...] ()
SELECT address.user_id AS address_user_id, address.id AS address_id,
address.email_address AS address_email_address
FROM address
WHERE address.user_id IN (?, ?, ?, ?, ?, ?)
AND (address.email_address NOT LIKE '%' || ?)
[...] (1, 2, 3, 4, 5, 6, 'sqlalchemy.org')
It's crucial to note the addition of the .execution_options(populate_existing=True)
option above. When fetching rows, this option indicates that loader options must replace the existing collections' contents in already loaded objects.
Since we are iterating with a Session
object, the objects being loaded here are the same Python instances as those initially maintained at the start of this tutorial's ORM section.
# Raise Loading Method
The raiseload()
option is commonly used to completely block the occurrence of the "N+1 problem" by instead causing errors rather than slow loading.
There are two variants: blocking all "load" operations that include works that need SQL (lazy load) and those that only reference the current Session
(raiseload.sql_only
option).
class User(Base):
__tablename__ = 'user_account'
# ... Column mappings
addresses = relationship("Address", back_populates="user", lazy="raise_on_sql")
class Address(Base):
__tablename__ = 'address'
# ... Column mappings
user = relationship("User", back_populates="addresses", lazy="raise_on_sql")
Using such mappings blocks the application from 'lazy loading', requiring you to specify loader strategies for specific queries.
u1 = s.execute(select(User)).scalars().first()
u1.addresses
sqlalchemy.exc.InvalidRequestError: 'User.addresses' is not available due to lazy='raise_on_sql'
The exception indicates that the collection must be loaded first.
u1 = s.execute(select(User).options(selectinload(User.addresses))).scalars().first()
The lazy="raise_on_sql"
option is also wisely attempted for N:1 relationships.
Above, although the Address.user
property was not loaded into Address
, "raiseload" does not cause an error because the corresponding User
object is in the same Session
.
cf.
Preventing unwanted lazy loading with
raiseload
Preventing lazy loading inrelationship