Setup:
# Pydantic Models
class TMDB_Category(BaseModel):
    name: str = Field(alias="strCategory")
    description: str = Field(alias="strCategoryDescription")

class TMDB_GetCategoriesResponse(BaseModel):
    categories: list[TMDB_Category]

@router.get(path="category", response_model=TMDB_GetCategoriesResponse)
async def get_all_categories():
    async with httpx.AsyncClient() as client:
        response = await client.get(Endpoint.GET_CATEGORIES)
        return TMDB_GetCategoriesResponse.parse_obj(response.json())
Problem:
Alias is being used when creating a response, and I want to avoid it. I only need this alias to correctly map the incoming data but when returning a response, I want to use actual field names.
Actual response:
{
  "categories": [
    {
      "strCategory": "Beef",
      "strCategoryDescription": "Beef is ..."
    },
    {
      "strCategory": "Chicken",
      "strCategoryDescription": "Chicken is ..."
    }
  ]
}
Expected response:
{
  "categories": [
    {
      "name": "Beef",
      "description": "Beef is ..."
    },
    {
      "name": "Chicken",
      "description": "Chicken is ..."
    }
  ]
}
Switch aliases and field names and use the allow_population_by_field_name model config option:
class TMDB_Category(BaseModel):
    strCategory: str = Field(alias="name")
    strCategoryDescription: str = Field(alias="description")

    class Config:
        allow_population_by_field_name = True
Let the aliases configure the names of the fields that you want to return, but enable allow_population_by_field_name to be able to parse data that uses different names for the fields.
An alternate option (which likely won't be as popular) is to use a de-serialization library other than pydantic. For example, the Dataclass Wizard library supports this particular use case. If you need the same round-trip behavior that Field(alias=...) provides, you can pass all=True to the json_field function. Note that with such a library you lose the ability to perform complete type validation, which is arguably one of pydantic's greatest strengths; however, it does perform type conversion in a similar fashion to pydantic. There are also a few reasons why I feel that validation is not as important, which I list below.
Reasons why I would argue that data validation is a nice to have feature in general:
So to demonstrate that, here's a simple example for the above use case using the dataclass-wizard library (which relies on dataclasses instead of pydantic models):
from dataclasses import dataclass

from dataclass_wizard import JSONWizard, json_field

@dataclass
class TMDB_Category:
    name: str = json_field('strCategory')
    description: str = json_field('strCategoryDescription')

@dataclass
class TMDB_GetCategoriesResponse(JSONWizard):
    categories: list[TMDB_Category]
And the code to run that would look like this:
input_dict = {
    "categories": [
        {
            "strCategory": "Beef",
            "strCategoryDescription": "Beef is ..."
        },
        {
            "strCategory": "Chicken",
            "strCategoryDescription": "Chicken is ..."
        }
    ]
}
c = TMDB_GetCategoriesResponse.from_dict(input_dict)
print(repr(c))
# TMDB_GetCategoriesResponse(categories=[TMDB_Category(name='Beef', description='Beef is ...'), TMDB_Category(name='Chicken', description='Chicken is ...')])
print(c.to_dict())
# {'categories': [{'name': 'Beef', 'description': 'Beef is ...'}, {'name': 'Chicken', 'description': 'Chicken is ...'}]}
If anyone is curious, I've set up a quick benchmark test to compare deserialization and serialization times with pydantic vs. just dataclasses:
from dataclasses import dataclass
from timeit import timeit

from pydantic import BaseModel, Field
from dataclass_wizard import JSONWizard, json_field

# Pydantic Models
class Pydantic_TMDB_Category(BaseModel):
    name: str = Field(alias="strCategory")
    description: str = Field(alias="strCategoryDescription")

class Pydantic_TMDB_GetCategoriesResponse(BaseModel):
    categories: list[Pydantic_TMDB_Category]

# Dataclasses
@dataclass
class TMDB_Category:
    name: str = json_field('strCategory', all=True)
    description: str = json_field('strCategoryDescription', all=True)

@dataclass
class TMDB_GetCategoriesResponse(JSONWizard):
    categories: list[TMDB_Category]

# Input dict which contains sufficient data for testing (100 categories)
input_dict = {
    "categories": [
        {
            "strCategory": f"Beef {i * 2}",
            "strCategoryDescription": "Beef is ..." * i
        }
        for i in range(100)
    ]
}

n = 10_000

print('=== LOAD (deserialize)')
print('dataclass-wizard: ',
      timeit('c = TMDB_GetCategoriesResponse.from_dict(input_dict)',
             globals=globals(), number=n))
print('pydantic: ',
      timeit('c = Pydantic_TMDB_GetCategoriesResponse.parse_obj(input_dict)',
             globals=globals(), number=n))

c = TMDB_GetCategoriesResponse.from_dict(input_dict)
pydantic_c = Pydantic_TMDB_GetCategoriesResponse.parse_obj(input_dict)

print('=== DUMP (serialize)')
print('dataclass-wizard: ',
      timeit('c.to_dict()',
             globals=globals(), number=n))
print('pydantic: ',
      timeit('pydantic_c.dict()',
             globals=globals(), number=n))
And the benchmark results (tested on macOS Big Sur, Python 3.9.0):
=== LOAD (deserialize)
dataclass-wizard: 1.742989194
pydantic: 5.31538175
=== DUMP (serialize)
dataclass-wizard: 2.300118940
pydantic: 5.582638598
In their docs, pydantic claims to be the fastest library in general, but it's rather straightforward to prove otherwise. As you can see, for the above dataset pydantic is roughly 3x slower at deserialization and about 2.4x slower at serialization. It's worth noting that pydantic is already quite fast, though.
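The slowdown factors can be recomputed directly from the timings reported above. This is just arithmetic on those numbers, not a new measurement, and the results will differ on other machines and library versions:

```python
# Timings in seconds for 10,000 runs, copied from the benchmark output above.
load = {"dataclass-wizard": 1.742989194, "pydantic": 5.31538175}
dump = {"dataclass-wizard": 2.300118940, "pydantic": 5.582638598}

# How many times longer pydantic took than dataclass-wizard.
load_ratio = load["pydantic"] / load["dataclass-wizard"]
dump_ratio = dump["pydantic"] / dump["dataclass-wizard"]

print(f"load: pydantic took {load_ratio:.1f}x as long")
print(f"dump: pydantic took {dump_ratio:.1f}x as long")
```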
Maybe you could use this approach:
from pydantic import BaseModel, Field

class TMDB_Category(BaseModel):
    name: str = Field(alias="strCategory")
    description: str = Field(alias="strCategoryDescription")

data = {
    "strCategory": "Beef",
    "strCategoryDescription": "Beef is ..."
}
obj = TMDB_Category.parse_obj(data)
# {'name': 'Beef', 'description': 'Beef is ...'}
print(obj.dict())
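To make the distinction explicit, here is a minimal sketch (the same model, using pydantic v1-style method names) showing that dict() keys its output by field name unless by_alias=True is passed. As far as I know, FastAPI serializes a response_model by alias by default, which would explain the output in the question. Its route decorators accept response_model_by_alias=False to turn that off, but check that against the FastAPI version you are running.

```python
from pydantic import BaseModel, Field


class TMDB_Category(BaseModel):
    # The alias is the key expected in the incoming data.
    name: str = Field(alias="strCategory")
    description: str = Field(alias="strCategoryDescription")


obj = TMDB_Category.parse_obj(
    {"strCategory": "Beef", "strCategoryDescription": "Beef is ..."}
)

by_name = obj.dict()                # keyed by field name (the default)
by_alias = obj.dict(by_alias=True)  # keyed by alias

print(by_name)
print(by_alias)
```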
I was trying to do something similar (migrate a field pattern to a list of patterns while gracefully handling old versions of the data). The best solution I could find was to do the field mapping in the __init__ method. In OP's terms, this would look like:
class TMDB_Category(BaseModel):
    name: str
    description: str

    def __init__(self, **data):
        if "strCategory" in data:
            data["name"] = data.pop("strCategory")
        if "strCategoryDescription" in data:
            data["description"] = data.pop("strCategoryDescription")
        super().__init__(**data)
Then we have:
>>> TMDB_Category(strCategory="name", strCategoryDescription="description").json()
'{"name": "name", "description": "description"}'
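If more fields need this treatment, the remapping can be pulled out into a small helper so the legacy-to-current key table lives in one place. This is only a sketch on plain dicts; rename_keys and LEGACY_KEYS are hypothetical names of my own, not part of pydantic:

```python
# Hypothetical table: legacy key -> current field name.
LEGACY_KEYS = {
    "strCategory": "name",
    "strCategoryDescription": "description",
}


def rename_keys(data: dict, mapping: dict) -> dict:
    """Return a copy of data with mapped keys renamed; other keys pass through."""
    return {mapping.get(key, key): value for key, value in data.items()}


old_style = {"strCategory": "Beef", "strCategoryDescription": "Beef is ..."}
new_style = {"name": "Beef", "description": "Beef is ..."}

print(rename_keys(old_style, LEGACY_KEYS))
print(rename_keys(new_style, LEGACY_KEYS))
```

Inside __init__, the two if blocks would then collapse to super().__init__(**rename_keys(data, LEGACY_KEYS)).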
If you need to use field aliases to do this but still use the name/description fields in your code, one option is to alter Hernán Alarcón's solution to use properties:
class TMDB_Category(BaseModel):
    strCategory: str = Field(alias="name")
    strCategoryDescription: str = Field(alias="description")

    class Config:
        allow_population_by_field_name = True

    @property
    def name(self):
        return self.strCategory

    @name.setter
    def name(self, value):
        self.strCategory = value

    @property
    def description(self):
        return self.strCategoryDescription

    @description.setter
    def description(self, value):
        self.strCategoryDescription = value
That's still a bit awkward, since the repr uses the "alias" names:
>>> TMDB_Category(name="name", description="description")
TMDB_Category(strCategory='name', strCategoryDescription='description')