Model Comparison and Diffing

Model Comparison isn't something you'll use in every project, but when you do need it, you'll love the built-in integration.

Model Comparison refers to examining differing data from a DAOModel. Said data could be multiple entries or different versions of a single entry.

There are many use cases for comparing data:

  • Highlight differences between model entries
  • Track changes to an object over time
  • Resolve conflicts when multiple users modify the same data
  • Merge several data sources into one

This multifunctional tool consists of several components. But everything stems from the ModelDiff class, so let's start there!

ModelDiff

Imagine we are shopping for a new vehicle and wish to compare options:

class EV(DAOModel, table=True):
    make: Identifier[str]
    model: Identifier[str]
    drivetrain: str
    acceleration: float
    battery_capacity: int
    range: int
    charging_speed: int
    screen_size: float
    price: int

model_3 = ev_dao.create_with(make='Tesla', model='Model 3',
    drivetrain='AWD',
    acceleration=4.2,
    battery_capacity=82,
    range=333,
    charging_speed=250,
    screen_size=15,
    price=42990
)

mach_e = ev_dao.create_with(make='Ford', model='Mustang Mach-E',
    drivetrain='AWD',
    acceleration=4.8,
    battery_capacity=91,
    range=310,
    charging_speed=150,
    screen_size=15.5,
    price=42995
)

bolt = ev_dao.create_with(make='Chevrolet', model='Bolt EUV',
    drivetrain='FWD',
    acceleration=7.0,
    battery_capacity=65,
    range=247,
    charging_speed=55,
    screen_size=10.2,
    price=28995
)

r1t = ev_dao.create_with(make='Rivian', model='R1T',
    drivetrain='AWD',
    acceleration=3.0,
    battery_capacity=135,
    range=328,
    charging_speed=200,
    screen_size=16,
    price=73900
)

Since we have a model, and some entries, a ModelDiff can be used to make a comparison between the entries. ModelDiff is a fairly simple class to use. First, we construct a new one using two DAOModel instances.

diff = ModelDiff(model_3, mach_e)

Tip

This feature is designed and tested for comparing like models but, in theory, could work for unlike models. For example, if the models are fairly similar, we could compare an EV, to a Hybrid, or ICE Vehicle.

Now that we have a diff object, we can get the values for each item:

if 'price' in diff:  # returns true if the entries had different prices
    diff.get_left('price')  # returns the price of the Model 3
    diff.get_right('price')  # returns the price of the Mach E
    diff.all_values('price')  # returns an array of each value i.e. [42990, 42995]

Info

Under the hood, a ModelDiff is simply a dict with additional functions. Which means it is naturally compatible with other services such as serialization.

Viewing values isn't anything revolutionary. The main purpose of a ModelDiff is to compare the values, but to do that, we must understand Preferences.

Preference

Preference Enum

A Preference is an Enum with the following potential values:

  • LEFT: Indicates preference for the left side value
  • RIGHT: Indicates preference for the right side value
  • NEITHER: Indicates that neither value is preferred
  • BOTH: Indicates that both values are equally preferable
  • NOT_APPLICABLE: Indicates that a preference does not apply in the given context

Calling get_preferred on a ModelDiff will return the Preference.

get_preferred

get_preferred(field: str) -> Preference

Determines which of the differing values is preferred.

Parameters:
  • field (str) –

    The name of the field

Returns:
  • Preference

    The Preference between the possible values

Raises:
  • KeyError

    if the field is invalid or otherwise not included in this diff

  • NotImplementedError

    if there is no applicable preference rule provided for the field

preferred_acceleration = diff.get_preffered('acceleration')
preferred_range = diff.get_preffered('range')

Testing the code above will result in a NotImplementedError because ModelDiff does not automatically know your preferences. To progress, preferences must be defined through Preference Rules.

Preference Rules

Rules are passed in through the ModelDiff constructor as keyword arguments. Each preference rule is specific to a field.

diff = ModelDiff(model_3, mach_e, range=max, price=min)

Python allows for passing an entire dict as **kwargs. This is especially convenient if we have many rules.

ev_rules = {
    # Assign a field to always be a specific Preference
    'make': Preference.NOT_APPLICABLE,
    'model': Preference.NOT_APPLICABLE,
    'battery_capacity': Preference.NOT_APPLICABLE,

    # Use an existing function to return the preferred value
    'range': max,
    'charging_speed': max,
    'screen_size': max,

    # Both min and max refer to the builtin functions
    'acceleration': min,
    'price': min,

    # Write your own rule function
    'drivetrain': custom_drivetrain_rule
}

diff = ModelDiff(model_3, mach_e, **ev_rules)

A PreferenceRule is one of two things: - A hardcoded Preference (LEFT, RIGHT, etc) - A function that calculates the Preference

Tip

A PreferenceRule could technically be a static value (such as True or 'specific value') but that feature is not fully developed. Let me know if you want that work completed!

Custom Rule Function

Let's take a look at custom_drivetrain_rule used above:

# The ModelDiff provides the differing values to the custom rule as an argument
def custom_drivetrain_rule(*values: str) -> Preference | str:
    bad = {'FWD', 'RWD', '2WD'}  # these values are bad, we would never prefer them
    good = {'AWD', '4WD'}  # these are the values we want to have (we consider them equally good)

    # Find all the values that are good and thus should be preferred
    good_values = [v for v in values if v in good]

    match len(good_values):
        case 0:  # no good values
            return Preference.NEITHER
        case 1:  # only one of the values is good
            return good_values[0]  # return that value
        case _:  # multiple good values
            return Preference.BOTH

Notice that the parameter is a variable number of arguments. Alternatively, named arguments or a single list argument are also supported.

def custom_drivetrain_rule(left: str, right: str):
def custom_drivetrain_rule(values: list[str]):

It all depends on your style, the ModelDiff code will adjust the function call as needed.

The other thing to note is that the function can return either a Preference or one of the string values. Look again at the code, if there are 0 "good values", then a Preference of NEITHER is returned. However, if there is 1 "good value", then that value is returned. Even though the rule returns a string, that string will be translated to the appropriate Preference automatically.

preference = diff.get_preferred('drivetrain')  # returns Preference.Left

Info

This value conversion is what allows you to use existing functions like max and min.

Let's take another look at those rules and see how we can trim them down.

Excluding Rules

Some fields are set to Preference.NOT_APPLICABLE. We won't ever be comparing them, so let's just exclude those rules.

ev_rules = {
-   'make': Preference.NOT_APPLICABLE,
-   'model': Preference.NOT_APPLICABLE,
-   'battery_capacity': Preference.NOT_APPLICABLE,

    'range': max,
    'charging_speed': max,
    'screen_size': max,

    'acceleration': min,
    'price': min,

    'drivetrain': custom_drivetrain_rule
}

Warning

The rules need not be all-inclusive, but do keep in mind that trying to compare without an applicable rule will result in a NotImplementedError

Default Rule

Notice that multiple fields use the max function as a rule?

ev_rules = {
    'range': max,
    'charging_speed': max,
    'screen_size': max,

    'acceleration': min,
    'price': min,

    'drivetrain': custom_drivetrain_rule
}

Bigger isn't always better, but looks like most of the time it is. If we set a default rule of max then we need not explicitly assign some of these rules.

ev_rules = {
-   'range': max,
-   'charging_speed': max,
-   'screen_size': max,
+   'default': max,

    'acceleration': min,
    'price': min,

    'drivetrain': custom_drivetrain_rule
}

Danger

Be careful with default as those NOT_APPLICABLE fields we removed (make, model, batter_capacity) will now use the default rule of max

Tip

Now that your rules are reduced, maybe you can fit it all within the constructor!

diff = ModelDiff(
    model_3, mach_e,
    default=max,
    acceleration=min,
    price=min,
    drivetrain=custom_drivetrain_rule
)

Including Primary Key Values

Often the PK values are NOT APPLICABLE. Such is the case for our EV example. For this reason, the PK is automatically excluded from the diff all together. If we wish to include the PK, we simply set include_pk=True. Let's look at that in action.

Consider our EV data:

Make Model Drivetrain 0–60 mph (s) Battery (kWh) Range (mi) Charging Speed (kW) Screen Size (in) Price ($)
Tesla Model 3 AWD 4.2 82 333 250 15.0 42,990
Ford Mustang Mach-E AWD 4.8 91 310 150 15.5 42,995
Chevrolet Bolt EUV FWD 7.0 65 247 55 10.2 28,995
Rivian R1T AWD 3.0 135 328 200 16.0 73,900

Say we have a preference of Vehicle Make: Tesla > Ford > Everything else. We can define a PreferenceRule for that.

ev_rules = {
    'make': lambda values: (
        'Tesla' if 'Tesla' in values else
        'Ford' if 'Ford' in values else
        Preference.NOT_APPLICABLE
    ),
    ...
}

diff = ModelDiff(model_3, mach_e, **ev_rules)

However, since make is a primary key, it is not included within the diff. This means that get_preferred will raise a KeyError which is remedied with the include_pk flag.

diff = ModelDiff(model_3, mach_e, include_pk=True, **ev_rules)
diff.get_preferred('make')  # returns Preference.LEFT since Tesla is preferred over Ford

Warning

A KeyError could still be thrown if the make values are the same, thus not included within the diff. You can avoid this error by first checking if 'make' in diff:.

Passing the pk_flag and rules each time we wish to make a comparison is too repetitive. Python allows us to avoid duplicate code by overriding the ModelDiff class.

Custom ModelDiff Class

ModelDiff is designed for ease of customization. Let's consider refactoring our code from above.

# Override the ModelDiff class and support type hints by specifying the EV DAOModel
class EVDiff(ModelDiff[EV]):
    def __init__(self, left: EV, right: EV):
        # Inject the flag and rule arguments that will stay constant
        super().__init__(left, right, include_pk=True,
            default=max,

            make=lambda values: (
                'Tesla' if 'Tesla' in values else
                'Ford' if 'Ford' in values else
                Preference.NOT_APPLICABLE
            ),
            model=Preference.NOT_APPLICABLE,

            acceleration=min,
            price=min,
            drivetrain=custom_drivetrain_rule
        )

# We no longer need to pass the rules and pk flag each time
diff1 = EVDiff(model_3, mach_e)
diff2 = EVDiff(mach_e, bolt)
diff3 = EVDiff(model_3, r1t)

See how this extra class definition can clean up the logic elsewhere in your code? More than that, a custom ModelDiff class can allow for functionality not possible otherwise, such as determining a preference based on another field value. This is essential in comparing our EVCharger models:

Model:

class EVCharger(DAOModel, table=True):
    brand: Identifier[str]
    model: Identifier[str]
    connector_type: str
    cord_length: int
    max_amps: int
    voltage: int
    bidirectional: bool
    wifi: bool
    portable: bool

Data:

Brand Model Connector Cord Length Max Amps Voltage Bidirectional WiFi Portable
Tesla Mobile Connector NACS 20 ft 32 240 No No Yes
Tesla Wall Connector NACS 24 ft 48 240 No Yes No
Tesla Universal Wall Connector J1772 24 ft 48 240 Yes Yes No
Ford Connected Charge Station J1772 20 ft 48 240 No Yes No
Ford Charge Station Pro J1772 25 ft 80 240 Yes Yes No
GM Dual Level Charger J1772 22 ft 32 240 No No Yes
GM PowerUp 2 J1772 25 ft 48 240 No Yes No
GM Energy PowerShift J1772 25 ft 80 240 Yes Yes No
Rivian Portable Charger (NACS) NACS 18 ft 32 240 No No Yes
Rivian Portable Charger (J1772) J1772 18 ft 32 240 No No Yes
Rivian Wall Charger (NACS) NACS 24 ft 48 240 No No No
Rivian Wall Charger (J1772) J1772 24 ft 48 240 No No No

Rules:

def prefer_true(left, right):
    if left:
        return Preference.LEFT
    elif right:
        return Preference.RIGHT
    return Preference.NEITHER

rules = {
    'default': max,
    'connector_type': Preference.NOT_APPLICABLE,
    'bidirectional': prefer_true,
    'wifi': prefer_true,
    'portable': prefer_true,
}

class EVChargerDiff(ModelDiff[EVCharger]):
    def __init__(self, left: EVCharger, right: EVCharger):
        super().__init__(left, right, **rules)

When it comes to cord_length, longer is better. But if the charger is portable then we do not care so much about its length. In that case, the preference for cord_length should really be NOT_APPLICABLE. To achieve this, we will override the get_preferred function.

class EVChargerDiff(ModelDiff[EVCharger]):
    def __init__(self, left: EVCharger, right: EVCharger):
        super().__init__(left, right, **rules)

    def get_preferred(self, field: str):
        if field == 'cord_length':
            if self.left.portable or self.right.portable:
                return Preference.NOT_APPLICABLE

        # If not portable, or the field in question is not cord_length, then call the original code.
        return super().get_preferred(field)

See how overriding gives more flexibility than passing in rules? We now have visibility of the full models, not just their value for a single field. That is just one example of how powerful custom ModelDiff classes can be.

DAOModel actually has a couple of these custom ModelDiff classes packaged with it. For example, ChangeSet.

ChangeSet

A ChangeSet is used to view a set of changes on a model entry (similar to a Pull Request for a code change). When working with a ChangeSet, the goal is to apply the target changes to a baseline object. During this process, we want some of the baseline properties to be overwritten, but others should be preserved. Preference Rules dictate which values will be a part of the final result.

Since a ChangeSet is an extension of ModelDiff it has the functionality listed above plus a couple modifications:

  • get_baseline(field) and get_target(field) mimic get_left and get_right respectively, providing more meaningful naming
  • modified_in_baseline and modified_in_target properties list the fields that are not their default value
  • get_preferred(field) now also contains logic that prefers populated values, especially modified values
Baseline Target Preference
No value Default Preference.RIGHT
No value Modified Preference.RIGHT
Default No value Preference.LEFT
Default Modified Preference.RIGHT
Modified No value Preference.LEFT
Modified Default Preference.LEFT
Modified Modified Preference.Both

Note

The table excludes scenarios where both values are None, or both values are the default property value. In each case, those values would match and therefore not be included as a diff

Usage

Below, we have the model for a user-submitted Point-of-Interest app:

class POI(DAOModel, table=True):
    name: Identifier[str]
    category: Optional[str]
    description: Optional[str]
    website: Optional[str]
    latitude: float
    longitude: float
    ada_compliant: Optional[bool]
    latest_condition: Optional[str]

poi_dao.create_with(
    name='Niagara Falls',
    description='A world-famous trio of waterfalls straddling the Canada-U.S. border.',
    latitude=43.0815,
    longitude=-79.0642
)

Users can add new POIs or update existing ones. Since a POI already exists for Niagara Falls, a ChangeSet is used to allow someone to provide new information.

# Retrieve the existing Niagara Falls entry
saved_poi = poi_dao.get('Niagara Falls')

# Create an entry from the user's input
user_entry = poi_dao.create_with(
    name='Niagara Falls',
    category='Nature',
    website='https://www.nps.gov/nifa',
    ada_compliant=True,
    latest_condition='Fireworks tonight!',
    insert=False
)

# The original baseline object is the first arg, the new target data is the last arg
change_set = ChangeSet(saved_poi, user_entry)

ChangeSets typically have a fairly standard workflow:

  1. Create a ChangeSet
  2. Resolve all preferences
  3. Apply the changes

i.e. ChangeSet(baseline, target).resolve_preferences().apply()

Resolving Preferences

The resolve_preferences() function uses get_preferred() to decide which values are worth keeping.

resolve_preferences

resolve_preferences() -> ChangeSet

Removes unwanted changes, preserving the meaningful values, regardless of them being from baseline or target

Returns:
  • ChangeSet

    This ChangeSet to allow for chaining function calls

Raises:
  • Conflict

    if both baseline and target have meaningful values (unless resolve_conflict is overridden)

change_set.resolve_preferences()

Essentially, what this function does is evaluate the diff and eliminate each value that isn't preferred. Again, the determined preference between values can be configured by providing preference rules.

The new Niagara Falls input contains a value for category but the old input does not. Therefore, the target (aka right) value is preferred. Here is the full result from changes being applied:

Name Category Description Website Latitude Longitude ADA Condition
baseline Niagara Falls A world-famous trio of waterfalls straddling the Canada-U.S. border. 43.0815 -79.0642
target Niagara Falls Nature https://www.nps.gov/nifa True Fireworks tonight!
result Niagara Falls Nature A world-famous trio of waterfalls straddling the Canada-U.S. border. https://www.nps.gov/nifa 43.0815 -79.0642 True Fireworks tonight!

This first change contained no conflicts. Using the result as our new baseline, let's accept another user submission. Each property has a value, so we can specify preference rules to avoid troublesome conflicts.

class POIChangeSet(ChangeSet[POI]):
    def __init__(self, baseline: POI, target: POI):
        super().__init__(baseline, target,
            default=Preference.LEFT,  # Unless specified below, prefer the existing data
            ada_compliant=lambda b, t: False,  # Don't prefer a value of True unless confirmed by moderator
            latest_condition=Preference.RIGHT  # Always prefer the new condition value
        )

# Retrieve the existing Niagara Falls entry
saved_poi = poi_dao.get('Niagara Falls')

# Create an entry from the user's input
user_entry = poi_dao.create_with(
    name='Niagara Falls',
    category='Construction, avoid!',
    ada_compliant=False,
    latest_condition='Construction limits viewing platforms.',
    insert=False
)

ChangeSet(saved_poi, user_entry).resolve_preferences().apply()

With the defined preference rules, here are the results:

Name Category Description Website Latitude Longitude ADA Condition
baseline Niagara Falls Nature A world-famous trio of waterfalls straddling the Canada-U.S. border. https://www.nps.gov/nifa 43.0815 -79.0642 True Fireworks tonight!
target Niagara Falls 'Construction, avoid!' False Construction limits viewing platforms.
result Niagara Falls Nature A world-famous trio of waterfalls straddling the Canada-U.S. border. https://www.nps.gov/nifa 43.0815 -79.0642 False Construction limits viewing platforms.

Conflict Resolution

This next submission contains a value for description. We want that new description, but we don't want to overwrite the old one. Conveniently, a ChangeSet will automatically give a Preference of BOTH since both values are not the default value of None. This results in a conflict since we can only assign a single value to description. Without any additional input, a Conflict error will be raised. That error can be avoided by specifying a conflict resolution rule.

Tip

The Conflict error can also be avoided by overriding the default_conflict rule, setting it to Preference.BOTH. This will result in the field being labeled as Unresolved.

Conflict resolution rules are defined just like preference rules but must be suffixed with _conflict as seen below.

class POIChangeSet(ChangeSet[POI]):
    def __init__(self, baseline: POI, target: POI):
        super().__init__(baseline, target,
            default=Preference.LEFT,
            description_conflict='\n'.join,  # Join the descriptions into a single string
            ada_compliant=lambda b, t: False,
            latest_condition=Preference.RIGHT
        )

# Retrieve the existing Niagara Falls entry
saved_poi = poi_dao.get('Niagara Falls')

# Create an entry from the user's input
user_entry = poi_dao.create_with(
    name='Niagara Falls',
    description='Known for their immense power and breathtaking views!',
    insert=False
)

ChangeSet(saved_poi, user_entry).resolve_preferences().apply()
Name Category Description Website Latitude Longitude ADA Condition
baseline Niagara Falls Nature A world-famous trio of waterfalls straddling the Canada-U.S. border. https://www.nps.gov/nifa 43.0815 -79.0642 False Construction limits viewing platforms.
target Niagara Falls Known for their immense power and breathtaking views!
result Niagara Falls Nature A world-famous trio of waterfalls straddling the Canada-U.S. border.
Known for their immense power and breathtaking views!
https://www.nps.gov/nifa 43.0815 -79.0642 False Construction limits viewing platforms.

Most of the time, you can get away with defining a preference rule. But for more complex situations, a conflict resolution rule is more powerful. If rule definitions don't meet your needs, you can always override the resolve_conflict function for complete control.

Now that resolving preferences is understood, we can discuss the special cases when applying changes.

Applying Changes

Changes are applied by calling the apply() method on the ChangeSet. This is typically only done after calling resolve_preferences(). The apply() function calls get_resolution() for each field and assigns the value to the baseline object.

Info

The baseline object is the one that is modified when applying changes. This means that submitted said changes to the DB is as simple as calling dao.commit()

Most of the time, the determined resolution is simply the target value. However, it may also be labeled as Resolved or Unresolved.

Info

If the preference is determined to be LEFT, The resolve_preference function will delete the entry from the dict and no changes will be applied. To understand why this is, think about resolving code changes, if one of the changes is unwanted, that change is removed leaving just the original code as it was.

Here is a sample of how a ChangeSet may look under the hood, after resolving preferences but before applying changes:

changeset = {
    'category': (
        'Nature',
        Unresolved('State Park')
    ),
    'description': (
        'A world-famous trio of waterfalls straddling the Canada-U.S. border.',
        Resolved(
            'Known for their immense power and breathtaking views!',
            'A world-famous trio of waterfalls straddling the Canada-U.S. border.\nKnown for their immense power and breathtaking views!'
        )
    ),
    'website': (
        None,
        Unresolved('invalid-website.com')
    ),
    'latest_condition': (
        'Construction limits viewing platforms.',
        'Construction has completed'
    )
}

Resolved

In the previous sample, the description field is marked as Resolved. This means that the resolution is a different value from that of the baseline or the target.

When applying changes, the resolved value will be used as the new value.

Unresolved

category and website give us examples of unresolved fields. You will see Unresolved if the Preference was NEITHER or NOT_APPLICABLE. A field could also be unresolved if the Preference is BOTH even after attempting to resolve the conflict. In this case, we don't want an empty value or an invalid website, leaving that change Unresolved. We can still apply the changes if something is unresolved, but the object's value will then be set to an Unresolved object. You will have to account for this and handle it before being able to commit the changes to the database.

Tip

Resolving an Unresolved change will likely need some sort of human feedback since a standard rule could not determine an adequate result. This feedback may be in the form of submitting the change to a moderator for approval.

Again, once a ChangeSet is applied, we can now submit the changes to the database through a commit.

Tip

The SingleModelService is an extendable BaseService class for your DAOModel. It includes a merge function that uses a ChangeSet to merge data into a model entry. If you find this feature useful, that service may be a good starting point.

MergeSet

A ChangeSet is great, but we process many user submissions each day. A MergeSet will combine several model entries at once. This way we have visibility into all the potential values to determine the final result.

A MergeSet is configured in the same manner as a ChangeSet. The only difference is that the target is a list of model entries.

Here is an example of merging multiple user-submitted POIs into a baseline:

# Retrieve the existing Washington Monument entry
saved_poi = poi_dao.get('Washington Monument')

# Create entries from multiple user inputs
user_entry1 = poi_dao.create_with(
    name='Washington Monument',
    category='Obelisk',
    description='A 555-foot marble obelisk built to commemorate George Washington.',
    website='https://www.nps.gov/wamo',
    ada_compliant=True,
    latest_condition='Clear skies, excellent visibility from observation deck.',
    insert=False
)

user_entry2 = poi_dao.create_with(
    name='Washington Monument',
    category='Structure',
    description='One of the most recognizable structures in Washington DC.',
    latitude=38.8895,
    longitude=-77.0352,
    latest_condition='Limited tickets available for today\'s tours.',
    insert=False
)

user_entry3 = poi_dao.create_with(
    name='Washington Monument',
    description='Completed in 1884, it was once the tallest building in the world.',
    website='https://washington.org/monument',
    ada_compliant=False,
    latest_condition='Special evening lighting display tonight.',
    insert=False
)

user_entry4 = poi_dao.create_with(
    name='Washington Monument',
    category='Monument',
    description='Located on the National Mall between the Capitol and Lincoln Memorial.',
    ada_compliant=True,
    latest_condition='Park rangers offering special history talks hourly.',
    insert=False
)

user_entry5 = poi_dao.create_with(
    name='Washington Monument',
    category='Monument',
    website='https://www.doi.gov/washington-monument',
    latest_condition='Surrounding grounds open for picnics and gatherings.',
    insert=False
)

# Define custom preference rules for the MergeSet
class POIMergeSet(MergeSet[POI]):
    def __init__(self, baseline: POI, *targets: POI):
        super().__init__(baseline, *targets,
            # For category, prefer the deepest category from our taxonomy
            # i.e. 'Obelisk' is most preferred within the following taxonomy
            #   Structure
            #     └── Built Structure
            #           └── Commemorative Structure
            #                 └── Monument
            #                       └── Obelisk
            category=most_specific_from_taxonomy,

            # For description, we want to combine all unique descriptions
            # This requires a conflict resolution rule (_conflict)
            description_conflict=lambda values: (
                    '\n'.join(dedupe(exclude_falsy(values)))  # dedupe & exclude_falsy are from daomodel.util
            ),

            # For website, we prefer official .gov domains
            website=lambda values: (
                    first_str_with('.gov', values) or  # these functions are also available in daomodel.util
                    first(values)  # selects the first non-empty str
            ),

            # For coordinates, we want precise values
            latitude=lambda values: next((v for v in values if v is not None), None),
            longitude=lambda values: next((v for v in values if v is not None), None),

            # For ADA, trust the majority by taking the most common value
            ada_compliant=mode,  # the mode function may be imported from daomodel.util

            # For latest condition, we want the most recent information
            latest_condition=last  # Take the last submitted value
        )

# Create the MergeSet with baseline and all user entries
merge_set = POIMergeSet(saved_poi, user_entry1, user_entry2, user_entry3, user_entry4, user_entry5)

# Resolve preferences and apply changes
merge_set.resolve_preferences().apply()

Tip

The code example above contains several utility functions that can be imported and used in your own projects!

Let's examine how this MergeSet handles the different values from multiple submissions:

Name Category Description Website Latitude Longitude ADA Condition
baseline Washington Monument A 555-foot marble obelisk honoring the first U.S. president. 38.8895 -77.0352
target 0 Washington Monument Obelisk A 555-foot marble obelisk built to commemorate George Washington. https://www.nps.gov/wamo True Clear skies, excellent visibility from observation deck.
target 1 Washington Monument Structure One of the most recognizable structures in Washington DC. 38.8895 -77.0352 Limited tickets available for today's tours.
target 2 Washington Monument Completed in 1884, it was once the tallest building in the world. https://washington.org/monument False Special evening lighting display tonight.
target 3 Washington Monument Monument Located on the National Mall between the Capitol and Lincoln Memorial. True Park rangers offering special history talks hourly.
target 4 Washington Monument Monument https://www.doi.gov/washington-monument Surrounding grounds open for picnics and gatherings.
result Washington Monument Obelisk A 555-foot marble obelisk built to commemorate George Washington.
One of the most recognizable structures in Washington DC.
Completed in 1884, it was once the tallest building in the world.
Located on the National Mall between the Capitol and Lincoln Memorial.
https://www.nps.gov/wamo 38.8895 -77.0352 True Surrounding grounds open for picnics and gatherings.

This example shows how a MergeSet could handle complex merging scenarios with multiple user submissions, applying different preference rules and conflict resolution strategies based on the specific requirements of each field.

If you have your own utility functions that you use often and would like added to the library, submit a ticket to let me know, I would be happy to add it!