Critical Weblate Bug: Pending Translations Lost During Repository Sync

by ADMIN

Hey guys, let's dive into a critical issue that can seriously impact your translation workflow in Weblate. We're talking about a bug that can lead to the silent loss of pending translations during repository synchronization. Imagine translators putting in the effort, making changes, and then poof! Their work vanishes without a trace. Sounds scary, right? Let's break down what's happening, why it's happening, and what can be done about it.

The Silent Translation Killer: Flawed needs_commit_upstream() Logic

The heart of the problem lies in the needs_commit_upstream() method within Weblate. This method is supposed to determine whether changes need to be committed to the upstream repository before syncing. However, it has a crucial blind spot: it only checks for file changes in the Git repository itself. It completely ignores the pending translation changes sitting in Weblate's database, waiting to be committed.

  • The core issue here is that Weblate overwrites uncommitted local translation changes when syncing from the Git repository. This happens due to the flawed needs_commit_upstream() logic. Essentially, if you've made changes in Weblate that haven't been committed to Git yet, and a repository sync is triggered, your work can be wiped out. This is a major problem, especially in collaborative translation environments.

Steps to Translation Oblivion

Let's walk through a scenario to illustrate how this data loss occurs:

  1. A user, let's call her Alice, diligently translates a phrase in Weblate, perhaps changing "RuffLace" to something more elegant like "ExcRuffle." Her change is saved in Weblate's database, marked as pending=True because it hasn't been committed to Git yet.
  2. Now, a repository sync is triggered. This could happen automatically via a scheduled task or manually by an administrator.
  3. The needs_commit_upstream() method kicks in, but it only peeks at the Git files. Since no Git files have been directly modified, it returns False, blissfully unaware of Alice's pending translation.
  4. Weblate, trusting this faulty assessment, skips the commit step and proceeds to sync directly from the Git repository.
  5. The translation files are pulled from Git, effectively overwriting the changes in Weblate's database.
  6. Poof! Alice's translation is gone, lost in the digital void. Heartbreaking, right?
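
The sequence above can be reproduced with a small toy model. This is only an illustrative sketch, not Weblate code: ToyComponent, its git_files, database, and pending attributes, and the sync() method are hypothetical stand-ins for the Git checkout, Weblate's database, and the repository update.

```python
class ToyComponent:
    """Hypothetical stand-in for a Weblate component (not real Weblate code)."""

    def __init__(self, git_files):
        self.git_files = dict(git_files)  # strings as stored in Git
        self.database = dict(git_files)   # strings as shown to translators
        self.pending = set()              # keys edited but not yet committed
        self.dirty_files = set()          # files modified in the checkout

    def translate(self, key, text):
        # Step 1: the edit lands in the database only and is marked pending.
        self.database[key] = text
        self.pending.add(key)

    def needs_commit_upstream(self):
        # Flawed check: looks at changed files, never at pending units.
        return bool(self.dirty_files)

    def commit_pending(self):
        # Write pending database edits into the (toy) Git files.
        for key in self.pending:
            self.git_files[key] = self.database[key]
        self.pending.clear()

    def sync(self):
        # Steps 3-5: commit only if the check says so, then reload from Git.
        if self.needs_commit_upstream():
            self.commit_pending()
        self.database = dict(self.git_files)
        self.pending.clear()


component = ToyComponent({"greeting": "RuffLace"})
component.translate("greeting", "ExcRuffle")
component.sync()
print(component.database["greeting"])  # RuffLace: the pending edit is gone
```

Because needs_commit_upstream() in this toy never looks at pending, commit_pending() is skipped and the reload restores the old string, which mirrors step 6.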

The Expected Savior: Weblate's Ideal Behavior

Ideally, Weblate should act as a guardian of our translations, protecting uncommitted changes during repository sync. Here's what we'd expect:

  1. A thorough check for pending translations in the database before initiating a sync.
  2. Automatic commitment of these pending changes before any repository updates occur.
  3. A firm rule never to overwrite translator work that hasn't been safely pushed to Git.
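
One way to picture that ordering is a small guard around the sync. This is a sketch under assumptions, not how Weblate is implemented: guarded_sync and its three callables (has_pending, commit_pending, pull_from_git) are hypothetical names chosen for illustration.

```python
def guarded_sync(has_pending, commit_pending, pull_from_git):
    """Commit pending work first; refuse to pull if that commit fails."""
    if has_pending():
        committed = commit_pending()
        if not committed:
            # Rule 3: never overwrite work that is not safely in Git.
            raise RuntimeError("pending translations not committed; sync aborted")
    pull_from_git()


calls = []
guarded_sync(
    has_pending=lambda: True,
    commit_pending=lambda: calls.append("commit") or True,
    pull_from_git=lambda: calls.append("pull"),
)
print(calls)  # ['commit', 'pull']
```

The point of the design is the ordering: the pull can only run after pending work has been committed, and a failed commit stops the sync instead of letting it overwrite anything.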

The Harsh Reality: Actual Behavior

Unfortunately, the current reality is quite different. Pending translation changes are silently wiped out during repository sync, leaving translators frustrated and demoralized. There's no warning, no recovery mechanism, just a quiet disappearance of valuable work.

Digging Deeper: Root Cause Analysis

The culprit is, as mentioned earlier, the needs_commit_upstream() method in weblate/trans/models/component.py. Let's take a closer look at the code snippet:

def needs_commit_upstream(self) -> bool:
    """Detect whether commit is needed for upstream changes."""
    changed = self.repository.get_changed_files()  # Only checks Git repository
    if self.uses_changed_files(changed):
        return True
    # ... but never checks for pending translation changes in database!

This method is laser-focused on Git repository file changes. It completely overlooks the pending translation units residing in Weblate's database, the ones that haven't yet made their way into files. This oversight is the root of the problem.

Real-World Evidence: Logs Speak Volumes

In a production environment, the impact of this bug is clear. Imagine this scenario:

  • The last successful commit to the repository was 21 hours ago at 18:29:24.
  • Our translator, let's call him Zhang, diligently modifies a term 20 hours ago.
  • A repository sync kicks off 18 hours ago at 20:30, unknowingly overwriting Zhang's pending change.
  • The next commit doesn't happen until 4 hours ago at 11:24:08, a 17-hour gap. That gap is consistent with the sync having discarded Zhang's pending change, leaving nothing to commit in the meantime.
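
The gap in that timeline can be checked with a few lines of arithmetic. Only the clock times come from the timeline above; the calendar dates are placeholders, and the edit time is an assumption derived from "20 hours ago" versus "21 hours ago".

```python
from datetime import datetime, timedelta

# Calendar dates are placeholders; only the clock times come from the timeline.
last_commit = datetime(2025, 1, 1, 18, 29, 24)
next_commit = datetime(2025, 1, 2, 11, 24, 8)

gap = next_commit - last_commit
print(gap)  # 16:54:44, i.e. roughly 17 hours

# Assumption: the edit happened about one hour after the last commit.
edit_time = last_commit + timedelta(hours=1)
sync_time = datetime(2025, 1, 1, 20, 30, 0)
commit_pending_age = timedelta(hours=24)  # the default cited in this article

# The sync ran while the edit was still younger than the auto-commit threshold.
print(sync_time - edit_time < commit_pending_age)  # True
```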

A Potential Lifeline: Suggested Fix

To remedy this critical flaw, the needs_commit_upstream() method needs a little tweaking. We need to make it database-aware, so it considers pending translations. Here's a suggested modification:

def needs_commit_upstream(self) -> bool:
    """Detect whether commit is needed for upstream changes."""
    changed = self.repository.get_changed_files()
    if self.uses_changed_files(changed):
        return True
    
    # Check for pending translation changes in database
    if self.pending_units.exists():
        return True
        
    for component in self.linked_childs:
        if component.uses_changed_files(changed) or component.pending_units.exists():
            return True
    return False

This enhanced version adds a crucial check for pending_units.exists(). Now, the method will return True if there are pending translation changes in the database, either on the component itself or on any of its linked child components, ensuring that these changes are committed before a sync occurs.
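
To see the intended behavior without a running Weblate instance, the logic of the suggested fix can be exercised on a toy model. This is a sketch only: ToyComponent, pending_count, watched_files, and has_pending_units() are hypothetical stand-ins, and the changed-file list is passed in as an argument instead of being read from a repository.

```python
class ToyComponent:
    """Hypothetical stand-in mirroring the names used in the suggested fix."""

    def __init__(self, pending_count=0, watched_files=(), children=()):
        self.pending_count = pending_count
        self.watched_files = set(watched_files)
        self.linked_childs = list(children)

    def uses_changed_files(self, changed):
        # True if any changed file belongs to this component.
        return bool(self.watched_files & set(changed))

    def has_pending_units(self):
        # Stands in for pending_units.exists() in the suggested fix.
        return self.pending_count > 0

    def needs_commit_upstream(self, changed):
        if self.uses_changed_files(changed) or self.has_pending_units():
            return True
        return any(
            child.uses_changed_files(changed) or child.has_pending_units()
            for child in self.linked_childs
        )


clean = ToyComponent(watched_files={"po/de.po"})
child_with_pending = ToyComponent(pending_count=1)
parent = ToyComponent(watched_files={"po/de.po"}, children=[child_with_pending])

print(clean.needs_commit_upstream(changed=[]))            # False
print(parent.needs_commit_upstream(changed=[]))           # True
print(clean.needs_commit_upstream(changed=["po/de.po"]))  # True
```

The second call is the case the original check misses: no files changed, but a linked child still holds a pending unit.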

The Ripple Effect: Impact of Data Loss

This bug isn't just a minor annoyance; it's a critical data loss bug with far-reaching consequences:

  • Lost translator work and productivity: Translators spend valuable time and effort making changes, only to see them vanish. This leads to frustration and wasted resources.
  • Decreased trust in the platform: When users can't rely on the system to preserve their work, trust erodes. This can lead to decreased engagement and adoption.
  • Potential compliance issues for professional translation workflows: In professional settings, data loss can have serious legal and financial implications.
  • Silent failures without user notification: The worst part is that this data loss happens silently. Users aren't warned, and there's no easy way to recover the lost work.

Environment: Context Matters

  • Weblate version: The issue was reported against Weblate 5.10.4 and is likely present in other versions as well, which would point to a problem in the sync logic rather than a one-off regression.
  • Deployment: The bug affects all Weblate installations using automatic repository sync.

Additional Pressure Factors: Contributing Circumstances

This issue is particularly problematic in scenarios where:

  • commit_pending_age is set to longer periods (the default is 24 hours), meaning changes can sit uncommitted for a while.
  • Automatic repository sync is enabled (AUTO_UPDATE), triggering frequent sync operations.
  • Multiple translators are working on the same component, increasing the likelihood of conflicts.
  • Integration with external Git repositories is used, adding another layer of complexity.

The hourly sync schedule (update_remotes task) exacerbates the problem because it can trigger before the 24-hour commit timeout expires, leading to premature overwrites.
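
A quick calculation shows how wide that exposure window is. The 24-hour commit age and the hourly sync interval are the figures quoted above; the function name syncs_before_auto_commit is hypothetical and the model ignores manual commits and pushes.

```python
from datetime import timedelta


def syncs_before_auto_commit(commit_pending_age_hours, sync_interval_hours):
    """Number of whole sync intervals that fit into the commit window."""
    commit_after = timedelta(hours=commit_pending_age_hours)
    interval = timedelta(hours=sync_interval_hours)
    # Floor division of two timedeltas yields an int.
    return commit_after // interval


print(syncs_before_auto_commit(24, 1))  # 24
```

Under these assumptions, a single pending edit can be exposed to up to 24 sync runs before the age-based commit would pick it up.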

Conclusion: Let's Fix This!

This data loss bug is a serious issue that needs immediate attention. By modifying the needs_commit_upstream() method to include a check for pending database changes, we can safeguard translator work and ensure a more reliable translation workflow. Let's get this fix implemented and give our translators the peace of mind they deserve!


How We Run Weblate

Weblate is running in a Docker container.

Weblate Versions Affected

Weblate version 5.10.4 is affected, and likely other versions as well.
