Request Handling

In the Translation Proxy, frontend instances are responsible for serving translated pages. Thanks to AppEngine’s quick-reaction scaling, the number of frontend instances aggressively follows (and somewhat predicts) demand, keeping latency low. The general life cycle of a proxy request can be described as follows.

  • The incoming requests, based on the domain name, reach the Google Cloud (rerouted via DNS record CNAME, pointing to ghs.domainverify.net.
  • Based on the domain name and the deployed Proxy application, AppEngine decides that this specific request should be routed to the Proxy AppEngine deployment.
  • The request reaches the Proxy Application internally; the application does a lookup against the domain for the associated project. There are special domain names, and the final serving domain, for which caching is activated.
  • Based on the URL, the Proxy application determines the matching Page in the Proxy database. The database has a list of segments, pointing to our internal Translation Memory (TM). We retrieve all these existing Database entries, including the translations for the given target language.
  • The Proxy application processes the incoming URL request, and transforms it to point back to the original site’s domain. Then, the source content of the translation is sourced, according to cache settings in effect on the project.
    • If source caching is disabled, the application issues a request, and retrieves the result from the original web server, which is hosting the original website language.
    • If source caching is enabled, a local copy (a previously stored version of the source HTML) is used, instead of issuing a request to the original web server.
  • Depending on the Content-type of the response, the appropriate Translator is selected, and the response is passed to an instance of the Translator as a document. The behavior of the Translator can be affected by cache settings as well.
    • If binary caching is disabled, the application then builds the Document Object Model (DOM) tree of the result, finally iterates through all the block level elements, and matches them against the segments loaded from the database. If there’s a match, we replace the text with the translation. If not, we ‘report’ it as a missing translation.
    • If binary caching is enabled and the hash of the source HTML matches the one stored in the cache, a previously prepared and stored translated HTML is served.
    • If binary caching and keep cache are both enabled, and the hash of the source HTML doesn’t match the one stored in the cache, the proxy translates the page using the TM. If the number of translated segments is higher than the previously prepared and stored translated HTML, the new version is served; otherwise the old one. (Keep cache can be thought of as a “poor man’s staging server”).
  • Hyperlinks in the translated content are altered by the LinkMapper to point to the proxied domain instead of the original. This affects all href or src attributes in the document equally, unless the element is given the __ptNoRemap class. At this point, resources may be replaced by their localized counterparts on a string-replacement basis.
  • The application serializes the translated DOM tree, and writes it to the response object.
  • Before final transmission takes place, the Proxy may rewrite or add any additional headers, such as Cache-control or Pragma.
  • Finally, the Proxy serves the document as an HTML5 stream, as a response to the original request. AppEngine must close the connection once the response is transmitted, so proxying streaming services is not possible in this fashion!