Monday, March 25, 2013

A tool to check website translation automatically

I am currently working on a project to build a localized version of a financial website.

The website has two parts: a front end (Drupal) and a back end (.NET). Some of the visible text is stored on the front end inside the HTML body, some in JS scripts, some in the front-end database, and some is supplied by the back end. On top of that, each country is shown only a subset of the messages (there are thousands of them).

 Nearly 100 developers work on the website, and when they apply changes for one country, other countries can be affected as well, partly because of new bugs and partly because some items have not been translated yet. As a result, foreign text appears on some pages from time to time.

 From a business point of view, foreign text should not be shown at all. As stage one, I've implemented a specialized tool based on a web crawler (

 It is built on the web crawler crawler4j, includes plain-text dictionaries of the existing languages (extracted from aspell), and works as described in
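
 The matching logic is not spelled out in this post, so the following is only a minimal sketch of one way a dictionary-based check could work, not the tool's actual code. It is written in Python for brevity (the tool itself is built on the Java library crawler4j). The names `DICTIONARIES`, `load_dictionary` and `find_foreign_words` and the tiny word lists are hypothetical. The sketch assumes each dictionary is a plain-text file with one word per line, and it flags a word when it is unknown in the expected language but present in another one.

```python
import re

# Hypothetical tiny word lists used for illustration only; a real run would
# load the aspell-derived plain-text dictionaries via load_dictionary().
DICTIONARIES = {
    "en": {"your", "account", "balance", "is", "the", "transfer", "money"},
    "de": {"ihr", "konto", "kontostand", "ist", "das", "geld", "überweisen"},
}

# Runs of letters only (Unicode-aware); digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def load_dictionary(path):
    """Read a plain-text word list (one word per line) into a lowercase set."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def find_foreign_words(text, expected, dictionaries):
    """Return (word, language) pairs for words that are unknown in the
    expected language but known in some other language."""
    known = dictionaries[expected]
    hits = []
    for word in WORD_RE.findall(text.lower()):
        if word in known:
            continue  # a valid word in the expected language
        for lang, words in dictionaries.items():
            if lang != expected and word in words:
                hits.append((word, lang))
                break  # report each occurrence once
    return hits


if __name__ == "__main__":
    page_text = "Your Kontostand is 100"
    print(find_foreign_words(page_text, "en", DICTIONARIES))
```

 Words that appear in no dictionary at all (product names, ticker symbols) are ignored here rather than reported, which keeps false alarms down at the cost of missing foreign words the dictionaries do not cover.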

 Next steps:
1. Add support for Selenium tests.
2. Add the ability to hide notes on a page, or some portion of the text on the site.

 Everyone is welcome to help with the project, suggest new ideas, etc.