By taggy files I mean “embedded XML or HTML content” that is written into an Excel file alongside the translatable text. In my last article I documented a method people sometimes use to handle tagged content in a Word file. Funnily enough, I came across a Word file containing the XML components of an IDML file today, and judging by the enormous number of tags using the tw4win style to hide them when opened in any SDL Trados version, I guess it must have been prepared in a very similar way! Proof for me that this practice is sadly alive and well. But I digress… this time I want to cover how to handle a similar problem: HTML or XML tagged content in an Excel file. This crops up quite a bit on ProZ, so I thought it would be better to document it once and for all and have something else to refer to in addition to the Studio help.
When you create custom XML filetypes, Studio has a concept of parsing the extracted text a second time, based on the document structure type of an XML parser rule, and replacing the patterns you define in that text with tags. Now let me say that again in English… Studio can look in the content of the text that is extracted for translation, pick out the bits you don’t want to see, and convert them to tags. So for example, if you had an Excel file that contained things like this:
And then you opened this file in Studio you would see something that looked just like the Excel spreadsheet but what you would probably prefer is what it can be changed into as shown below:
So you want to protect all the angle brackets and text between them. Just in case you don’t like to see all of this in wysiwyg mode don’t forget that you don’t have to. You can change the font sizes as shown by Kevin Lossner and Jayne Fox in a neat little video, or you can also select the default to always show you consistent plain text, and all tags (because we know they are really there even in wysiwyg mode!) all the time with this option here… so plenty of choice to suit your preferences:
Of course you also don’t have to convert the plain text Excel file into the crazy formatting I showed here!
But the important thing is that we have converted all of the tagged content in the Excel file into protected tags in Studio so that you can safely translate the text alone. How do you do this… easy!
You just create some rules, using a little regex, to pick out the text that should be tags. These rules are all added through the Excel filetype settings for XLS and XLSX filetypes in here (the screenshot shows XLSX):
So the process is to first enable the “Embedded Content Processing” in 1. by ticking the box, and then selecting “Cell” from the list of available types. This is because for Excel the ONLY one that works is “Cell”. The rest are all part of the available types when you use the same “Embedded Content Processor” in a custom XML filetype, but they have no effect in the Excel filetype. It makes sense when you think about it as we are dealing with “Cells” in Excel… but it’s not the most intuitive part of this solution.
Once you have enabled the processing you can add your rules as I have in 2. I was a little flamboyant with them in this case just to show you what could be done if you wanted… I could have converted all of the tags in this file with three rules… maybe fewer if I was really clever. In reality, most Excel files I see translators having problems with only contain quite simple XML/HTML, and in these cases the first catch-all rule below will probably handle the complete file:
TRANSLATABLE TAG PAIR - CATCH ALL
Start tag: <[a-z][a-z0-9]*[^<>]*>
End tag: </[a-z][a-z0-9]*[^<>]*>

PLACEABLES
{[0-9]}

ALT ATTRIBUTE
Start tag: <.*alt="
End tag: ">
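If you want a feel for what the catch-all and placeable patterns pick out before you type them into Studio, a rough sketch like the one below can help. It is only an approximation: Studio evaluates the rules with its own regex engine, whereas this uses Python's `re` module as a stand-in, and the `split_cell` helper and the sample sentence are made up for illustration. The alt attribute rule is left out to keep it short.

```python
import re

# Patterns copied from the rule list above. Python's `re` is only a
# stand-in here; Studio applies the rules with its own regex engine.
OPEN_TAG = r"<[a-z][a-z0-9]*[^<>]*>"
CLOSE_TAG = r"</[a-z][a-z0-9]*[^<>]*>"
PLACEABLE = r"{[0-9]}"

TAG_RE = re.compile("|".join([CLOSE_TAG, OPEN_TAG, PLACEABLE]))


def split_cell(cell):
    """Hypothetical helper: split a cell into ('text', ...) and ('tag', ...) pieces."""
    pieces = []
    pos = 0
    for match in TAG_RE.finditer(cell):
        if match.start() > pos:
            # Anything between two matches stays as translatable text.
            pieces.append(("text", cell[pos:match.start()]))
        pieces.append(("tag", match.group()))
        pos = match.end()
    if pos < len(cell):
        pieces.append(("text", cell[pos:]))
    return pieces


# Made-up cell content, not taken from the file in the screenshots.
sample = "Click <b>Save</b> to store {0} files."
for kind, value in split_cell(sample):
    print(f"{kind:4} {value!r}")
```

Everything reported as `tag` is what I would expect to end up protected, and everything reported as `text` is what would be left for translation.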
The interesting thing is that in my actual example, by getting a little flamboyant I have actually shown how simple it can be because I have just taken the literal text that formed the tags and added these as rules. For example, I don’t want <b></b> tags to be text. So I add them in as a translatable tag pair here:
Quite simple when you look at it like this… but the drawback is that you need to add a rule for every type of tag in the file which is what I did to create the colourful view above. If you have a lot of different tags and it’s a big file (or lots of files) then the slicker regex rule is much better and it may well be all you need to catch all the tags:
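To make the trade-off a bit more concrete, here is a small sketch that compares the two approaches on a made-up cell. Again, this is Python standing in for Studio's regex handling, and the `unprotected_markup` helper and the sample text are my own assumptions, not anything Studio provides.

```python
import re

# Literal rules: one start/end pair typed in for each tag name.
LITERAL_RULES = [r"<b>", r"</b>"]
# The catch-all tag pair rule from earlier in the article.
CATCH_ALL_RULES = [r"<[a-z][a-z0-9]*[^<>]*>", r"</[a-z][a-z0-9]*[^<>]*>"]


def unprotected_markup(cell, rules):
    """Hypothetical check: remove what the rules match, list any '<...>' left over."""
    stripped = re.sub("|".join(rules), "", cell)
    return re.findall(r"<[^<>]*>", stripped)


# Made-up cell content for illustration only.
cell = 'Press <b>OK</b> or <a href="help.html">read more</a>.'
print(unprotected_markup(cell, LITERAL_RULES))    # the <a ...> and </a> tags are left over
print(unprotected_markup(cell, CATCH_ALL_RULES))  # nothing is left over
```

With only the literal `<b>` rules the link tags would stay in the translatable text, so you would need another pair of rules for them, while the catch-all rule covers both.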
Once you have added all your rules, and made them as fancy as you like, you can open the Excel file and all being well you’ll see protected tags, or a fancy wysiwyg format to handle the file.
Just to finish off… the same file displayed using the “no wysiwyg” option I mentioned above will show as follows even if I have set all the fancy rules I did. The segments that don’t show any tags are like this because the tags are actually at the start and end of the cells, so they are not required. If I did want to see them (and have to deal with them) this is also possible by changing them to be internal rather than external in the advanced rules as you add the regular expressions: