I’ve written about how to handle bilingual Excel files, CSV files and tab delimited files in the past. In fact one of the most popular articles I have ever written was this one, “Creating a TM from a Termbase, or Glossary, in SDL Trados Studio”, in July 2012, over three years ago. Despite writing it I’m still struggling a little with why this would be useful, other than if you have been given a glossary to translate or proofread perhaps… but it doesn’t really matter what I think because clearly it was useful!
So, why am I bringing this up three years later? Well, the recent launch of Studio 2015 introduced a new filetype that seems worthy of some discussion. It’s a Bilingual Excel filetype that allows you to handle Excel files with bilingual content in a similar fashion to the approach in the previous article. There are some interesting differences though, and the most notable is that you won’t lose any formatting in the Excel file, which is something that happened if you had to handle files like these as CSV or Tab Delimited Text. That in itself might be interesting for some users, because losing formatting was the first objection I’d hear when suggesting the CSV filetype as a solution for handling files of this nature. Most of the time I don’t think this is really an issue, but for those occasions where it is, this is a good point.
But this new filetype is more than just an Excel version of the old one. So let’s just take a look at the options using this excel layout as an example:
So I have five columns of text, with the source and target in columns B and C, the name of the character playing the part (it’s a film script) in column A, a maximum character length for the text in column D and some notes in column E. The text is also partially translated.
Columns
In addition to the usual source and target column I have a couple of other options.
I can set a maximum number of characters that are allowed in the target. This is quite useful because sometimes, particularly with gaming scripts where the text box is a limited size, it’s important for the translator to know how many characters are allowed. So here, if you use this option the standard QA Checker in Studio can use this and flag something like this if you go over the limit:
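To make the idea of a length check concrete, here is a minimal Python sketch. It is not how the QA Checker in Studio is implemented; `Row` and `over_limit` are hypothetical names, and it assumes the limit is a plain character count of the target text.

```python
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """One spreadsheet row: source, target and an optional length limit."""
    source: str
    target: str
    max_len: Optional[int]  # None means no limit was given for this row


def over_limit(rows: List[Row]) -> List[Tuple[int, int, int]]:
    """Return (row_index, actual_length, limit) for every target that is too long."""
    problems = []
    for index, row in enumerate(rows):
        # Assumption: the length is simply the number of characters in the target.
        if row.max_len is not None and len(row.target) > row.max_len:
            problems.append((index, len(row.target), row.max_len))
    return problems


rows = [
    Row("Hello", "Bonjour", 10),
    Row("Goodbye", "Au revoir, mon ami", 12),
    Row("Yes", "Oui", None),
]
print(over_limit(rows))
```

Only the second row is reported, because its target is longer than the limit set for it.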
You can also check the allowable length at any time by clicking on the document structure column on the right hand side. If you don’t have the context information populated (see below) then the righthand column in Studio will say LN (for Length Restriction ;-)) but if you do, as I do in this example, then it may use a different code with a plus symbol indicating there is more than one code in there. So in my example it says ACT+:
The checkbox “Preserve Target Style” allows you to apply the style of the target cell in Excel to the target translation rather than overwrite with the style of the source cell. So just giving you another option for handling formatting in the Excel file.
Exclude
In here we have another new option compared to the CSV filetype, and that’s “Translation column content“. If you check this then any of the cells that have been translated in the Excel file already will be ignored. So if you do check this then the options in the next part of the settings will not apply:
Existing Translations
These options were already available in the CSV filetype and are quite useful because they can save you having to deal with existing translations at all, and more importantly using the locking option allows you to exclude these segments from the analysis:
Context and Comments
We had Comments availability in the previous CSV filetype too, but there the comments were added to the document structure window. Useful but hard to get at as you needed to click on the document structure column to see the available information and you only saw one cell at a time.
In this filetype the comments can be displayed as Studio comments like this which allows you to see more at a time and to read them without having to click on anything at all. In fact if you have a lot of comments and they are needed to provide important translation context then moving them to a window on the side can be very useful and easy to use. If you don’t know how to move windows take a look at this article:
The Context Information column is useful because it provides a good way of tracking string IDs, or any other information which might be useful to know as you work. In this example I used the name of the characters in the film. These are in column A of my spreadsheet and they are displayed in the Document Structure Column as noted above in the section on Columns.
Where is it?
Perhaps one little thing I forgot to mention and that’s where it is. This is quite important to note because the default settings for Studio are like this with all three types of Excel filetype checked:
Studio uses the filetypes on a first come first served basis depending on information in the filetype settings. So if you want to use the Bilingual Excel filetype you need to either disable the Microsoft Excel 2007-2013 filetype or just move the Bilingual Excel filetype so it sits above the others in the list. I guess if you do a lot of these and also work with Excel then you could create project templates that allow you to simply select the appropriate one to match the filetype you’re working with and this would save you having to mess around with which one is active and taking priority in the list.
So all in all quite a useful filetype. There is no preview with this, but in many ways it doesn’t feel as though it needs one as the layout of Studio is very similar to the sort of files you are likely to be handling with this filetype and hopefully there are enough options to include the contextual information from the file to help anyway. But before I end I thought it might be interesting to share a little translation conundrum that was posted on ProZ a few weeks ago where Excel and this new filetype could be used to solve it; this is the stuff!
Stuff…
Excel is an interesting format for many things, so I thought I’d share a little problem that appeared on ProZ a few weeks ago. There are many ways to handle this but I thought it might be fun to share a way to tackle it using the Bilingual Excel filetype… and I’m not trying to start a war over whose tools handle it the best… this is just some Excel stuff I thought would be fun to share. Having since read some of the other solutions, I’d probably handle this using regex in EditPad Pro to get the text out anyway. But I like this because it’s just Excel and Studio.
The problem was how to create a TMX translation memory from an SGML file that was formatted something like this (you can see the full text in the ProZ post and the video at the end):
<doc id='N0001'>
  <head>
    <title>What is a Fenqing ?</title>
    <corpus url='http://code.google.com/p/evbcorpus/'>EVBCorpus</corpus>
    <author attributes='stuff in here'>name</author>
    <citation>"Building a Bilingual Corpus for MT"</citation>
  </head>
  <text>
    <spair id='1'>
      <s id='en1'>What is a Fenqing ?</s>
      <s id='vn1'>Fenqing là gì ?</s>
    </spair>
    <spair id='2'>
      <s id='en2'>Fenqing is a Chinese word which literally ...</s>
      <s id='vn2'>Fenqing là một từ tiếng Hoa mà nghĩa đen...</s>
    </spair>
  </text>
</doc>
So here’s one way to do it!
Create an XML filetype for this SGML… pretty simple using just two rules (if you don’t know how to do it this article might help but you can also watch the video as I explain it in there):
//s (always translatable)
//* (Don’t translate)
So these rules extract the translatable content in the s element and nothing else. There is no distinction between English or Vietnamese at this stage as I have ignored the language attributes altogether. Next I just open the SGML file in Studio and save it. Now I have an SDLXLIFF with the source and target sentences alternating in the source column throughout the file, and nothing in the target column.
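For anyone who wants to see what those two rules amount to outside Studio, here is a rough Python sketch using the standard library. It is only an illustration and not what Studio does internally; `extract_segments` is a hypothetical name, and it assumes the SGML happens to be well-formed XML, as the sample above is.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<doc id='N0001'>
  <head>
    <title>What is a Fenqing ?</title>
    <corpus url='http://code.google.com/p/evbcorpus/'>EVBCorpus</corpus>
    <author attributes='stuff in here'>name</author>
    <citation>"Building a Bilingual Corpus for MT"</citation>
  </head>
  <text>
    <spair id='1'>
      <s id='en1'>What is a Fenqing ?</s>
      <s id='vn1'>Fenqing là gì ?</s>
    </spair>
    <spair id='2'>
      <s id='en2'>Fenqing is a Chinese word which literally ...</s>
      <s id='vn2'>Fenqing là một từ tiếng Hoa mà nghĩa đen...</s>
    </spair>
  </text>
</doc>"""


def extract_segments(xml_text):
    """Return (id, text) for every <s> element, in document order.

    Everything outside <s> (title, author, citation...) is ignored,
    which mirrors the "//s translatable, everything else not" idea.
    """
    root = ET.fromstring(xml_text)
    return [(s.get("id"), (s.text or "").strip()) for s in root.iter("s")]


for seg_id, text in extract_segments(SAMPLE):
    print(seg_id, text)
```

The result is one flat list in which English and Vietnamese sentences alternate, which is the same shape as the source column described above.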
Now I can use the SDLXLIFF Converter for MSOffice (installed with Studio since 2011) and convert the SDLXLIFF to Excel. If you didn’t know this was possible take a look at this article.
The result of this operation is that I now have an excel file with an ID column, a source column (populated), a target column (empty) and an empty notes column.
Now comes the fun excel part. I can use this formula in the target column:
=IF(ISEVEN(A3),B3,"")
The ISEVEN function in Excel is a neat function that lets you check whether a number is even. You probably see where I’m going with this now.
This formula will look at the ID column (column A) and check if it’s an even number or not. If it is then it will copy the contents of column B into the active cell. If it’s an odd number it puts nothing at all. Once I’ve done this I can copy the formula down the spreadsheet, copy all of column C (target column) and paste it as plain text to remove the formulae.
Now I have a spreadsheet with every other row containing source on the left and target on the right. So I can filter on the target column and sort it in alphabetical order. Now I just delete all the rows with nothing in the target.
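The same pairing logic can be sketched in a few lines of Python. This is an illustration rather than part of the workflow; `pair_rows` is a hypothetical name. It assumes the IDs run 1, 2, 3… in file order, and that the formula sits in the row above the one it references, so the cell next to each English sentence picks up the Vietnamese from the row below.

```python
def pair_rows(rows):
    """rows: list of (numeric_id, text) in file order.

    Returns (source, target) pairs, keeping only rows that received a target.
    """
    paired = []
    # Walk each row together with the row below it.
    for (_, text), (next_id, next_text) in zip(rows, rows[1:]):
        # Same test as =IF(ISEVEN(A3),B3,""): copy the text from the row
        # below only when that row's ID is even, otherwise leave it empty.
        target = next_text if next_id % 2 == 0 else ""
        paired.append((text, target))
    # The last row has no row below it, so it never gets a target and is
    # not added. Dropping empty targets matches deleting those rows in Excel.
    return [(source, target) for source, target in paired if target]


rows = [
    (1, "What is a Fenqing ?"),
    (2, "Fenqing là gì ?"),
    (3, "Fenqing is a Chinese word which literally ..."),
    (4, "Fenqing là một từ tiếng Hoa mà nghĩa đen..."),
]
for source, target in pair_rows(rows):
    print(source, "|", target)
```

Unlike the Excel route there is no need to sort before deleting, since the empty rows are simply filtered out.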
This leaves me with a simple spreadsheet I can drag into the Glossary Converter and convert to TMX, which resolves the question asked by the user. However, seeing as I am more likely to want to use this Translation Memory in Studio I won’t do that. Instead, I just open the Excel file with the Bilingual Excel filetype and then update it straight into a Studio Translation Memory. Piece of cake!!
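If you are curious what the TMX itself looks like, here is a minimal Python sketch that writes source/target pairs into a TMX 1.4 structure. It is not what the Glossary Converter produces; `build_tmx` is a hypothetical name, and the language codes (en-US, vi-VN) and header values are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespace with the built-in "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en-US", tgt_lang="vi-VN"):
    """Return a TMX 1.4 document (as a string) with one <tu> per pair."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xlsx",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


pairs = [
    ("What is a Fenqing ?", "Fenqing là gì ?"),
    ("Fenqing is a Chinese word which literally ...", "Fenqing là một từ tiếng Hoa mà nghĩa đen..."),
]
print(build_tmx(pairs))
```

Each row of the cleaned spreadsheet becomes one translation unit with an English and a Vietnamese variant.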
If you want to see this in real time you can watch the video below; it’s only 10 minutes long. I hope it’s useful and perhaps gives you a few ideas of how Excel can be useful for data manipulation, especially now that we have the new Bilingual Excel filetype:
Video is 10 minutes 8 seconds