SharePoint Translation Service for Office Documents and PDFs

On February 14, 2024, Microsoft published details of the SharePoint Online translation service, one of the capabilities available through the SharePoint Premium license. The article included a nice offer for tenants to test SharePoint Translation with a promotion available until the end of June 2024.

The offer covers translation of the first one million characters at no cost every month. After this point, the translation service costs $15.00 per million characters. Microsoft estimates 1 million characters to be roughly 500-750 pages, depending on the density of the text. Like Microsoft 365 Backup and other SharePoint add-on services, charging is on a pay-as-you-go (PAYG) basis billed through a monthly Azure subscription. Before using a PAYG service, you must configure the Microsoft Syntex service settings to link the Azure subscription. After that, all you need to do is decide which sites should have translation available. By default, translation is available for all sites, but you can restrict the facility to up to 100 specific sites (Figure 1).
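To make the pricing concrete, here is a minimal Python sketch of the arithmetic. The function names are mine, and the sketch assumes that Microsoft bills the characters above the free allowance proportionally rather than in whole-million blocks, which the announcement does not spell out.

```python
FREE_CHARS_PER_MONTH = 1_000_000   # promotional monthly allowance
PRICE_PER_MILLION_USD = 15.00      # list price per million characters


def estimate_monthly_cost(chars_translated: int) -> float:
    """Estimate one month's charge in USD.

    Assumption: characters beyond the free allowance are billed
    proportionally, not rounded up to the next million.
    """
    billable = max(0, chars_translated - FREE_CHARS_PER_MONTH)
    return round(billable / 1_000_000 * PRICE_PER_MILLION_USD, 2)


def estimate_pages(chars: int) -> tuple[int, int]:
    """Rough (low, high) page range at 500-750 pages per million characters."""
    millions = chars / 1_000_000
    return (round(millions * 500), round(millions * 750))


if __name__ == "__main__":
    for chars in (500_000, 4_000_000):
        print(chars, estimate_monthly_cost(chars), estimate_pages(chars))
```

Under these assumptions, anything up to one million characters in a month costs nothing, and a month with four million characters comes to three billable millions.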

Figure 1: SharePoint Translation Settings

According to the documentation, the supported file types include .csv, .docx, .htm, .html, .markdown, .md, .msg, .pdf, .pptx, .txt, and .xlsx. Legacy Office formats are supported too. SharePoint Translation creates the translated versions of files in these formats in the equivalent “modern” format (for example, .doc becomes .docx). The maximum supported file size for translation is 40 MB.
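Because unsupported files can be submitted without any feedback, it might be worth checking candidates before sending them for translation. The following Python sketch is a hypothetical pre-check against the documented list and size limit. It is not part of SharePoint. The helper name is mine, the 40 MB limit is assumed to mean 40 x 1024 x 1024 bytes, and legacy Office extensions are reported as "not in the documented list" because the documentation quoted here does not enumerate them.

```python
from pathlib import PurePath

# File types listed in the documentation quoted above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".docx", ".htm", ".html", ".markdown", ".md",
    ".msg", ".pdf", ".pptx", ".txt", ".xlsx",
}

# Assumption: "40 MB" is interpreted as binary megabytes.
MAX_SIZE_BYTES = 40 * 1024 * 1024


def precheck(file_name: str, size_bytes: int) -> list[str]:
    """Return a list of reasons why a file might not translate (empty if none)."""
    problems = []
    ext = PurePath(file_name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"extension not in the documented list: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than the 40 MB limit")
    return problems


if __name__ == "__main__":
    print(precheck("Quarterly Report.docx", 5_000_000))
    print(precheck("archive.zip", 50 * 1024 * 1024))
```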

Like many announcements for Microsoft 365 services, it takes time for all the necessary bits to show up in a tenant. I was able to see a Translate option for documents a few weeks ago, but taking the option did nothing. It’s only in the last week or so that translation works.

Translating Word Documents

I work on multiple documents daily, mostly to write articles and chapter files for books. All the files are formatted for U.S. English because that’s the language used for publication on websites or in eBooks. Most articles are Word documents of between 800 and 1,500 words (2 to 5 pages). Translation of these files is easy – select the document, choose the Translate option from the […] menu, and pick a target language (Figure 2).

Figure 2: Selecting a target translation language

The drop-down menu shows a list of the most popular languages. It’s possible to translate to any of the supported languages by specifying the language’s ISO code (for example, ca for Catalan or da for Danish). If SharePoint search supports the language, it expands the language code and allows you to submit the file for translation.

Usually, translation happened very quickly. Sometimes things didn’t happen quite so fast, and I had to wait for several hours before a translated file appeared. And sometimes, translation simply didn’t work. You can understand that background processes running in a cloud service sometimes take longer than expected, but it’s disconcerting when nothing happens after submitting a file for translation.

When everything works, the translated file appears in the document library where the source document is stored. The translated file has the same name as the source document appended with the ISO code for the target language. For example, the file for the French language translation for a document named Translating SharePoint Online Documents.docx is Translating SharePoint Online Documents_fr.docx. The translated file inherits the properties of its source, including retention and sensitivity labels (if applied).
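The naming rule is simple enough to express in a few lines of Python, which could be handy for a script that checks whether a translated copy already exists. This is only a sketch of the behavior I observed. The function name is mine, and the legacy-to-modern mapping contains just the .doc to .docx pairing, because that is the only conversion the documentation names explicitly.

```python
from pathlib import PurePath

# Only the pairing named in the documentation; other legacy formats
# presumably follow the same pattern, but that is an assumption.
LEGACY_TO_MODERN = {".doc": ".docx"}


def translated_name(source_name: str, language_code: str) -> str:
    """Build the expected name of a translated file: <name>_<code><ext>."""
    path = PurePath(source_name)
    ext = LEGACY_TO_MODERN.get(path.suffix.lower(), path.suffix)
    return f"{path.stem}_{language_code}{ext}"


if __name__ == "__main__":
    print(translated_name("Translating SharePoint Online Documents.docx", "fr"))
```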

No Translation for Protected Files

Speaking of sensitivity labels, translation doesn’t work if the label applied to a file includes protection (encryption). Files with labels marking information with a certain sensitivity can be translated, but once encryption is involved, translation doesn’t work. Even though SharePoint Online stores protected files in an unprotected form and only applies encryption when a file is downloaded, this is entirely logical. To translate a file, SharePoint downloads it and sends it to the translation service. The sensitivity label that controls access to the file doesn’t include an access right for the translation service, so the service can’t open and process the content. The same restriction applies to password-protected files. The bad thing is that users can submit protected files for processing without any indication thereafter that translation is impossible.

Apart from noticing the presence of a newly translated file, users or administrators don’t receive any other notification indicating that the translation was successful. It would be good if users could decide if they should receive notifications via email, especially when large documents or large quantities of documents are processed.

Translated Output

Translation of Word documents preserves headings, text, headers, footers, and other elements (Figure 3). Some overflowing of text across pages might happen due to translation using different words or number of words. This is especially obvious when a document includes graphics that may no longer fit on a page because of added words. It’s therefore necessary to check each page to make sure that formatting flows as intended.

Figure 3: A Microsoft Word document translated into Catalan

Translating Very Large Documents

I tried to stress SharePoint Translation by translating the Word document for the current version of the Office 365 for IT Pros eBook to French (where the title becomes “Office 365 pour les professionnels de l’informatique”). The translation took about ten minutes to process the 31.7 MB file. Figure 4 shows the Word statistics for the English and French versions of the document. You can see that the count of pages, words, and characters increased in the French version.

Figure 4: Word count statistics for a source English and target French document

The English version of the source Word document contains about 4 million characters. The sheer size of the translated file highlighted the necessity to check the formatting of the converted document, notably the flow of paragraphs across pages. Small things matter when it comes to formatting, and I couldn’t understand why the translated version had random spaces inserted into sentences at times (Figure 5).

Figure 5: Extra spaces inserted into French text by SharePoint Translation

More importantly for technical content like the Office 365 for IT Pros eBook, SharePoint Translation messes with PowerShell and other code. It’s hard to blame the translation algorithm because it essentially processes words and code is composed of words, albeit some strange words that are sometimes arranged in strange ways. Take this example of a PowerShell snippet where Translation changed cmdlet and parameter names. One thing’s for sure: PowerShell will barf if given this code to run.

Get-UnifiedGroup -ResultSize Illimité | Objet de tri DisplayName | Sélectionner un alias d'objet, DisplayName, ManagedBy, AccessType, GroupMemberCount, GroupExternalMemberCount, WhenCreated | export-csv -chemin c :\temp\TenantGroups.CSV -noTypeInformation -encodage ascii
Invoke-Item C:\temp\TenantGroups.CSV

Another thing afflicting the translated output for code examples is when languages use different quotation marks. Take this example, which won’t work either:

Get-UnifiedGroup -Identity « BankingGroup@office365itpros.com »
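One way to catch the quotation mark problem during review is to scan the code snippets in a translated document for typographic quotes. The Python sketch below is a hypothetical helper of my own, not something SharePoint Translation provides. It only flags the characters and does not attempt a repair, because a translated snippet may have other damage (such as changed cmdlet names) that needs a human eye.

```python
# Typographic quotation marks that scripting languages do not reliably
# treat as plain string delimiters.
TYPOGRAPHIC_QUOTES = {
    "\u00ab",  # left-pointing guillemet
    "\u00bb",  # right-pointing guillemet
    "\u201c",  # left double quotation mark
    "\u201d",  # right double quotation mark
    "\u2018",  # left single quotation mark
    "\u2019",  # right single quotation mark
    "\u201e",  # double low-9 quotation mark
}


def find_typographic_quotes(code: str) -> list[tuple[int, str]]:
    """Return (position, character) for each typographic quote in the code."""
    return [(i, ch) for i, ch in enumerate(code) if ch in TYPOGRAPHIC_QUOTES]


if __name__ == "__main__":
    snippet = "Get-UnifiedGroup -Identity \u00ab BankingGroup@office365itpros.com \u00bb"
    for position, char in find_typographic_quotes(snippet):
        print(f"position {position}: U+{ord(char):04X}")
```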

Audit Records

The SharePoint Translation service does not create audit events when files are submitted for translation or when the processing of the files succeeds or fails. However, SharePoint Online logs events when translated files are uploaded. Typically, three events appear:

  • FileUploaded: The app@sharepoint account uploads a translated file. The file has a temporary file name.
  • FileRenamed: The temporary file name is replaced with its permanent version.
  • FileModified: One or more records are captured when SharePoint updates the properties of the temporary file to match its source.

Here are some details extracted from the unified audit log to show the contents of the three audit records captured for a file translation:

UserPrincipalName : app@sharepoint
Timestamp         : 21-Mar-2024 12:23:41
fileName          : ~tmp1B_Office 365 for IT Pros 10 - April 2024_fr.docx
Operation         : FileUploaded
ObjectId          : https://office365itpros.sharepoint.com/sites/O365ExchPro/Shared Documents/Book Files/2024 Edition Book Files/~tmp1B_Office 365 for IT Pros 10 - April 2024_fr.docx

UserPrincipalName : app@sharepoint
Timestamp         : 21-Mar-2024 12:23:51
fileName          : ~tmp1B_Office 365 for IT Pros 10 - April 2024_fr.docx
Operation         : FileRenamed
ObjectId          : https://office365itpros.sharepoint.com/sites/O365ExchPro/Shared Documents/Book Files/2024 Edition Book Files/~tmp1B_Office 365 for IT Pros 10 - April 2024_fr.docx
Application       : Media Analysis and Transformation Service

UserPrincipalName : app@sharepoint
Timestamp         : 21-Mar-2024 12:23:57
fileName          : Office 365 for IT Pros 10 - April 2024_fr.docx
Operation         : FileModified
ObjectId          : https://office365itpros.sharepoint.com/sites/O365ExchPro/Shared Documents/Book Files/2024 Edition Book Files/Office 365 for IT Pros 10 - April 2024_fr.docx

The audit records don’t tell us who submitted documents for translation. They only record the successful outcome of translation attempts.
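If you export records like these, a short script can group the events by translated file and show how long the upload, rename, and property update took. The following Python sketch uses the three records shown above as sample data. The regular expression for the temporary prefix is an assumption based on the single ~tmp1B_ name I observed, and the field names simply mirror the extract above rather than the raw audit schema.

```python
import re
from datetime import datetime

# Assumption: temporary names look like "~tmp<alphanumerics>_<final name>".
TEMP_PREFIX = re.compile(r"^~tmp[0-9A-Za-z]+_")
TIME_FORMAT = "%d-%b-%Y %H:%M:%S"  # month names parsed in the default (English) locale

RECORDS = [
    {"UserPrincipalName": "app@sharepoint", "Timestamp": "21-Mar-2024 12:23:41",
     "fileName": "~tmp1B_Office 365 for IT Pros 10 - April 2024_fr.docx",
     "Operation": "FileUploaded"},
    {"UserPrincipalName": "app@sharepoint", "Timestamp": "21-Mar-2024 12:23:51",
     "fileName": "~tmp1B_Office 365 for IT Pros 10 - April 2024_fr.docx",
     "Operation": "FileRenamed"},
    {"UserPrincipalName": "app@sharepoint", "Timestamp": "21-Mar-2024 12:23:57",
     "fileName": "Office 365 for IT Pros 10 - April 2024_fr.docx",
     "Operation": "FileModified"},
]


def summarize(records: list[dict]) -> dict:
    """Group app@sharepoint events by final file name with first/last times."""
    summary = {}
    for record in records:
        if record["UserPrincipalName"] != "app@sharepoint":
            continue  # ignore events generated by ordinary users
        name = TEMP_PREFIX.sub("", record["fileName"])
        when = datetime.strptime(record["Timestamp"], TIME_FORMAT)
        entry = summary.setdefault(
            name, {"operations": [], "first": when, "last": when})
        entry["operations"].append(record["Operation"])
        entry["first"] = min(entry["first"], when)
        entry["last"] = max(entry["last"], when)
    return summary


if __name__ == "__main__":
    for name, entry in summarize(RECORDS).items():
        seconds = (entry["last"] - entry["first"]).total_seconds()
        print(name, entry["operations"], f"{seconds:.0f}s")
```

A summary like this still only covers translations that succeeded, because nothing is logged for failures.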

PowerPoint and PDFs

Given the success of translating large Word documents, I decided to test with a PDF generated by Word from those files. The input was a 32.1 MB file without a sensitivity label. No matter what I did, all attempts to translate the PDF failed. Attempts with smaller PDFs succeeded, so I wonder if a different size limit applies to PDFs than to Office files.

I also tried translating PowerPoint presentations. Although successful, careful review of the output slides is necessary because the formatting problem reappeared when text overflowed and interfered with the placement of graphics.

Translation Costs

Over a day or so of testing translation against Word documents, PowerPoint presentations, and PDFs, I accumulated about $82 (EUR75.19) of charges for my Azure subscription. Azure accrued the charges against a resource called microsoft.syntex/documentprocessorsresource. This is the same resource name used by Microsoft 365 Backup. To get more insight into the charges for the translation service, I had to consult the invoice details view (Figure 6).

Figure 6: SharePoint Translation charges logged against an Azure subscription

At the list price of $15 per million characters, I guess I must have translated about 6.5 million characters (one million free, the remainder paid). Given that Microsoft 365 Backup could be described as a mission-critical application, it’s curious that translation costs so much more than the monthly charge for Microsoft 365 Backup in my tenant.
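The estimate comes from working backward from the charge. As a rough Python sketch (the function name is mine, and it assumes the full free allowance was applied and that billing is proportional to the characters processed):

```python
def chars_for_charge(charge_usd: float,
                     price_per_million: float = 15.0,
                     free_chars: int = 1_000_000) -> float:
    """Estimate total characters translated from a billed amount.

    Assumptions: the free allowance was fully used before billing began,
    and every billed dollar corresponds to successfully processed text.
    """
    return free_chars + charge_usd / price_per_million * 1_000_000


if __name__ == "__main__":
    total = chars_for_charge(82)
    print(f"{total / 1_000_000:.2f} million characters")
```

If Microsoft also charges for failed attempts, the real number of characters translated would be lower than this estimate.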

No further details for translation charges are available from the Azure portal. I have no idea how much it costs to translate individual documents or if Microsoft charges for failed translation attempts.

Some of the documents I translated were large and I expected to see some charges, but not the billed invoice. Clearly, I tested enthusiastically. Joking aside, this experience points to the need to restrict the translation option to specific sites where translation is a business need rather than a nice to have.

Translation Good for Standard Office Documents

I suspect that the target for SharePoint Translation is unlikely to be documents that contain code examples. More likely, Microsoft is aiming for a more “standard” form of Office documents. In any case, the current offer is a great opportunity for tenants who operate in multinational environments to test SharePoint Translation free of charge. By submitting a variety of documents for translation, you’ll be able to identify if and where issues exist.

Informing users about the progress (or lack) of translations is an area where I think Microsoft could improve. They could also do better at identifying the nature and detail of charges accrued against the Azure subscription so that organizations can make internal chargebacks or simply track consumption better.

Overall, the bottom line is that if you need to translate documents, SharePoint Translation is a great way to generate a starting point for human translators to complete the task.

About the Author

Tony Redmond

Tony Redmond has written thousands of articles about Microsoft technology since 1996. He is the lead author for the Office 365 for IT Pros eBook, the only book covering Office 365 that is updated monthly to keep pace with change in the cloud. Apart from contributing to Practical365.com, Tony also writes at Office365itpros.com to support the development of the eBook. He has been a Microsoft MVP since 2004.

Comments

  1. Gustavo

    Wish there were the possibility to monitor the usage Microsoft Syntex translation.
