LibreOffice/Collabora Online Typography
Line break interoperability and state-of-the-art ISO OpenDocument/web typography in open source office suites
LibreOffice, the open source office suite, is the reference implementation of the ISO OpenDocument (ODF) format. As part of LibreOffice Technology, Collabora Online is key to the digital sovereignty of open source online document editing. Metrically equivalent fonts can no longer guarantee MS Word interoperability, because the line break algorithm of MS Word changed without documentation after the standardization of ODF and OOXML. The planned project solves this fundamental problem, and also adds the following interoperability and CSS4 web typography features and fixes to the line break algorithm and layout of LibreOffice:
•.tdf#119908 Fix line break interoperability of LibreOffice and Collabora Online with a new default paragraph layout that is equivalent to the new, undocumented algorithm of MS Word 2013 and later;
•.tdf#132599 Implement the ODF attribute fo:hyphenation-keep, adding a documented interoperability feature and state-of-the-art CSS4 web typography feature, “stop words hyphenating across pages”, to LibreOffice and Collabora Online;
•.tdf#106733 Implement the ODF attribute fo:hyphenate to exclude a portion of text from hyphenation;
•.tdf#149421 Fix the unimplemented details of hyphenation zone interoperability.
All the developments are libre and open source. They will be integrated into the main development branch of LibreOffice and released with its next stable release.
Layout difference between MS Word (top) and LibreOffice (bottom): since MS Word 2013, line breaking in justified paragraphs supports shrinking the spaces between words. Only disabling justification gives the same layout, because the word “Antarctica” then moves to the second line in MS Word, too. The bottom paragraphs contain hundreds of adjacent spaces, which shrink only in MS Word.
With 0.1 pt resolution, up to 2% shrinking was measured for plain justified lines (no direct character formatting, no hyphenation). The following picture shows how MSO implements the shrinking by using the available spaces (although the maximum shrinking is independent of the number of spaces, at least in a line with enough spaces):
(Black text: MS Word, red text: LibreOffice Writer. The black text contains extra character spacing after the text “3 Au” to give the same position after the last space.)
Commit 7d08767b890e723cd502b1c61d250924f695eb98 “tdf#130088 tdf#119908 smart justify: fix DOCX line count + compat opt.” is the fix for the line count/page count differences and the initial fix for the new justification:
•.It recognizes DOCX documents with the new default MSO 2013+ justification;
•.it fixes the line and page count when importing DOCX documents with MSO 2013+ justification: lines of plain text break at the same positions, so LibreOffice/Collabora Online renders these files consistently (no extra lines and pages), although the affected lines are not shrunk yet and still exceed the line width;
•.it adds the new LibreOffice compatibility option “JustifyLinesWithShrinking” to enable line break interoperability for DOCX documents, and stores this setting in ISO OpenDocument, the native document format of LibreOffice, too;
•.and it adds a unit test for these.
(MSO, Writer before the fix, Writer after the fix. Blue marks: correct line break positions in MSO and in the improved Writer; red marks: bad line breaks in Writer before the fix.)
Shrinking was added by commit 17eaebee279772b6062ae3448012133897fc71bb “tdf#119908 sw smart justify: fix justification by shrinking” and commit c1803de8a093739d189be54b2d9bd5634e9e79ee “tdf#119908 sw smart justify: add unit test”. They fix the temporarily exceeding lines by justifying them.
The following composite picture shows the previous (red) and the shrunk (black) lines in Writer. With this commit, the result is the same as in MS Word.
The following composite screenshots of MS Word (red) and Writer (black) show the fix for handling multiple text portions (middle) and the shrinking algorithm (right), which result in the same line breaks as in MS Word (test document: lorempage.docx of Bug 158333, generated PDFs: Word, Writer).
The related commits in LibreOffice code base:
Commit | Description |
tdf#158333 sw smart justify: fix multiple text portions | Multiple text portions, e.g. when part of a line contains direct character formatting, broke DOCX interoperability of justified paragraphs. |
tdf#119908 tdf#158419 sw smart justify: fix cursor position | The text cursor did not follow the new word positions yet, because of an unsigned cast of the negative shrinking value. |
tdf#119908 tdf#158436 sw smart justify: fix freezing with NBSP | Stop shrinking during underflow, because it resulted in an endless layout loop, e.g. when a very short word was followed by a no-break space. The problem was reported by Miklós Vajna (Collabora Productivity). |
tdf#119908 tdf#158776 sw smart justify: shrink only spaces | For interoperability, shrink only the spaces by up to 20%, not the lines by up to 2%. |
tdf#159102 sw smart justify: fix automatic hyphenation | As before with soft hyphens, automatic hyphenation could result in too much shrinking, because the calculation included an extra, non-existent space in the line. Also try to shrink the line only if a space will likely be available in it. |
(Note: commit 93ab1bc6be7b46226874810ec4fc3c61d5d0fc7c reverted the accidental removal of the unit test testOfz64109 by the second commit. Reported by Miklós Vajna.)
Text portions (spans or runs) result from direct character formatting in the paragraph, or simply from editing the text without any formatting difference.
The first implementation took only the last text portion of the line into account. This caused no problem if the line consisted of a single text portion, but otherwise resulted in different line breaks (less or missing shrinking). The first screenshots show that the test file contained several bad line breaks caused by multiple text portions in the lines. These three bad line breaks were fixed by the first commit.
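The effect of counting only the last text portion can be illustrated with a minimal sketch. It assumes, as a simplification, that the shrinking budget of a line is proportional to the number of spaces in it, using the 20% limit and the 3.34 pt space width reported elsewhere on this page. The function names and the list-of-strings representation of text portions are hypothetical and are not taken from the Writer code base.

```python
SPACE_WIDTH_PT = 3.34    # space width measured in the test file of this report
MAX_SPACE_SHRINK = 0.20  # assumed limit: spaces may shrink by up to 20%


def shrink_budget_last_portion(portions):
    """Budget when only the last text portion of the line is considered."""
    return portions[-1].count(" ") * SPACE_WIDTH_PT * MAX_SPACE_SHRINK


def shrink_budget_all_portions(portions):
    """Budget when the spaces of every text portion of the line are counted."""
    spaces = sum(portion.count(" ") for portion in portions)
    return spaces * SPACE_WIDTH_PT * MAX_SPACE_SHRINK


# Hypothetical line split into three portions by formatting "quick brown".
line = ["The ", "quick brown", " fox jumps over"]
print(shrink_budget_last_portion(line))  # budget from 3 spaces only
print(shrink_budget_all_portions(line))  # budget from all 5 spaces
```

With a single portion both functions agree, which matches the observation that the first implementation only failed on lines with several portions.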
Selected text and the text cursor were in the wrong position: over the exceeding, i.e. not shrunk line instead of the visible shrunk line. The second commit adjusted cursor movement and text selection to the shrunk lines.
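The commit description attributes the cursor problem to an unsigned cast of the negative shrinking value. The sketch below only imitates such a cast in Python. The 16-bit width and the value -67 are assumptions chosen for illustration, not the actual type or value used in Writer.

```python
def as_unsigned(value, bits=16):
    """Reinterpret a signed integer as an unsigned one of the given width."""
    return value & ((1 << bits) - 1)


shrink = -67  # hypothetical negative adjustment of the space width
print(as_unsigned(shrink))  # 65469: a large positive offset instead of -67
```

A position computed from such a value no longer matches the shrunk line, which is consistent with the cursor staying over the unshrunk layout.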
Extending the text layout code with shrinking resulted in an infinite loop, e.g. when a paragraph ending with a no-break space initiated the recalculation of the line with underflow. This was solved by disabling shrinking for underflow.
Removing spaces instead of adding or replacing them allowed a more precise comparison of MSO and LibreOffice. The following data, measured with the previous test file, show up to 20% shrinking of the spaces instead of up to 2% shrinking of the lines.
paragraph width | max spaces in line | space width |
347.53 pt | 104 | 3.34 pt |

spaces | allowed extra character spacing in the line, MS Word (pt) | allowed extra character spacing in the line, Writer (pt) | extra spacing by shrinking: difference (pt) | shrinking of the line | shrinking of the spaces |
6 | 5.98 | 1.95 | 4.03 | 1.2% | 20.1% |
5 | 8.08 | 4.65 | 3.43 | 1.0% | 20.5% |
4 | 10.18 | 7.45 | 2.73 | 0.8% | 20.4% |
3 | 12.23 | 10.25 | 1.98 | 0.6% | 19.8% |
2 | 14.33 | 13.05 | 1.28 | 0.4% | 19.2% |
1 | 16.43 | 15.75 | 0.68 | 0.2% | 20.3% |
Low precision (approximated recalculation of multiple text portions?) |
9 | 11.15 | 4.15 | 7 | 2.0% | 23.3% |
12 | 9.63 | 1.35 | 8.28 | 2.4% | 20.6% |
21 | 16.83 | 5.85 | 10.98 | 3.2% | 15.6% |
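The percentages in the table appear to follow from the measured values: the difference between the extra spacing allowed in MS Word and in Writer, divided either by the paragraph width or by the total width of the spaces in the line. The following sketch recomputes them under that assumption. The function and variable names are hypothetical, and the last digit can differ slightly from the table because the published space width is itself rounded.

```python
PARAGRAPH_WIDTH_PT = 347.53
SPACE_WIDTH_PT = 3.34

# (spaces in line, allowed extra spacing in MS Word, in Writer), in pt
MEASUREMENTS = [
    (6, 5.98, 1.95),
    (5, 8.08, 4.65),
    (4, 10.18, 7.45),
    (3, 12.23, 10.25),
    (2, 14.33, 13.05),
    (1, 16.43, 15.75),
]


def shrinking(spaces, word_pt, writer_pt):
    """Return (difference in pt, line shrinking in %, space shrinking in %)."""
    diff = word_pt - writer_pt
    line_pct = 100 * diff / PARAGRAPH_WIDTH_PT
    space_pct = 100 * diff / (spaces * SPACE_WIDTH_PT)
    return diff, line_pct, space_pct


for row in MEASUREMENTS:
    diff, line_pct, space_pct = shrinking(*row)
    print(f"{row[0]} spaces: {diff:.2f} pt, "
          f"line {line_pct:.1f}%, spaces {space_pct:.1f}%")
```

The space shrinking stays close to 20% for every row, while the line shrinking varies with the number of spaces, which supports limiting the spaces rather than the line.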
The last commit changed the algorithm to this space-based shrinking, which results in the same line breaks not only for the previous test file, but also for the originally reported test document of tdf#119908.
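How a limit on space shrinking changes line breaks can be sketched with a simple greedy line breaker: a word stays on the current line if the overflow can be absorbed by shrinking the spaces of that line by up to 20%. This is only an illustration under stated assumptions. The function name, the word widths and the line width are hypothetical, and the real Writer layout also has to deal with text portions, hyphenation and underflow.

```python
MAX_SPACE_SHRINK = 0.20  # limit of space shrinking measured in this report


def break_lines(word_widths, space_width, line_width, shrink=True):
    """Greedy line breaking; returns a list of lines (lists of word widths)."""
    lines, current = [], []
    for width in word_widths:
        candidate = current + [width]
        spaces = len(candidate) - 1
        natural = sum(candidate) + spaces * space_width
        # Width that the spaces of the candidate line can give up.
        budget = spaces * space_width * MAX_SPACE_SHRINK if shrink else 0.0
        if current and natural - budget > line_width:
            lines.append(current)  # the word does not fit: start a new line
            current = [width]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


words = [30.0, 40.0, 30.0]  # hypothetical word widths in pt
print(break_lines(words, 3.34, 106.0, shrink=False))  # [[30.0, 40.0], [30.0]]
print(break_lines(words, 3.34, 106.0, shrink=True))   # [[30.0, 40.0, 30.0]]
```

Without shrinking, the third word moves to the next line, similar to the word “Antarctica” when justification is disabled. With shrinking, the 0.68 pt overflow fits into the 1.336 pt budget of the two spaces.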
Hyphenation was only a paragraph-level feature in LibreOffice, while the OpenDocument standard also allows disabling hyphenation in character formatting. The following commits in the LibreOffice code base solved this problem, making it possible to exclude words from hyphenation:
Commit | Description |
tdf#106733 xmloff: keep fo:hyphenate in character formatting | In the case of character formatting, map fo:hyphenate to the unused CharNoHyphenation character property to keep it during ODF import/export instead of losing it completely. This is the first step towards disabling hyphenation for single words or text spans in paragraphs with automatic hyphenation. Note: using fo:hyphenate as a character property is part of the ODF standard. Note: the old workaround to disable hyphenation, changing the language of the text to None, had serious drawbacks: losing spell checking and losing language-dependent text layout (supported by both the OpenType and Graphite font engines in LibreOffice). |
tdf#106733 sw: implement CharNoHyphenation | Implement the CharNoHyphenation character property to disable automatic hyphenation of words in paragraphs with enabled hyphenation. Also fix the mapping of fo:hyphenate to CharNoHyphenation by using the automatic inversion of their boolean values defined by xmloff's XML_TYPE_NBOOL, as suggested by Michael Stahl. |
tdf#106733 sw: fix bad downcast in SwTextNode::GetLang() | Fix the bad cast of SvxNoHyphenItem to SvxLanguageItem reported by <https://ci.libreoffice.org/job/lo_ubsan/3049/>. |
tdf#106733 sw cui: add CharNoHyphenation checkbox | Add the new checkbox "Exclude from hyphenation" to the Position tab of the Character formatting dialog window (UX design by Heiko Tietze). With this, it is possible to disable hyphenation with direct character formatting (e.g. combined with Find All), or with character styles that have “Exclude from hyphenation” set. This feature conforms to the OpenDocument standard and, unlike the previous locale=None workaround, it keeps spell checking and locale-dependent text layout. |
(Note: commit 1b83ebf42c535528b73baac2407b347f19070d07 disabled the unit test of tdf#159102 temporarily, because of lack of hyphenation on some test builds. Reported by Noel Grandin and Miklós Vajna.)
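The inverted mapping between fo:hyphenate and CharNoHyphenation can be sketched as follows. This is not the xmloff implementation, where the inversion is handled by XML_TYPE_NBOOL. The two function names are hypothetical and only illustrate that fo:hyphenate="false" corresponds to CharNoHyphenation being true, and vice versa.

```python
def import_fo_hyphenate(value):
    """Map an fo:hyphenate attribute value to the CharNoHyphenation flag."""
    if value not in ("true", "false"):
        raise ValueError(f"not a boolean attribute value: {value!r}")
    return value == "false"  # inverted: no hyphenation when fo:hyphenate is false


def export_char_no_hyphenation(char_no_hyphenation):
    """Map the CharNoHyphenation flag back to an fo:hyphenate value."""
    return "false" if char_no_hyphenation else "true"


for attr in ("true", "false"):
    flag = import_fo_hyphenate(attr)
    print(attr, flag, export_char_no_hyphenation(flag))
```

The round trip returns the original attribute value, which is the property needed to keep the setting during ODF import and export.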
The following screenshot shows the new option in Character formatting dialog window:
This development adds new typographical features standardized by OpenDocument, XSL and CSS 4 to LibreOffice Writer to guarantee MSO interoperability and more: following typographical rules and web standards.
Not only justified text, but also left-aligned and otherwise aligned text has lost its interoperability since MSO 2013, depending on hyphenation. The new default page layout algorithms of MSO truncate the last hyphenated word (before MSO 2013) or line (MSO 2013 and newer), following English typographical traditions and rules.
The OpenDocument standard contains the same feature, fo:hyphenation-keep, but it was not implemented before, and LibreOffice did not even keep this setting during import/export of an ODF document.
For more control over hyphenation and better conformance with standard typographical rules, LibreOffice Writer also implements the “hyphenation across” values of XSL and CSS 4 that are not covered by OpenDocument yet (loext:hyphenation-keep-type).
As a new development, LibreOffice not only keeps the value of fo:hyphenation-keep, but its page layout also moves the last hyphenated line of the page (or column) to the next page (or column), similar to MSO 2016 and newer.
Commit | Description |
tdf#132599 cui offapi sw xmloff: implement hyphenate-keep | Both parts of a hyphenated word shall lie within a single page with the ODF paragraph setting fo:hyphenation-keep="page". The implementation follows the default page layout of MSO 2016 and newer by shifting the bottom hyphenated line to the next page (and to the next column). Note: this is an MSO DOCX interoperability feature, also used in DTP software, XSL and CSS.
* Add checkbox/combobox to Text Flow in paragraph dialog
* Store property in paragraph model (com::sun::star::style::ParagraphProperties::ParaHyphenationKeep)
* Add ODF import/export
* Add ODF unit tests
Add new constants in com::sun::star::text::ParagraphHyphenationKeepType: the ODF values AUTO and PAGE (borrowed from XSL), and the following values for the planned extension ParaHyphenationKeepType of ParagraphProperties:
– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, the equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).
Note: the implementation truncates only a single hyphenated line, like MSO does: pages can still end in hyphenated lines (i.e. in the case of consecutive hyphenated lines), but less often than before.
Clean up the hyphenation dialog by collecting the "Don't hyphenate" options at the end of the hyphenation settings and negating them (similar to MSO and DTP), and add the new option "Hyphenate across column and page":
[x] Hyphenate words in CAPS
[x] Hyphenate last word
[x] Hyphenate across column and page
Note: ODF fo:hyphenation-keep has only the values "auto" and "page", while XSL also defines "column". For interoperability with MSO and DTP, fo:hyphenation-keep="page" is interpreted as XSL "column", avoiding hyphenation at the end of a column, not only at the end of a page. |
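The behaviour described in this commit, shifting a single hyphenated bottom line to the next page, can be sketched as follows. This is a minimal sketch and not the Writer page layout: the function name is hypothetical, pages are modelled as a fixed number of lines, and a line counts as hyphenated if its text ends with a hyphen. Only the line “except that it has at-” comes from the screenshot described on this page, the other lines are invented.

```python
def paginate(lines, lines_per_page, keep="auto"):
    """Split lines into pages; keep="page" shifts one hyphenated bottom line."""
    pages, start = [], 0
    while start < len(lines):
        end = min(start + lines_per_page, len(lines))
        is_last_page = end == len(lines)
        if (keep == "page" and not is_last_page
                and end - start > 1 and lines[end - 1].endswith("-")):
            end -= 1  # move only a single line to the next page
        pages.append(lines[start:end])
        start = end
    return pages


text = ["It looks like a page", "except that it has at-", "least one more line."]
print(paginate(text, 2, keep="auto"))
print(paginate(text, 2, keep="page"))
```

Because only one line is moved, a page can still end in a hyphenated line when two hyphenated lines follow each other, as the note above states.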
The following screenshot shows the new “Hyphenate across column and page” option on the “Text Flow” pane of the Paragraph formatting dialog window: the last hyphenated line “except that it has at-” was shifted to the next page. The clean-up of the “Hyphenate words in CAPS/last word” options is also visible.
The current OpenDocument standard has not fully adopted the values of the XSL attribute hyphenation-keep: the value “column” is missing (https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep). Also, according to e.g. New Hart's Rules for English (OUP, 2005, “Line Endings”, p. 40, cited by R. Green in tdf#132599), it is a typographical requirement to avoid hyphenation in the last line of a spread, i.e. a visible page pair. That is why CSS 4 defines “spread”, not only “page” and “column”, to control hyphenation.
The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page” and “Spread” on the Text Flow pane of the Paragraph settings:
The following screenshots show the layout of the test documents in LibreOffice Writer 24.8.
LEFT: hyphenation across everywhere (hyphenation-keep="auto"). RIGHT: no hyphenation across, resulting in a shifted hyphenated line (hyphenation-keep="page", which means the default loext:hyphenation-keep-type="column" for interoperability reasons).