LibreOffice/Collabora Online Typography

Line break interoperability and state-of-the-art ISO OpenDocument/web typography in open source office suites

Table of Contents

1 Summary

2 Example

3 Fixing line/page count interoperability for plain text

3.1 Result of the first analysis of the unknown algorithm

3.2 Fix line count & page count interoperability for plain text (2023-10-17)

Visual comparison

3.3 Fix paragraph layout interoperability: shrink exceeding paragraph lines (2023-11-16)

3.4 Fix handling text portions, cursor positions/selection and shrinking algorithm

Multiple text portions

Cursor positions

Freezing at underflow

Result of the second analysis of the shrinking algorithm

4 Exclude words from hyphenation

5 Don’t hyphenate across a column, page or spread

5.1 Adding hyphenation-keep

Developments

User interface

5.2 Support column, page and spread types, DOCX import

User interface

Don’t hyphenate across a page

Don’t hyphenate across a spread

Don’t hyphenate across a column

Hyphenate across a column, except in the last one

Developments (Writer core, DOCX filter and help content)

6 Upcoming developments

1 Summary

LibreOffice, the open source office suite, is the reference implementation of the ISO OpenDocument (ODF) format. As part of LibreOffice Technology, Collabora Online is key to the digital sovereignty of open source online document editing. Metrically equivalent fonts can no longer guarantee MS Word interoperability, because of undocumented changes to the MS Word line break algorithm made after the ODF and OOXML standardization. The planned project solves this fundamental problem, and also adds the following interoperability and CSS 4 web typography features and fixes to the LibreOffice line break algorithm and layout:

All the developments are libre and open source, and will be integrated into the main development branch of LibreOffice and released with its next stable release.

2 Example

Layout differences between MS Word (top) and LibreOffice (bottom): since MS Word 2013, the line break algorithm of justified paragraphs supports shrinking the spaces between words. Only disabling justification results in the same layout, moving the word “Antarctica” to the second line in MS Word, too. The bottom paragraphs contain hundreds of adjacent spaces, which shrink only in MS Word.

3 Fixing line/page count interoperability for plain text

3.1 Result of the first analysis of the unknown algorithm

With 0.1 pt resolution, up to 2% shrinking was measured for plain justified lines (no direct character formatting, no hyphenation). The following picture shows how MSO implements the shrinking by using the available spaces (but the maximum shrinking is independent of the number of spaces, at least in a line with enough spaces):

(Black text: MS Word; red text: LibreOffice Writer. The black text contains extra character spacing after the text “3 Au” to give the same position after the last space.)
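The first measurement can be expressed as a simple acceptance test for a justified line. The sketch below is only a model of the observation above, assuming that the excess width is measured relative to the paragraph width and that shrinking needs at least one space; the function name and the 2% default are illustrative, not MSO or Writer code. The second analysis in section 3.4 later refined this model.

```python
def fits_with_line_shrinking(natural_width: float, paragraph_width: float,
                             spaces: int, max_line_shrink: float = 0.02) -> bool:
    """Model of the first analysis: a justified line is accepted if its
    natural width exceeds the paragraph width by at most max_line_shrink
    (2%) of the paragraph width, absorbed by the spaces of the line."""
    if natural_width <= paragraph_width:
        return True  # fits without any shrinking
    if spaces == 0:
        return False  # no space available to shrink
    excess = natural_width - paragraph_width
    return excess <= max_line_shrink * paragraph_width


# With the 347.53 pt paragraph width used later in this report:
print(fits_with_line_shrinking(352.0, 347.53, spaces=5))  # True (4.47 pt excess)
print(fits_with_line_shrinking(355.0, 347.53, spaces=5))  # False (7.47 pt excess)
```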

3.2 Fix line count & page count interoperability for plain text (2023-10-17)

Commit 7d08767b890e723cd502b1c61d250924f695eb98 “tdf#130088 tdf#119908 smart justify: fix DOCX line count + compat opt.” is the fix for the line count/page count differences and the initial fix for the new justification:

Visual comparison

A simple lorem.docx document (attached to tdf#119908) was created in MSO 2016 to test and demonstrate the improved line breaking, which solved the line/page count interoperability of plain text in the DOCX import of LibreOffice Writer:

(MSO, Writer before the fix, Writer after the fix. Blue marks: correct line break positions in MSO and in the improved Writer; red marks: bad line breaks in Writer before the fix.)

3.3 Fix paragraph layout interoperability: shrink exceeding paragraph lines (2023-11-16)

Shrinking was added by commit 17eaebee279772b6062ae3448012133897fc71bb “tdf#119908 sw smart justify: fix justification by shrinking” and commit c1803de8a093739d189be54b2d9bd5634e9e79ee “tdf#119908 sw smart justify: add unit test”, fixing the temporarily exceeding lines, i.e. justifying them by shrinking.

The following composite picture shows the previous (red) and the shrunk (black) lines in Writer. With this commit, the result is the same as in MS Word.
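The core idea of justification by shrinking is that the per-space adjustment simply becomes negative. A minimal sketch, assuming the excess width is split evenly among the spaces of the line (the function name is hypothetical, and Writer's real layout code works with text portions and integer units):

```python
def space_adjustment(natural_width: float, paragraph_width: float,
                     spaces: int) -> float:
    """Extra width added to every space so that the line exactly fills the
    paragraph width: positive for usual justification (widening), negative
    for justification by shrinking a temporarily exceeding line."""
    if spaces == 0:
        return 0.0  # no space to widen or shrink
    return (paragraph_width - natural_width) / spaces


adjust = space_adjustment(350.0, 347.53, spaces=5)
print(round(adjust, 3))              # -0.494 pt per space
print(round(350.0 + 5 * adjust, 2))  # 347.53 pt, the paragraph width
```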

3.4 Fix handling text portions, cursor positions/selection and shrinking algorithm

The following composite screenshots of MS Word (red) and Writer (black) show the fixes for handling multiple text portions (middle) and for the shrinking algorithm (right), which resulted in the same line breaks as in MS Word (test document: lorempage.docx of Bug 158333; generated PDFs: Word, Writer).

The related commits in LibreOffice code base:

Commit

Description

53de98b29548ded88e0a44c80256fc5e340d551e

tdf#158333 sw smart justify: fix multiple text portions

Multiple text portions, e.g. when part of a line contains direct character formatting, break DOCX interoperability of justified paragraphs.

20cbe88ce5610fd8ee302e5780a4c0821ddb3db4

tdf#119908 tdf#158419 sw smart justify: fix cursor position

The text cursor didn't follow the new word positions yet, because the negative shrinking value was cast to an unsigned type.

7059a1858ddb044c5f3f0c8e0386d3e1d9dd2b5f

tdf#119908 tdf#158436 sw smart justify: fix freezing with NBSP

Stop shrinking during underflow, because it resulted in an endless layout loop, e.g. when a very short word was followed by a no-break space.

The problem was reported by Miklós Vajna (Collabora Productivity).

36bfc86e27fa03ee16f87819549ab126c5a68cac

tdf#119908 tdf#158776 sw smart justify: shrink only spaces

For interoperability, shrink only the spaces (up to 20%), instead of the lines (up to 2%).

8b393bba91111bd4f8988457f3a78b0306462bf2

tdf#159102 sw smart justify: fix automatic hyphenation

As previously with soft hyphens, automatic hyphenation could result in too much shrinking, because the calculation counted an extra, non-existent space in the line. Also, try to shrink the line only if it is likely to contain a space.

(Note: commit 93ab1bc6be7b46226874810ec4fc3c61d5d0fc7c restored the unit test testOfz64109, which had been removed accidentally by the second commit. Reported by Miklós Vajna.)

Multiple text portions

Text portions (spans or runs) result from direct character formatting within a paragraph, or simply from editing the text, even without any formatting difference.

The first implementation took only the last text portion of the line into account. This caused no problem when the line contained a single text portion, but otherwise it resulted in different line breaks (less or no shrinking). The first screenshot shows that the test file contained several bad line breaks caused by multiple text portions in the lines. These three bad line breaks were fixed by the first commit.
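The difference between the first and the fixed implementation can be illustrated with a hypothetical helper that counts the spaces available for shrinking: counting only in the last text portion underestimates them, so the line is shrunk less or not at all. The portion list and the function are illustrative assumptions, not Writer's data structures.

```python
from typing import List


def spaces_for_shrinking(portions: List[str], last_portion_only: bool = False) -> int:
    """Count the (ASCII) spaces of a line that may be shrunk.

    portions: text of each text portion (span or run) of the line.
    last_portion_only=True mimics the first implementation, which
    looked only at the last text portion."""
    considered = portions[-1:] if last_portion_only else portions
    return sum(text.count(" ") for text in considered)


line = ["Lorem ipsum ", "dolor", " sit amet"]  # e.g. "dolor" in bold
print(spaces_for_shrinking(line))                          # 4
print(spaces_for_shrinking(line, last_portion_only=True))  # 2
```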

Cursor positions

Selected text and the text cursor were in the wrong position: they followed the exceeding, i.e. not shrunk, line instead of the visible shrunk line. The second commit adjusted cursor movement and text selection to the shrunk lines.
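Why an unsigned cast breaks cursor positioning can be shown with a toy calculation. The sketch is hypothetical (the function names, the twip unit and the position formula are assumptions for illustration, not Writer's code): once the negative per-space value is converted to a 32-bit unsigned integer and then used in wider arithmetic, it no longer represents a small leftward offset.

```python
def to_uint32(value: int) -> int:
    """Emulate a C/C++ conversion of a signed integer to a 32-bit unsigned one."""
    return value & 0xFFFFFFFF


def cursor_x(text_width: int, spaces_before: int, space_adjust: int,
             unsigned_bug: bool = False) -> int:
    """Hypothetical cursor x offset (twips) in a justified line: the unadjusted
    width of the text before the cursor plus the per-space adjustment for
    each space before it. space_adjust is negative in a shrunk line."""
    adjust = to_uint32(space_adjust) if unsigned_bug else space_adjust
    # Python integers never wrap, modelling the converted value being used
    # in wider (e.g. 64-bit) arithmetic instead of wrapping back.
    return text_width + spaces_before * adjust


print(cursor_x(2000, 3, -8))                     # 1976
print(cursor_x(2000, 3, -8, unsigned_bug=True))  # 12884903864, far off the line
```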

Freezing at underflow

Extending the text layout code with shrinking resulted in an infinite loop, e.g. when a no-break space at the end of a paragraph triggered the recalculation of the line with underflow. This was solved by disabling shrinking in the case of underflow.

Result of the second analysis of the shrinking algorithm

Removing spaces instead of adding or replacing them allowed a more precise comparison of MSO and LibreOffice. The following data, measured with the previous test file, show up to 20% shrinking of the spaces instead of up to 2% shrinking of the lines.

Paragraph width: 347.53 pt; max. spaces in line: 104; space width: 3.34 pt.

spaces in the line | allowed extra character spacing in the line, MS Word (pt) | allowed extra character spacing in the line, Writer (pt) | extra spacing by shrinking, difference (pt) | shrinking of the line | shrinking of the spaces
6 | 5.98 | 1.95 | 4.03 | 1.2% | 20.1%
5 | 8.08 | 4.65 | 3.43 | 1.0% | 20.5%
4 | 10.18 | 7.45 | 2.73 | 0.8% | 20.4%
3 | 12.23 | 10.25 | 1.98 | 0.6% | 19.8%
2 | 14.33 | 13.05 | 1.28 | 0.4% | 19.2%
1 | 16.43 | 15.75 | 0.68 | 0.2% | 20.3%
Low precision (approximated recalculation of multiple text portions?):
9 | 11.15 | 4.15 | 7 | 2.0% | 23.3%
12 | 9.63 | 1.35 | 8.28 | 2.4% | 20.6%
21 | 16.83 | 5.85 | 10.98 | 3.2% | 15.6%

The last commit changed the algorithm accordingly, resulting in the same line breaks not only for the previous test file, but also for the originally reported test document of tdf#119908:
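The derived columns of the table and the refined rule can be reproduced with a short script. It is a model of the measurements, not the committed C++ code: the function names are illustrative, and the 20% limit is the approximate value read from the table. Because the measured values are rounded, a recomputed percentage can differ by 0.1 (e.g. 20.4% instead of 20.3% for the one-space row).

```python
PARAGRAPH_WIDTH = 347.53  # pt, measured paragraph width
SPACE_WIDTH = 3.34        # pt, measured width of a space


def derived_columns(spaces: int, word_extra: float, writer_extra: float):
    """Return (difference in pt, line shrinking %, space shrinking %) for a
    table row: the extra spacing gained by shrinking, related once to the
    paragraph width and once to the total width of the spaces."""
    difference = word_extra - writer_extra
    line_pct = 100 * difference / PARAGRAPH_WIDTH
    space_pct = 100 * difference / (spaces * SPACE_WIDTH)
    return round(difference, 2), round(line_pct, 1), round(space_pct, 1)


def fits_with_space_shrinking(natural_width: float, paragraph_width: float,
                              spaces: int, space_width: float,
                              max_space_shrink: float = 0.20) -> bool:
    """Refined model: the excess width may be absorbed by shrinking each
    space by at most max_space_shrink (about 20%) of its width."""
    excess = natural_width - paragraph_width
    return excess <= 0 or excess <= max_space_shrink * spaces * space_width


print(derived_columns(6, 5.98, 1.95))    # (4.03, 1.2, 20.1)
print(derived_columns(3, 12.23, 10.25))  # (1.98, 0.6, 19.8)
print(fits_with_space_shrinking(351.0, 347.53, 6, SPACE_WIDTH))  # True
print(fits_with_space_shrinking(352.0, 347.53, 6, SPACE_WIDTH))  # False
```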

4 Exclude words from hyphenation

Hyphenation was only a paragraph-level feature in LibreOffice, while the OpenDocument standard also allows disabling hyphenation in character formatting. The following commits in the LibreOffice code base solved the problem, making it possible to exclude words from hyphenation:

Commit

Description

73bd04a71e741788a2f2f3b26cc46ddb6a361372

tdf#106733 xmloff: keep fo:hyphenate in character formatting

In the case of character formatting, map fo:hyphenate to the unused CharNoHyphenation character property to keep it during ODF import/export instead of losing it completely. This is the first step to disable hyphenation for single words or text spans in paragraphs with automatic hyphenation. Note: using fo:hyphenate as character property is part of the ODF standard.

Note: the old workaround to disable hyphenation, changing the language of the text to None, had serious drawbacks: it lost spell checking and language-dependent text layout (supported by both OpenType and Graphite font engines in LibreOffice).

b5e275f47a54bd7fee39dad516a433fde5be872d

tdf#106733 sw: implement CharNoHyphenation

Implement CharNoHyphenation character property to disable automatic hyphenation of words in paragraphs with enabled hyphenation.

Also fix the fo:hyphenate mapping to CharNoHyphenation, using the automatic inversion of boolean values defined by xmloff's XML_TYPE_NBOOL, as suggested by Michael Stahl.

9193e61d3e7b850b3715c848c09434e24855340b

tdf#106733 sw: fix bad downcast in SwTextNode::GetLang()

Fix bad cast of SvxNoHyphenItem to SvxLanguageItem reported by <https://ci.libreoffice.org/job/lo_ubsan/3049/>.

03c5a31a0f374a90fbc821718c14dc5f8a385adf

tdf#106733 sw cui: add CharNoHyphenation checkbox

Add a new checkbox, "Exclude from hyphenation", to the Position tab of the Character formatting dialog window (UX design by Heiko Tietze).

With this, it's possible to disable hyphenation with direct character formatting (e.g. combined with Find All), or by using character styles with "Exclude from hyphenation" set in them. This feature conforms to the OpenDocument standard and, unlike the previous locale=None workaround, it keeps spell checking and locale-dependent text layout.

(Note: commit 1b83ebf42c535528b73baac2407b347f19070d07 temporarily disabled the unit test of tdf#159102, because hyphenation was not available on some test builds. Reported by Noel Grandin and Miklós Vajna.)

The following screenshot shows the new option in the Character formatting dialog window:
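On the file format level, the feature boils down to fo:hyphenate in the text properties of a character style. The Python sketch below reads that attribute with the standard ElementTree module and inverts it, like the fo:hyphenate to CharNoHyphenation mapping described in the second commit. The style name and the helper function are hypothetical; only the namespaces and attribute names come from ODF.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# A character (text) style excluding its text from hyphenation.
SAMPLE_STYLE = """<style:style
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    style:name="NoHyphenation" style:family="text">
  <style:text-properties fo:hyphenate="false"/>
</style:style>"""


def char_no_hyphenation(style_xml: str) -> Optional[bool]:
    """Map fo:hyphenate of a text style to a CharNoHyphenation-like value by
    inverting the boolean; None if the attribute is not set."""
    root = ET.fromstring(style_xml)
    props = root.find("style:text-properties", NS)
    if props is None:
        return None
    value = props.get("{%s}hyphenate" % NS["fo"])
    if value is None:
        return None
    return value == "false"


print(char_no_hyphenation(SAMPLE_STYLE))  # True: excluded from hyphenation
```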

5 Don’t hyphenate across a column, page or spread

This development adds new typographical features standardized by OpenDocument, XSL and CSS 4 to LibreOffice Writer, to guarantee MSO interoperability and more: conformance with typographical rules and web standards.

Not only justified text, but also left-aligned (and otherwise aligned) text has lost its interoperability since MSO 2013 when hyphenation is enabled. The default page layout algorithm of MSO shifts the last hyphenated word (before MSO 2013) or the last hyphenated line (MSO 2013 and newer) of a page to the next page, following English typographical traditions and rules.

The OpenDocument standard also contains the same feature, fo:hyphenation-keep, but it wasn’t implemented before, and LibreOffice did not even keep this setting during ODF import/export.

For more control over hyphenation, and for better conformance with standard typographical rules, LibreOffice Writer also implements the “hyphenation across” values of XSL and CSS 4 that are not yet covered by OpenDocument (loext:hyphenation-keep-type).

5.1 Adding hyphenation-keep

With this development, LibreOffice not only keeps the value of fo:hyphenation-keep, but its page layout also moves the last hyphenated line of the page (or column) to the next page (or column), similar to MSO 2016 and newer.
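The effect on the page layout can be illustrated with a toy paginator working on already broken lines. It is a simplified model under stated assumptions (a fixed number of lines per page, a trailing "-" marks a hyphenated line, no widow/orphan handling), not Writer's layout algorithm. Like the commit, it shifts only a single hyphenated line, so a page can still end in a hyphenated line when two hyphenated lines follow each other.

```python
from typing import List


def paginate(lines: List[str], lines_per_page: int,
             hyphenation_keep: bool) -> List[List[str]]:
    """Split lines into pages; with hyphenation_keep, shift a hyphenated
    last line of a page to the next page (only that single line)."""
    pages: List[List[str]] = []
    start = 0
    while start < len(lines):
        end = min(start + lines_per_page, len(lines))
        page_break_follows = end < len(lines)
        if (hyphenation_keep and page_break_follows and end - start > 1
                and lines[end - 1].endswith("-")):
            end -= 1  # move the hyphenated line to the next page
        pages.append(lines[start:end])
        start = end
    return pages


text = ["The quick brown", "fox jumps over the la-", "zy dog and", "runs away."]
print(paginate(text, 2, hyphenation_keep=False))
# [['The quick brown', 'fox jumps over the la-'], ['zy dog and', 'runs away.']]
print(paginate(text, 2, hyphenation_keep=True))
# [['The quick brown'], ['fox jumps over the la-', 'zy dog and'], ['runs away.']]
```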

Developments

Commit

Description

9574a62add8e4901405e12117e75c86c2d2c2f21

tdf#132599 cui offapi sw xmloff: implement hyphenate-keep

Both parts of a hyphenated word shall lie within a single page with ODF paragraph setting fo:hyphenation-keep="page".

The implementation follows the default page layout of MSO 2016 and newer by shifting the bottom hyphenated line to the next page (and to the next column).

Note: this is an MSO DOCX interoperability feature, used also in DTP software, XSL and CSS.

* Add checkbox/combobox to Text Flow in paragraph dialog
* Store property in paragraph model (com::sun::star::style::ParagraphProperties::ParaHyphenationKeep)
* Add ODF import/export
* Add ODF unit tests

New constants of com::sun::star::text::ParagraphHyphenationKeepType, containing ODF AUTO and PAGE (borrowed from XSL), and for the planned extension ParaHyphenationKeepType of ParagraphProperties:

– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).

Note: the implementation truncates only a single hyphenated line, like MSO does: pages can still end in hyphenated lines (i.e. in the case of consecutive hyphenated lines), but less often than before.

Clean-up hyphenation dialog by collecting "Don't hyphenate" options at the end of the hyphenation settings, and negating them (similar to MSO and DTP), adding also the new option "Hyphenate across column and page":

[x] Hyphenate words in CAPS
[x] Hyphenate last word
[x] Hyphenate across column and page

Note: ODF fo:hyphenation-keep has only "auto" and "page" values, while XSL also defines "column". Because of the interoperability with MSO and DTP, fo:hyphenation-keep="page" is interpreted as XSL "column", avoiding hyphenation at the end of a column, not only at the end of a page.

User interface

The following screenshot shows the new “Hyphenate across column and page” option on the “Text Flow” pane of the Paragraph formatting dialog window: the last hyphenated line, “except that it has at-”, was shifted to the next page. The clean-up of the “Hyphenate words in CAPS/last word” options is also visible.

5.2 Support column, page and spread types, DOCX import

The current OpenDocument standard has not fully adopted the values of the XSL attribute hyphenation-keep: the value “column” is missing (https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep). Moreover, according to e.g. New Hart’s Rules (OUP, 2005, “Line Endings”, p. 40, cited by R. Green in tdf#132599), avoiding hyphenation in the last line of a spread, i.e. a visible page pair, is a typographical requirement. That is why CSS 4 defines “spread”, not only “page” and “column”, to control hyphenation.

User interface

The new user interface of LibreOffice Writer 24.8, with Hyphenation across “Column”, “Page” and “Spread” on the Text Flow pane of the Paragraph settings:

 

The following screenshots show the layout of the test documents in LibreOffice Writer 24.8.

Don’t hyphenate across a page

 

LEFT: Hyphenation across everywhere (hyphenation-keep="auto"). RIGHT: No hyphenation across, resulting in a shifted hyphenated line (hyphenation-keep="page", which means the default loext:hyphenation-keep-type="column" for interoperability reasons).

Don’t hyphenate across a spread

 

LEFT: Hyphenation across column and page, but not spread (hyphenation-keep="page", loext:hyphenation-keep-type="spread"). Shifted hyphenated line on the first (right-hand) page. RIGHT: same settings, but inserting a page break at the start of the document resulted in no shifting, because the bottom hyphenated line is now on the second (left-hand) page.

Don’t hyphenate across a column

 

LEFT: No hyphenation across (hyphenation-keep="page", i.e. the default loext:hyphenation-keep-type="column"). Shifted hyphenated line in the first column of the multi-column page. RIGHT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page").

Hyphenate across a column, except in the last one

 

LEFT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page"). RIGHT: same settings, but the last hyphenated line was shifted in the last column, because that line is also the last line of the page.
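The behaviour shown in these screenshots can be summarized as a decision rule for a hyphenated line at the bottom of a column. The sketch is a simplified model based on the examples above, not Writer's layout code; the enum and function names are hypothetical, and it assumes a left-to-right book layout in which odd-numbered pages are right-hand pages.

```python
from enum import Enum


class KeepType(Enum):
    AUTO = "auto"      # hyphenation-keep="auto": hyphenate across everything
    COLUMN = "column"  # no hyphenation across a column, page or spread
    PAGE = "page"      # hyphenation across a column, but not across a page
    SPREAD = "spread"  # hyphenation across a column and page, not a spread


def shift_bottom_hyphenated_line(keep: KeepType, last_column_of_page: bool,
                                 page_number: int) -> bool:
    """Decide whether a hyphenated line at the bottom of a column moves to
    the next column or page. Odd-numbered (right-hand) pages end a spread."""
    if keep is KeepType.AUTO:
        return False
    if keep is KeepType.COLUMN:
        return True
    if not last_column_of_page:
        return False  # PAGE and SPREAD allow hyphenation across columns
    if keep is KeepType.PAGE:
        return True
    return page_number % 2 == 1  # SPREAD: shift only at the end of a spread


print(shift_bottom_hyphenated_line(KeepType.SPREAD, True, 1))  # True: right-hand page
print(shift_bottom_hyphenated_line(KeepType.SPREAD, True, 2))  # False: left-hand page
```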

Developments (Writer core, DOCX filter and help content)

Details of the core and DOCX filter developments, and of the extension of the LibreOffice help with the new paragraph settings:

Commit

Description

6e8819f29b6051a0e551d77512830539913ec277

tdf#132599 cui offapi sw xmloff: add hyphenation-keep-type

 

Support XSL value "column" and CSS 4 value "spread", stored in loext:hyphenation-keep-type, to give better control over hyphenation-keep. E.g. spread: both parts of a hyphenated word shall lie within a single spread, i.e. the rule applies when the next page is not visible at the same time (e.g. when the next page is not a right page of a book).

– css::style::ParaHyphenationKeep is now a boolean property, importing hyphenation-keep="page" as true.
– the type of ParaHyphenationKeep, including the new non-ODF types, is stored in the new ParagraphProperties::ParaHyphenationKeepType.
– the default value of ParaHyphenationKeepType is COLUMN, for interoperability.
– Add checkboxes to Text Flow -> Hyphenation Across in paragraph dialog:
  * Column (previously: Hyphenate across column and page)
  * Page
  * Spread
  – enabling/disabling them follows XSL/CSS 4/loext, i.e. possible combinations:
  * No Hyphenation across
    (hyphenation-keep="page" and loext:hyphenation-keep-type="column")
  * Hyphenation across [x] Column
    (hyphenation-keep="page" and loext:hyphenation-keep-type="page")
  * Hyphenation across [x] Column [x] Page
    (hyphenation-keep="page" and loext:hyphenation-keep-type="spread")
  * Hyphenation across [x] Column [x] Page [x] Spread
    (hyphenation-keep="auto")
– Add ODF import/export
– Update DOCX import
– Add ODF unit tests

Note: the current implementation depends on widow settings: disabling widow handling allows hyphenation across columns and pages not only in table cells.

Note: the RTF import-only, but unused bPageEnd has been renamed to bKeep. Depending on the RTF test results, it will likely be necessary to disable the layout change, e.g. GetKeepType()=ParagraphHyphenationKeepType::AUTO, if PageEnd uses the obsolete hyphenation rule, i.e. shifting only the hyphenated word to the next page, not the full line.

More information:

– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).

 

c8ee0e8f581b8a6e41b1a6b8aa4d40b442c1d463

tdf#160518 DOCX: import hyphenation-keep to fix layout

 

To fix layout interoperability, import DOCX compatSettings allowHyphenationAtTrackBottom and useWord2013TrackBottomHyphenation as hyphenation-keep setting "COLUMN", shifting last hyphenated lines of pages and columns, like MSO does.

58350a811a8001f72b13f6ca3def5f32ea904e72

tdf#132599 add "Hyphenation across" options

Document new options of LO 24.8 to control hyphenation in last line of a column, page or spread.

6 Upcoming developments

– add DOCX export filter for hyphenation-keep;
– implement XSL/CSS 4 value "always";
– add frame and table support/interoperability.

László Németh

2024-04-12

 

This project was funded through the NGI0 Entrust Fund, a fund established by NLnet Foundation with financial support from the European Commission's Next Generation Internet programme. More information: https://nlnet.nl/project/LO-Typography/