LibreOffice/Collabora Online Typography

Line break interoperability and state-of-the-art ISO OpenDocument/web typography in open source office suites

Table of Contents

1 Summary

2 Example

3 Fixing line/page count interoperability for plain text

3.1 Result of the first analysis of the unknown algorithm

Fix line count & page count interoperability for plain text (2023-10-17)

Visual comparison

3.2 Fix paragraph layout interoperability: shrink exceeding paragraph lines (2023-11-16)

3.3 Fix handling text portions, cursor positions/selection and shrinking algorithm

Multiple text portions

Cursor positions

Freezing at underflow

Result of the second analysis of the shrinking algorithm

4 Exclude words from hyphenation

5 Don’t hyphenate across a column, page or spread

5.1 Adding hyphenation-keep

Developments

User interface

5.2 Support column, page and spread types, DOCX import

User interface

Don’t hyphenate across a page

Don’t hyphenate across a spread

Don’t hyphenate across a column

Hyphenate across a column, except in the last one

Developments (Writer core, DOCX filter and help content)

5.3 Support hyphenation-keep in linked frames, in tables and last full line of paragraphs, DOCX export

Improved interoperability in Writer’s DOCX export

DOCX export of hyphenation enabled in “Text body” style

Last full line of paragraphs

User interface

Tables

Linked frames on the same pages

Linked frames not on the same spread

Linked frames on the same spread

Developments (Writer core, DOCX filter and help content)

Manual tests

Fixed DOCX export

Disable hyphenation of last full line of paragraphs

Tables

Linked frames

Linked frames only on following right pages

6 No Break context menu and visualization

6.1 Developments

6.2 Manual testing

7 DOCX interoperability fixes

7.1 Support of maximum consecutive hyphenated lines

7.2 Fix overshrank lines in smart justify

7.3 Default hyphenation zone

7.4 Analysis of the test document of Bug 149421 (hyphenation zone)

7.5 Developments

7.6 Manual testing

Maximum consecutive hyphenated lines

Fix overshrank lines in smart justify

Export zero hyphenation zone of new documents

Import default OOXML hyphenation zone

8 Sidebar hyphenation controls

8.1 Paragraph sidebar panel

8.2 Developments

8.3 Manual testing

8.4 Possible future developments

9 Inline heading

9.1 Developments

9.2 Manual testing

10 New DOCX interoperability fixes for space shrinking

10.1 Manual testing

11 Inline heading: DOCX export

11.1 Developments

11.2 Manual testing

12 Inline heading: UX support

Usage

12.2 Developments

12.3 Manual testing

12.4 Planned developments

13 New DOCX interoperability fixes for space shrinking

13.1 Manual testing

14 Inline heading UX, (X)HTML, PDF & Navigator support

14.1 Manual testing

DOCX export of the newly created inline headings

PDF bookmark export

Export XHTML

Save As HTML Document (Writer)

Navigator Move Up/Down support

1 Summary

LibreOffice, the open source office suite, is the reference implementation of the ISO OpenDocument (ODF) format. As part of LibreOffice Technology, Collabora Online is key to the digital sovereignty of open source online document editing. Metrically equivalent fonts can no longer guarantee MS Word interoperability, because of undocumented changes in the MS Word line break algorithm after the ODF and OOXML standardization. The planned project solves this fundamental problem, and also adds the following interoperability and CSS 4 web typography features and fixes to the LibreOffice line break algorithm and layout:

All the developments are libre and open source, and will be integrated with the main development branch of LibreOffice, and released with its next stable release.

2 Example

Layout difference between MS Word (upper) and LibreOffice (bottom): since MS Word 2013, line breaking in justified paragraphs supports shrinking of the spaces between words. Only disabling the justification results in the same layout, moving the word “Antarctica” to the second line in MS Word, too. The bottom paragraphs contain hundreds of adjacent spaces, which shrink only in MS Word.

 

MS Word: in the first justified paragraph, the first shrunk line contains “Antarctica”.

 

LibreOffice Writer: lost layout in the first (and bottom) justified paragraphs.

3 Fixing line/page count interoperability for plain text

3.1 Result of the first analysis of the unknown algorithm

With 0.1 pt resolution, up to 2% shrinking was measured for plain justified lines (no direct character formatting, no hyphenation). The following picture shows how MSO implements the shrinking by using the available spaces (the maximal shrinking is independent of the number of spaces, at least in a line with enough spaces):

Black text: MS Word, red text: LibreOffice Writer – the black text contains an extra character spacing after the text “3 Au” to give the same position after the last space.

Fix line count & page count interoperability for plain text (2023-10-17)

Commit 7d08767b890e723cd502b1c61d250924f695eb98 “tdf#130088 tdf#119908 smart justify: fix DOCX line count + compat opt.” is the fix for the line count/page count differences and the initial fix for the new justification:

Visual comparison

A simple lorem.docx document (attached to tdf#119908) was created in MSO 2016 to test and show the improved line break, which solved the line/page count interoperability in DOCX import of LibreOffice Writer for plain text:

MSO, Writer before the fix, Writer after the fix. Blue marks: correct line break positions in MSO and improved Writer, red marks: bad line breaks in Writer before the fix.

3.2 Fix paragraph layout interoperability: shrink exceeding paragraph lines (2023-11-16)

Shrinking has been added by commit 17eaebee279772b6062ae3448012133897fc71bb “tdf#119908 sw smart justify: fix justification by shrinking” and commit c1803de8a093739d189be54b2d9bd5634e9e79ee “tdf#119908 sw smart justify: add unit test”. These commits fix the lines that temporarily exceeded the paragraph width by shrinking them to fit, i.e. justifying them.

The following composite picture shows the previous (red) and the shrunk (black) lines in Writer. With this commit, the result is the same as in MS Word.

3.3 Fix handling text portions, cursor positions/selection and shrinking algorithm

The following composite screenshots of MS Word (red) and Writer (black) show the fix for handling multiple text portions (middle) and the fix of the shrinking algorithm (right), which resulted in the same line breaks as in MS Word (test document: lorempage.docx of Bug 158333, generated PDFs: Word, Writer).

The related commits in LibreOffice code base:

Commit

Description

53de98b29548ded88e0a44c80256fc5e340d551e

tdf#158333 sw smart justify: fix multiple text portions

Multiple text portions, e.g. if some part of a line contains direct character formatting breaks DOCX interoperability of justified paragraphs.

20cbe88ce5610fd8ee302e5780a4c0821ddb3db4

tdf#119908 tdf#158419 sw smart justify: fix cursor position

Text cursor didn't follow the new word positions yet, because of unsigned casting of the negative shrinking value.

7059a1858ddb044c5f3f0c8e0386d3e1d9dd2b5f

tdf#119908 tdf#158436 sw smart justify: fix freezing with NBSP

Stop shrinking during underflow, because it resulted endless layout loop, e.g. when a very short word followed by a no-break space.

The problem reported by Miklós Vajna (Collabora Productivity).

36bfc86e27fa03ee16f87819549ab126c5a68cac

tdf#119908 tdf#158776 sw smart justify: shrink only spaces

For interoperability, only shrink spaces up to 20%, not the lines up to 2%.

8b393bba91111bd4f8988457f3a78b0306462bf2

tdf#159102 sw smart justify: fix automatic hyphenation

As before with soft hyphens, automatic hyphenation could result too much shrinking, because of calculating with an extra non-existing space in the line. Also try to shrink the line only if a space likely will be available in it.

(Note: commit 93ab1bc6be7b46226874810ec4fc3c61d5d0fc7c reverted the removal of unit test testOfz64109, caused by the second commit by accident. Reported by Miklós Vajna.)
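The “shrink only spaces” rule from the commit list above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Writer's layout code: the function name, the width-based interface and the constant are hypothetical, and only the 20% limit comes from the commit description.

```python
MAX_SPACE_SHRINK = 0.20  # from the commit description: shrink spaces up to 20%


def fits_with_space_shrinking(text_width, space_width, space_count, line_width):
    """Return True if the line fits, shrinking only its spaces if needed.

    text_width  -- natural width of the line, including its spaces (pt)
    space_width -- natural width of one space (pt)
    space_count -- number of shrinkable spaces in the line
    line_width  -- available width of the line (pt)
    """
    overflow = text_width - line_width
    if overflow <= 0:
        return True  # the line fits without any shrinking
    if space_count == 0:
        return False  # no space available, so nothing can shrink
    # The overflow must be absorbed by the spaces alone.
    return overflow <= MAX_SPACE_SHRINK * space_width * space_count
```

With 3.34 pt wide spaces, a line with six spaces can absorb about 4 pt of overflow in this model, independently of the line width.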

Multiple text portions

Text portions (spans or runs) result from direct character formatting within a paragraph, or simply from editing the text without any formatting difference.

The first implementation took only the last text portion of the line into account. This caused no problem when the line consisted of a single text portion, but otherwise it resulted in different line breaks (less or missing shrinking). The first screenshots show that the test file contained multiple bad line breaks caused by multiple text portions in the lines. These three bad line breaks were fixed by the first commit.

Cursor positions

Selected text and the text cursor were in the wrong position: over the exceeding, i.e. unshrunk, line instead of the visible shrunk line. The second commit adjusted cursor movement and text selection to the shrunk lines.

Freezing at underflow

Extending the text layout code with shrinking resulted in an infinite loop, e.g. when a paragraph ending with a no-break space initiated the recalculation of the line with underflow. This was solved by disabling shrinking during underflow.

Result of the second analysis of the shrinking algorithm

Removing spaces instead of adding or replacing them allowed a more precise comparison of MSO and LibreOffice. The following data, measured with the previous test file, show up to 20% shrinking of the spaces instead of up to 2% shrinking of the lines.

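| paragraph width | max spaces in line | space width |
|---|---|---|
| 347.53 pt | 104 | 3.34 pt |

| spaces | allowed extra character spacing in the line (pt): MS Word | allowed extra character spacing in the line (pt): Writer | extra spacing by shrinking: difference (pt) | shrinking: line | shrinking: spaces |
|---|---|---|---|---|---|
| 6 | 5.98 | 1.95 | 4.03 | 1.2% | 20.1% |
| 5 | 8.08 | 4.65 | 3.43 | 1.0% | 20.5% |
| 4 | 10.18 | 7.45 | 2.73 | 0.8% | 20.4% |
| 3 | 12.23 | 10.25 | 1.98 | 0.6% | 19.8% |
| 2 | 14.33 | 13.05 | 1.28 | 0.4% | 19.2% |
| 1 | 16.43 | 15.75 | 0.68 | 0.2% | 20.3% |
| Low precision (approximated recalculation of multiple text portions?) | | | | | |
| 9 | 11.15 | 4.15 | 7 | 2.0% | 23.3% |
| 12 | 9.63 | 1.35 | 8.28 | 2.4% | 20.6% |
| 21 | 16.83 | 5.85 | 10.98 | 3.2% | 15.6% |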
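The two shrinking columns can be recomputed from the measured values. The following Python sketch assumes that the “line” percentage is the MS Word/Writer difference divided by the paragraph width, and that the “spaces” percentage is the same difference divided by the total width of the spaces in the line; both formulas are inferred from the numbers, not stated in the measurement itself.

```python
PARAGRAPH_WIDTH = 347.53  # pt, measured paragraph width
SPACE_WIDTH = 3.34        # pt, measured width of one space

# (spaces, extra spacing allowed by MS Word, extra spacing allowed by Writer), pt
ROWS = [
    (6, 5.98, 1.95),
    (5, 8.08, 4.65),
    (4, 10.18, 7.45),
    (3, 12.23, 10.25),
    (2, 14.33, 13.05),
    (1, 16.43, 15.75),
]


def shrinking(spaces, word_pt, writer_pt):
    """Return (shrinking of the line, shrinking of the spaces) as fractions."""
    difference = word_pt - writer_pt  # extra room gained by shrinking
    return difference / PARAGRAPH_WIDTH, difference / (spaces * SPACE_WIDTH)


for spaces, word_pt, writer_pt in ROWS:
    line, space = shrinking(spaces, word_pt, writer_pt)
    print(f"{spaces} spaces: line {line:.1%}, spaces {space:.1%}")
```

The recomputed values agree with the measured percentages up to rounding of the inputs. The space-based ratio stays close to 20% in every row, while the line-based ratio changes with the number of spaces.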

The last commit changed the algorithm to this space-based shrinking, resulting in the same line breaks not only for the previous test file, but also for the originally reported test document of tdf#119908:

4 Exclude words from hyphenation

Hyphenation was only a paragraph-level feature in LibreOffice, while the OpenDocument standard also allows disabling hyphenation in character formatting. The following commits in the LibreOffice code base solved the problem by allowing words to be excluded from hyphenation.

Commit

Description

73bd04a71e741788a2f2f3b26cc46ddb6a361372

tdf#106733 xmloff: keep fo:hyphenate in character formatting

In the case of character formatting, map fo:hyphenate to the unused CharNoHyphenation character property to keep it during ODF import/export instead of losing it completely. This is the first step to disable hyphenation for single words or text spans in paragraphs with automatic hyphenation. Note: using fo:hyphenate as character property is part of the ODF standard.

Note: the old workaround to disable hyphenation, changing the language of the text to None had got some serious fallback: losing spell checking and losing language-dependent text layout (supported by both OpenType and Graphite font engines in LibreOffice).

b5e275f47a54bd7fee39dad516a433fde5be872d

tdf#106733 sw: implement CharNoHyphenation

Implement CharNoHyphenation character property to disable automatic hyphenation of words in paragraphs with enabled hyphenation.

Fix also fo:hyphenate mapping to CharNoHyphenation using automatic inversion of their boolean values defined by xmloff's XML_TYPE_NBOOL, as suggested by Michael Stahl.

9193e61d3e7b850b3715c848c09434e24855340b

tdf#106733 sw: fix bad downcast in SwTextNode::GetLang()

Fix bad cast of SvxNoHyphenItem to SvxLanguageItem reported by <https://ci.libreoffice.org/job/lo_ubsan/3049/>.

03c5a31a0f374a90fbc821718c14dc5f8a385adf

tdf#106733 sw cui: add CharNoHyphenation checkbox

On Position tab of Character formatting dialog window as a new checkbox "Exclude from hyphenation" (UX design by Heiko Tietze).

With this, it's possible to disable hyphenation with direct character formatting (e.g. combined with Find All), or using character styles, and setting Exclude from hyphenation in them. This feature is conformant to the OpenDocument standard, and unlike the previous locale=None workaround, it keeps spell checking and locale dependent text layout.

(Note: commit 1b83ebf42c535528b73baac2407b347f19070d07 disabled the unit test of tdf#159102 temporarily, because of lack of hyphenation on some test builds. Reported by Noel Grandin and Miklós Vajna.)
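The inverted mapping between fo:hyphenate and CharNoHyphenation described in the commits above can be sketched in Python. This is only an illustration of the boolean inversion, not the xmloff implementation: the helper name is hypothetical, and the sample element is a minimal hand-written style:text-properties fragment.

```python
import xml.etree.ElementTree as ET

FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def char_no_hyphenation(text_properties_xml):
    """Map fo:hyphenate of a style:text-properties element to CharNoHyphenation.

    Returns None if the attribute is absent, otherwise the inverted boolean.
    """
    element = ET.fromstring(text_properties_xml)
    value = element.get(f"{{{FO_NS}}}hyphenate")
    if value is None:
        return None  # no character-level hyphenation setting
    # Inversion: fo:hyphenate="false" means CharNoHyphenation is true.
    return value != "true"


SAMPLE = (
    f'<style:text-properties xmlns:style="{STYLE_NS}" '
    f'xmlns:fo="{FO_NS}" fo:hyphenate="false"/>'
)
print(char_no_hyphenation(SAMPLE))  # True
```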

The following screenshot shows the new option in the Character formatting dialog window:

5 Don’t hyphenate across a column, page or spread

This development adds new typographical features standardized by OpenDocument, XSL and CSS 4 to LibreOffice Writer, both to guarantee MSO interoperability and to follow typographical rules and web standards.

Since MSO 2013, not only justified text but also left-aligned and other text has lost its interoperability, depending on hyphenation. The default page layout algorithms of MSO truncate the last hyphenated word (before MSO 2013) or line (MSO 2013 and newer) at the end of the page, following English typographical traditions and rules.

The OpenDocument standard contains the same feature, fo:hyphenation-keep, but it was not implemented before, and LibreOffice did not even keep this setting during import/export of an ODF document.

For more control over hyphenation, and for better conformance with standard typographical rules, LibreOffice Writer implements the “hyphenation across” values of XSL and CSS 4 that are not covered by OpenDocument yet (loext:hyphenation-keep-type).

5.1 Adding hyphenation-keep

As a new development, LibreOffice not only keeps the value of fo:hyphenation-keep, but its page layout also moves the last hyphenated line of the page (or column) to the next page (or column), similar to MSO 2016 and newer.

Developments

Commit

Description

9574a62add8e4901405e12117e75c86c2d2c2f21

tdf#132599 cui offapi sw xmloff: implement hyphenate-keep

Both parts of a hyphenated word shall lie within a single page with ODF paragraph setting fo:hyphenation-keep="page". The implementation follows the default page layout of MSO 2016 and newer by shifting the bottom hyphenated line to the next page (and to the next column). Note: this is a MSO DOCX interoperability feature, used also in DTP software, XSL and CSS.

* Add checkbox/combobox to Text Flow in paragraph dialog
* Store property in paragraph model (com::sun::star::style::ParagraphProperties::ParaHyphenationKeep)
* Add ODF import/export
* Add ODF unit tests

New constants of com::sun::star::text::ParagraphHyphenationKeepType, containing ODF AUTO and PAGE (borrowed from XSL), and for the planned extension ParaHyphenationKeepType of ParagraphProperties:

– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).

Note: the implementation truncates only a single hyphenated line, like MSO does: the pages can end in hyphenated lines (i.e. in the case of consecutive hyphenated lines), but less often, than before.

Clean-up hyphenation dialog by collecting "Don't hyphenate" options at the end of the hyphenation settings, and negating them (similar to MSO and DTP), adding also the new option "Hyphenate across column and page":

[x] Hyphenate words in CAPS
[x] Hyphenate last word
[x] Hyphenate across column and page

Note: ODF fo:hyphenation-keep has got only "auto" and "page" attributes, while XSL defines also "column". Because of the interoperability with MSO and DTP, fo:hyphenation-keep="page" is interpreted as XSL "column", avoiding hyphenation at the end of column, not only at the end of page.

User interface

The following screenshot shows the new “Hyphenate across column and page” option on the “Text Flow” pane of the Paragraph formatting dialog window: the last hyphenated line “except that it has at-” was shifted to the next page. The clean-up of the “Hyphenate words in CAPS/last word” options is also visible.

5.2 Support column, page and spread types, DOCX import

The current OpenDocument standard has not fully adopted the values of the XSL attribute hyphenation-keep: the value “column” is missing (https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep). Also, according to e.g. New Hart’s Rules for English (OUP, 2005, “Line Endings”, p. 40, cited by R. Green in tdf#132599), it is a typographical requirement to avoid hyphenation in the last line of a spread, i.e. a visible page pair. That is why CSS 4 defines “spread”, not only “page” and “column”, to control hyphenation.

User interface

The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page” and “Spread” on the Text Flow pane of the Paragraph settings:

 

The following screenshots show the layout of the test documents in LibreOffice Writer 24.8.

Don’t hyphenate across a page

 

LEFT: Hyphenation across everywhere (hyphenation-keep="auto"). RIGHT: No hyphenation across, resulting in a shifted hyphenated line (hyphenation-keep="page", which implies the default loext:hyphenation-keep-type="column" for interoperability reasons).

Don’t hyphenate across a spread

 

LEFT: Hyphenation across column and page, but not spread (hyphenation-keep="page", loext:hyphenation-keep-type="spread"). Shifted hyphenated line on the first (right-hand) page. RIGHT: same settings, but a page break inserted at the start of the document prevents the shift, because the bottom hyphenated line is now on the second (left-hand) page.

Don’t hyphenate across a column

 

LEFT: No hyphenation across (hyphenation-keep="page", loext:hyphenation-keep-type="column"). Shifted hyphenated line in the first column of the multi-column page. RIGHT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page").

Hyphenate across a column, except in the last one

 

LEFT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page"). RIGHT: same settings, but the last hyphenated line is shifted in the last column, because that line is the last line of the page, too.
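The behaviour shown in the screenshots above can be summarized in a short Python sketch. It is a hypothetical helper, not the Writer implementation: it assumes that the boundaries are ordered (the end of a spread is also the end of a page, and the end of a page is also the end of a column), and it does not model the “always” type.

```python
# Boundaries ordered from weakest to strongest.
RANK = {"column": 1, "page": 2, "spread": 3}


def shifts_hyphenated_line(keep, keep_type, boundary):
    """Return True if the last hyphenated line before `boundary` is shifted.

    keep      -- fo:hyphenation-keep: "auto" or "page"
    keep_type -- loext:hyphenation-keep-type: "column", "page" or "spread"
    boundary  -- strongest boundary that follows the line: "column" (next
                 column on the same page), "page" (next page on the same
                 spread) or "spread" (next page on a different spread)
    """
    if keep == "auto":
        return False  # hyphenation is allowed across everything
    # The line is shifted if the boundary is at least as strong as the type.
    return RANK[boundary] >= RANK[keep_type]


# Second column example above: type "page" keeps the hyphenated line in the
# first column, but shifts it at the end of the page.
print(shifts_hyphenated_line("page", "page", "column"))  # False
print(shifts_hyphenated_line("page", "page", "page"))    # True
```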

Developments (Writer core, DOCX filter and help content)

Details of the core and DOCX filter developments, also extending LibreOffice help with the new paragraph settings:

Commit

Description

6e8819f29b6051a0e551d77512830539913ec277

tdf#132599 cui offapi sw xmloff: add hyphenation-keep-type

 

Support XSL attribute "column" and CSS 4 attribute "spread", stored in loext:hyphenation-keep-type, to give better control over hyphenation-keep. E.g. spread: both parts of a hyphenated word shall lie within a single spread, i.e. when the next page is not visible at the same time (e.g. the next page is not a right page of a book).

– css::style::ParaHyphenationKeep is a boolean property now, importing hyphenation-keep = "page" as true.

– type of ParaHyphenationKeep, including the new non-ODF types is stored in the new ParagraphProperties::ParaHyphenationKeepType.

– default value of ParaHyphenationKeepType is COLUMN for interoperability.

– Add checkboxes to Text Flow -> Hyphenation Across in paragraph dialog:

  * Column (previously: Hyphenate across column and page)
  * Page
  * Spread

  – enabling/disabling them follows XSL/CSS 4/loext, i.e. possible combinations:

  * No Hyphenation across (hyphenation-keep = "page" and loext:hyphenation-keep-type = "column")
  * Hyphenation across [x] Column (hyphenation-keep = "page" and loext:hyphenation-keep-type = "page")
  * Hyphenation across [x] Column [x] Page (hyphenation-keep = "page" and loext:hyphenation-keep-type = "spread")
  * Hyphenation across [x] Column [x] Page [x] Spread (hyphenation-keep = "auto")

– Add ODF import/export

– Update DOCX import

– Add ODF unit tests

Note: recent implementation depends on widow settings: disabling widow handling allows hyphenation across columns and pages not only in table cells.

Note: RTF import-only, but not used bPageEnd has been renamed to bKeep. Depending on the RTF test results, likely it will need to disable the layout change, e.g. GetKeepType()=ParagraphHyphenationKeepType::AUTO, if PageEnd uses obsolete hyphenation rule, i.e. shifting only the hyphenated word to the next page, not the full line.

More information:

– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)

– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).

 

c8ee0e8f581b8a6e41b1a6b8aa4d40b442c1d463

tdf160518 DOCX: import hyphenation-keep to fix layout

 

To fix layout interoperability, import DOCX compatSettings allowHyphenationAtTrackBottom and useWord2013TrackBottomHyphenation as hyphenation-keep setting "COLUMN", shifting last hyphenated lines of pages and columns, like MSO does.

58350a811a8001f72b13f6ca3def5f32ea904e72

tdf#132599 add "Hyphenation across" options

Document new options of LO 24.8 to control hyphenation in last line of a column, page or spread.
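The checkbox combinations listed in the first commit of this table map to the ODF attributes as sketched below in Python. The function is a hypothetical illustration, not dialog code: it assumes that the first disabled checkbox decides the result, and returning None for the type in the "auto" case is an assumption, since the commit message lists no keep type for that combination.

```python
def keep_attributes(column, page, spread):
    """Map the "Hyphenation across" checkboxes to ODF attribute values.

    Returns a tuple (fo:hyphenation-keep, loext:hyphenation-keep-type);
    the second item is None when no keep type applies (assumption).
    """
    if not column:
        return ("page", "column")  # no hyphenation across
    if not page:
        return ("page", "page")    # across columns only
    if not spread:
        return ("page", "spread")  # across columns and pages
    return ("auto", None)          # across everything


print(keep_attributes(True, True, False))  # ('page', 'spread')
```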

5.3 Support hyphenation-keep in linked frames, in tables and last full line of paragraphs, DOCX export

The code behind “Hyphenation across” has been generalized for all possible page changes, including linked frames, as well as columns in tables and linked frames.

Improved interoperability in Writer’s DOCX export

The DOCX export broke the layout of documents created in Writer; for example, hyphenated paragraphs resulted in more pages in MS Word. This problem was fixed by adding the missing allowHyphenationAtTrackBottom DOCX compatibility setting.

The following composite screenshots show the DOCX export in MSO (red text), which had 3 pages before the fix (top row). After the fix, the result has 2 pages in MSO (bottom row), as in Writer (black text). Test document:

 

DOCX export of hyphenation enabled in “Text body” style

Hyphenation was lost if it was enabled only in the “Text body” style instead of the default paragraph style. Now Writer exports hyphenation in this case, too, which is the more common setup for documents created in Writer.

Last full line of paragraphs

CSS 4 “always” was implemented as Hyphenate across → Last full line of paragraph. The hyphenated word of the last full line of the paragraph moves to the last line (if there is enough room for it). This results in longer last lines, and removes hyphenation from the bottom right-hand corner of the paragraph.

 

LEFT: missing recognition of hyphenation-keep-type="always" in the last paragraph. RIGHT: correct layout: the hyphenated word of the last full line was shifted to the last line of the paragraph.
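The effect of “Last full line of paragraph” can be demonstrated with a toy Python model. It is only a sketch under strong assumptions: lines are plain strings, every character is one unit wide, a hyphenated line ends with "-", and the function name is hypothetical. The real layout works with measured text portions.

```python
def keep_last_full_line(lines, width):
    """Move the hyphenated word of the last full line to the last line.

    lines -- paragraph lines as strings; a hyphenated line ends with "-"
    width -- line width in characters (toy model: each character is 1 wide)
    """
    if len(lines) < 2 or not lines[-2].endswith("-"):
        return lines  # the last full line is not hyphenated
    head, _, fragment = lines[-2].rpartition(" ")
    word_start = fragment[:-1]  # first part of the word, without the hyphen
    new_last = word_start + lines[-1]
    if not head or len(new_last) > width:
        return lines  # not enough room: keep the hyphenation
    return lines[:-2] + [head, new_last]


print(keep_last_full_line(["the motion of ce-", "lestial bodies."], 20))
# ['the motion of', 'celestial bodies.']
```

If the last line is too short to take the whole word, the paragraph is returned unchanged, matching the “if there is enough room” condition.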

User interface

The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page”, “Spread” and the newest “Last full line of paragraph” on the Text Flow pane of the Paragraph settings. The test document and its screenshot show that the hyphenated line was shifted to page 2, according to the Hyphenation across → Page setting:

 

Tables

Now “Hyphenation across” works in tables, too, removing the widow setting dependency of the previous implementation:

 

 

LEFT: missing shift in the split table cell, with hyphenation-keep-type="spread". RIGHT: correct layout.

Linked frames on the same pages

Linked frames are like columns on the same pages, now with correct layout:

 

LEFT: bad shifting between the linked frames on a single page, with hyphenation-keep-type="page". RIGHT: correct layout.

Linked frames not on the same spread

With hyphenation-keep-type="spread", blank left pages were not handled correctly, nor were linked frames anchored only on different right pages.

 

LEFT: missing shifting between linked frames on right pages, with hyphenation-keep-type="spread". RIGHT: correct layout.

Linked frames on the same spread

Spread is still recognized with linked frames on left and right pages:

The middle frame on the second (left) page ends in a hyphenated line, according to hyphenation-keep-type="spread".
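The spread check for linked frames can be sketched in Python with page numbers. The helper names are hypothetical, and the sketch assumes the usual book layout, where odd pages are right-hand pages and a spread consists of an even (left) page followed by the next odd (right) page.

```python
def on_same_spread(page, next_page):
    """Return True if the text continues on the same page or spread.

    A spread is an even (left) page plus the following odd (right) page.
    Frames on the same page behave like columns.
    """
    if next_page == page:
        return True
    return page % 2 == 0 and next_page == page + 1


def shifts_for_spread(line_page, continuation_page):
    """With keep type "spread", the hyphenated line is shifted unless the
    text continues on the same page or spread."""
    return not on_same_spread(line_page, continuation_page)


print(shifts_for_spread(1, 3))  # True: frames on different right pages
print(shifts_for_spread(2, 3))  # False: left and right page of one spread
```

Checking only whether the continuation page is a right page would not be enough here: pages 1 and 3 are both right pages, but they belong to different spreads.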
 
 

Developments (Writer core, DOCX filter and help content)

Details of the core and DOCX filter developments, also extending LibreOffice help with the new paragraph settings:

Commit

Description

c8a99cb8dce54de506ba66d1cc0818b9b5f7858b

tdf#132599 sw schema xmloff: add hyphenation-keep-type='always'

Add new hyphenation option to limit hyphenation of the last full line of the hyphenated paragraph. Move also loext:hyphenation-keep-type to paragraph-properties, following the associated hyphenation-keep. Note: value "always" is defined by CSS 4 hyphenate-limit-last,

see https://www.w3.org/TR/css-text-4/#hyphenate-line-limits.

d4304cd0a4fedd0117fea3625dff1fca2945a0e6

tdf132599 sw: fix hyphenation-keep for linked frames, also for spreads

Linked text frames are hyphenated as columns on the same page,

i.e. do not shift the hyphenated line, if hyphenation-keep-type="page" or "spread". For "spread", check also that the hyphenated line is on the previous left page, because checking only right page wasn't enough for linked text frames and blank left pages.

a4970f4eeb94b8c405c5e3ec094d47061253efac

tdf#132599 sw: fix hyphenation-keep for tables and no widow

Now hyphenation-keep works without widow settings, too, e.g. in tables (where despite the existing widow settings, widow handling is always disabled).

9668c9b8fe1d4afba335ab1f9d3309ad91bd56da

tdf#132599 sw: fix test of "fix hyphenation-keep for tables and no widow"

The problem was reported by Miklós Vajna.

016d61f529f9d9ec2520fb7a808da41cf17d7295

tdf#132599 sw: fix unit tests for hyphenation-keep with frames

Fix en_US language of the test documents to be consistent with the hyphenator condition in the related unit tests of commit d4304cd0a4fedd0117fea3625dff1fca2945a0e6 "tdf132599 sw: fix hyphenation-keep for linked frames, also for spreads".

The problem was reported by René Engelhard.

b538729c90af470c33aeb3002750321ac8ac88be

tdf#160518 sw: fix DOCX import/export of hyphenation-keep

– export hyphenation-page="page" setting of native ODF documents, if hyphenation is enabled in the default paragraph or in the text body style with this setting. It's lossless for hyphenation-keep-type="column", while the other values are converted to hyphenation-keep-type="column", which is the default layout of MSO 2013 and later.

– fix LO roundtrip of DOCX documents which were created in MSO originally: while the roundtrip kept useWord2013TrackBottomHyphenation and allowHyphenationAtTrackBottom, the exported redundant suppressAutoHyphen = "false" settings of the paragraph resulted broken layout in Writer, because the repeated import overwrote every paragraphs with bad hyphenation setting (hyphenation-keep = "auto" instead of hyphenation-keep = "page").

– export also "Hyphenate CAPS" and "Hyphenation zone" settings,  if hyphenation is enabled in text body style with these settings, and not in the default paragraph style. Setting hyphenation only in "Text Body" is more common in documents created in LibreOffice.

0d5b1a072e025a692cee803310d2ceff0296b083

help: tdf#132599 add "Hyphenation across" -> Last full line of paragraph

Document new option of LO 24.8 to control hyphenation in last full line of a paragraph. Fix also the changed IDs of the other "Hyphenation across" options.

Manual tests

The patches contain several unit tests. The following manual tests cover only the most important bug fixes:

Fixed DOCX export

1. Open tdf160518_auto_in_default_paragraph_style.fodt (attached to Bug 160518, as “2-page flat ODF” document). The document is 2 pages.

2. Save it in the format “Word 2010–365 Document (.docx)”.

3. Open the result in MS Word: the document is 2 pages long, as in Writer. (The old export was 3 pages long.)

Disable hyphenation of last full line of paragraphs

1. Open tdf132599_always.fodt (attached to Bug 132599, as “flat ODF test document for "Last full line of paragraph"”). The last full line of the last paragraph is not hyphenated. The previously hyphenated word (“celestial”) is shifted to the last line. (This feature wasn’t supported before.)

2. Click on the paragraph settings of the last paragraph, and enable “Last full line of paragraph” in Text Flow → Hyphenate Across. The word “celestial” is hyphenated.

Tables

1. Open tdf132599_page_in_table.fodt (attached to Bug 132599 as “test document: In tables, do not hyphenate across spread (3-page document)”). The document is 3 pages long. (It was 2 pages long before, because Hyphenation across was not supported in tables.)

2. Click on the paragraph settings of the paragraph, and enable “Spread” in Text Flow → Hyphenate Across. The document is 2 pages long, because the hyphenated line “except that it has an at-” is now allowed at the bottom of page 1, which is on the right in its spread.

Linked frames

1. Open tdf132599_frames_on_same_page_hyphenation.fodt (attached to Bug 132599 as “test document: Hyphenation across column in linked frames”). Bottom of the left frame is the hyphenated line “space, ex-”. (Previously this line was shifted to the next frame.)

2. Click on the paragraph settings of the paragraph, and disable “Column” in Text Flow →  Hyphenate Across. The hyphenated line “space, ex-” is shifted to the next frame.

Linked frames only on following right pages

1. Open tdf132599_frames_on_right_pages_no_hyphenation.fodt (attached to Bug 132599 as “test document: In linked frames on right pages, do not hyphenate across spread”). The second frame on page 3 starts with the shifted hyphenated line “space, ex-”, according to the disabled Hyphenation across → Spread setting, because the first frame is on page 1 (a different spread). (This was broken before, because Writer didn’t check whether the page before the right page with the text content is a left page, i.e. on the same spread.)

2. Click on the paragraph settings of the paragraph, and enable “Spread” in Text Flow →  Hyphenate Across. The hyphenated line “space, ex-” is shifted to the bottom of the first frame.

6 No Break context menu and visualization

Hyphenated words got a new context menu item “No Break” to disable their hyphenation using the new “Exclude from hyphenation” character formatting. The context menu item remains available for words with disabled hyphenation, to enable their hyphenation again.

The other usability problem was the incomplete user interface of the new character formatting “Exclude from hyphenation”: it was impossible or very hard to notice the words that were excluded from hyphenation. Now these words get a light gray dotted underline when Show Formatting Marks mode is enabled.

 

New “No Break” context menu of hyphenated words, and light gray dotted underline visualization of words with disabled hyphenation. (Note: no visualization for the previous workaround, the word with language setting “None” in the second paragraph.)

6.1 Developments

Added a new dispatcher call .uno:NoBreak for the context menus. The menu item “No Break” is visible only if there is a hyphenated word or a word with No Break formatting under the text cursor (with or without selecting the word). The light gray underline is conditional: it is not visible in the PDF export, nor when Show Formatting Marks is disabled.

Commit

Description

2f0c7d5691acd4010443856788a54b0abc03098b

tdf#161563 tdf#161565 sw: add No Break to word context menu & visualize

Add No Break option to context menu of words hyphenated automatically, giving as easy access to fix paragraph layout, as context menu of misspelled words – like DTP software do. Also add this option to context menu of words with enabled "No Break" to disable it.

To avoid unwanted paragraph layout during further text editing or formatting, visualize words excluded from hyphenation with a light gray dotted underline, when Formatting Marks is enabled.

Follow-up to commit b5e275f47a54bd7fee39dad516a433fde5be872d "tdf#106733 sw: implement CharNoHyphenation" and commit 73bd04a71e741788a2f2f3b26cc46ddb6a361372 "tdf#106733 xmloff: keep fo:hyphenate in character formatting".

41916d9fb045654fa19b4eac90a3099550a890f7

tdf#161563 sw: show "No Break" context menu only on a whole word

It's possible to set CharNoHyphenation on shorter character sequences, than a word, but the result is not correct (use soft hyphens for alternative hyphenation within words), so limit "No Break" menu item only for selected words. (Not completely, because only Point() is checked for word boundary yet, not also Mark().) If no selection, cursor position must be within the hyphenated word (where "No Break" applied for the whole word automatically).

This fixes also the assert in SwTextFrame::IsInHyphenatedWord(), when multiple nodes were selected.

b0b691aa32719aa0d41bc0f72480cc455bc414ec

tdf#161563 sw: fix invisible light gray underline for No Break

Light gray underline visualization depended on IsShowHiddenChar() instead of the correct IsViewMetaChar() (Show Formatting Marks).

6.2 Manual testing

  1. Open the test documents tdf106733.fodt or tdf106733_LinuxLibertineDisplayG.fodt (attached to Bug 106733). The words with “Exclude from hyphenation” enabled have a light gray underline.

  2. Click on the Show Formatting Marks (paragraph mark) icon to disable and enable the light gray underline.

  3. Open the context menu of the hyphenated word in the first paragraph. Choose the first item “No Break” to disable its hyphenation. The word is no longer hyphenated, and it gets a light gray underline.

  4. Open the context menu of the word with the light gray underline, and choose No Break again. The word is hyphenated again, and the light gray underline disappears.

7 DOCX interoperability fixes

7.1 Support of maximum consecutive hyphenated lines

Value “Maximum consecutive hyphenated lines” wasn’t imported from DOCX files, and the associated ParaHyphenationMaxHyphens wasn’t exported to the OOXML document setting consecutiveHyphenLimit, losing layout interoperability.

Note: OpenDocument interoperability is possible here, contrary to the false information on page 61 of Eckert et al.: Document Interoperability – Open Document Format and Office Open XML, Fraunhofer Verlag, 2009.
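To check whether a DOCX file carries these document settings, word/settings.xml can be inspected directly. The following is a minimal sketch, not part of the LibreOffice filters: read_hyphenation_settings and make_sample_docx are hypothetical helper names, and the namespace assumes a transitional (not strict) OOXML file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: transitional WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_hyphenation_settings(docx):
    """Return the w:val of selected hyphenation settings in word/settings.xml.

    `docx` is a path or a binary file object. A missing element, or an
    element without w:val, maps to None.
    """
    names = ("autoHyphenation", "consecutiveHyphenLimit", "hyphenationZone")
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/settings.xml"))
    result = {}
    for name in names:
        element = root.find(f"{{{W_NS}}}{name}")
        result[name] = None if element is None else element.get(f"{{{W_NS}}}val")
    return result


def make_sample_docx(settings_body):
    """Build an in-memory ZIP with only word/settings.xml (not a complete DOCX)."""
    buffer = io.BytesIO()
    xml = f'<w:settings xmlns:w="{W_NS}">{settings_body}</w:settings>'
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/settings.xml", xml)
    buffer.seek(0)
    return buffer


sample = make_sample_docx('<w:consecutiveHyphenLimit w:val="1"/>')
print(read_hyphenation_settings(sample))
```

Running read_hyphenation_settings on a file before and after a Writer round-trip shows whether w:consecutiveHyphenLimit survived the export.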

7.2 Fix overshrank lines in smart justify

As a regression, smart justify (i.e. space shrinking) resulted in overshrunk lines, i.e. lines with removed spaces and overlapping words, when the line was hyphenated only in the first call of SwTextGuess::Guess(). (The first call calculates the available spaces, the second call makes the final line break.):

 

After the fix, skipping hyphenation completely:

 

The problem was solved by re-instantiating the SwTextGuess object for the optional second call.

Note: Caolán McNamara (Collabora Productivity) also made a SwTextGuess fix, related to a compiler-based code analysis, which was an alternative solution for the reported test document.

7.3 Default hyphenation zone

The default hyphenation zone is not zero in OOXML, but ¼ inch, according to the standard (see w:hyphenationZone, ECMA-376 – Office Open XML 1st Edition). Because the DOCX export of Writer didn’t contain its default zero hyphenation zone, MSO imported the document with a non-zero hyphenation zone, potentially losing the text layout. For example, with a normal 11 pt font size, the hyphenations “al-legory” or “fi-nal” are disabled with a ¼ inch hyphenation zone.

As a continuation of the implementation of the hyphenation zone in Bug 149421, the default hyphenation zone of ¼ inch was added to the DOCX import. The zero hyphenation zone is now always exported, too, e.g. for documents created in Writer, solving the text layout difference.

Note: it seems that MSO doesn’t follow its own standard, because it uses unknown default values in some languages, e.g. larger than ¼ inch, depending on the language of the operating system. For example, LibreOffice_tracked-changes_bug.docx of Bug 161628 got 425 twips (~0.75 cm) from Office 365 instead of the standardized 360 twips (¼ inch) on a Hungarian operating system. The Microsoft “Open Specification” mentions the difference, but without the details: https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oe376/660d0f16-dffb-48ea-a25d-7210fb2f2a7a.
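The twip values above are easy to verify by hand: 1 twip is 1/1440 inch. The following small sketch does the conversion and models the import rule described here (a missing w:hyphenationZone means 360 twips). The function names are hypothetical and only serve this example.

```python
TWIPS_PER_INCH = 1440  # 1 twip = 1/20 point, 72 points per inch
CM_PER_INCH = 2.54
OOXML_DEFAULT_ZONE = 360  # default of w:hyphenationZone, i.e. 1/4 inch


def twips_to_inch(twips):
    return twips / TWIPS_PER_INCH


def twips_to_cm(twips):
    return twips_to_inch(twips) * CM_PER_INCH


def effective_hyphenation_zone(raw_value):
    """Model of the import rule: an undefined zone means the OOXML default.

    `raw_value` is the w:val string of w:hyphenationZone, or None if the
    element is missing from the file.
    """
    return OOXML_DEFAULT_ZONE if raw_value is None else int(raw_value)


for twips in (0, 360, 425):
    print(f"{twips} twips = {twips_to_inch(twips):.3f} in = {twips_to_cm(twips):.3f} cm")
```

This also shows why the explicit zero must be written at export: without the element, a reader following the standard falls back to 360 twips.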

7.4 Analysis of the test document of Bug 149421 (hyphenation zone)

The test document hyphenation_zone.docx (Bug 149421) shows a different line break (közvet=lenül/közvetle=nül), although the hyphenated word “közvetlenül” has the same possible hyphenation points in Writer and MSO (köz=vet=le=nül):

 

Composite image: red – MSO, black – Writer

Disabling the hyphenation zone (in MSO, by setting 0.01 cm) didn’t modify the hyphenation, so the difference is not related to the hyphenation zone. There is no justification or smart justify here (the test document was created in MSO 2010, so it doesn’t use smart justify for justified lines). Choosing the less optimal hyphenation point seems to be a bug in MSO, and it needs more investigation.

7.5 Developments

Commit

Description

64365dfa67d5a1d8fbc710238a4ea9c492645de4

tdf#161643 sw DOCX import/export of maximum consecutive hyphenated lines

Fix line break interoperability by importing w:consecutiveHyphenLimit to ParaHyphenationMaxHyphens, and exporting ParaHyphenationMaxHyphens to w:consecutiveHyphenLimit in OOXML import/export filters.

ca540209a8c20a2734f180d4706d5153bdf64523

tdf#160170 sw: fix overshrunk justified lines at hyphenation

Smart justify uses 2 SwTextGuess::Guess() calls to break a line, but using the same SwTextGuess object resulted overshrunk lines, if the first call resulted hyphenation, because of the bad state of the object for the second call. If we need a second call, now instantiate a new object for it. Regression from commit 36bfc86e27fa03ee16f87819549ab126c5a68cac "tdf#119908 tdf#158776 sw smart justify: shrink only spaces".

Note: the reported test document was already fixed by commit f050103c3324d878b310f37429ea3580a8230905 "stale hyphenation data after skipping blanks".

83733601124f611938c365426485d0001e1fe454

tdf#160170 sw: test for fix overshrank lines with hyphenation

Follow-up to commit ca540209a8c20a2734f180d4706d5153bdf64523 "tdf#160170 sw: fix overshrunk justified lines at hyphenation".

8d8bc48b5efacde6f99d78a557cd052ce9e0ed07

tdf#161628 DOCX import: set default hyphenation zone (1/4 inch)

Default value of hyphenationZone is 360 twips (0.25"). Apply this value, if hyphenationZone is not defined, according to the OOXML standard.

Follow-up to commit 5a079652c1b1f968a851f47995b0a65b84d2d192 "tdf#149421 DOCX: import/export hyphenation zone".

89a80d637e2831d49cdf48921f961b04fd03cffc

tdf#161628 sw DOCX: export zero hyphenation zone, if it's not defined

To keep the layout of the document, export zero hyphenation zone instead of nothing, otherwise it would be 360 twips after importing the document with the default hyphenation zone.

7.6 Manual testing

Maximum consecutive hyphenated lines

  1. Open 2007351228.docx (test document of Bug 76163). Check its paragraph setting Maximum consecutive hyphenated lines on the Text Flow page of the Format → Paragraph → Paragraph… dialog window. The value of the setting is 1 (not zero).

  2. Save the document in a different place in DOCX format, and reload it. The value is still 1.

Fix overshrank lines in smart justify

  1. Open the test document tdf160170.fodt (Bug 160170), and check the first line: it contains spaces.

Export zero hyphenation zone of new documents

Create a new document with hyphenation, and with zero hyphenation zone (default in Writer):

  1. Put the cursor in a paragraph in Default Paragraph Style.

  2. Choose Format→Paragraph… Text Flow, and enable Hyphenation. The default hyphenation zone is zero.

  3. Export the document in DOCX 2010–365 format.

  4. Reload it, and check the hyphenation zone: it is still zero.

  5. Load the exported document in MSO: search for the hyphenation settings in the search bar, and check the hyphenation zone: it is zero (it was 360, 425 etc. before fixing the export).

Import default OOXML hyphenation zone

  1. Open the test document LibreOffice_tracked-changes_bug.docx (Bug 161628), which doesn’t contain a w:hyphenationZone definition.

  2. Check the value of the hyphenation zone in Format→Paragraph… Text Flow: it’s ¼ inch (0.63 cm), not zero, as in MSO.

8 Sidebar hyphenation controls

LibreOffice’s sidebar was originally developed by IBM for Lotus Symphony, as a fast graphical user interface with rich controls. This development added hyphenation controls to Writer’s character and paragraph sidebar panels. Setting multiple hyphenation options with immediate preview allows finding the best paragraph layout much faster than before, as DTP software does.
 

On the attached screenshot, the bottom of the sidebar Paragraph panel shows the disabled paragraph-level hyphenation controls, with the single visible Hyphenation toggle button to enable automatic paragraph hyphenation.

In the middle of the sidebar, the Character panel shows two new icons. The first is the No Break icon, which is enabled on words hyphenated by the automatic paragraph-level hyphenation. The second new icon is the Insert Soft Hyphen icon, which opens the soft hyphen based Hyphenation dialog window, allowing manual insertion and adjustment of the hyphenation break points, replacing or overwriting the result of the automatic paragraph-level hyphenation. (This can be useful for languages without good hyphenation patterns, e.g. previously for German, when its hyphenation dictionary didn’t yet use libhyphen’s compound word based functionality.)

8.1 Paragraph sidebar panel

With hyphenation enabled in the paragraph, all paragraph-level hyphenation options are visible. The toggle buttons are Hyphenate CAPS, Hyphenate last word of paragraph, and the four Hyphenation Across functions (Hyphenate Across Last Full Paragraph Line, Column, Page and Spread). The other six controls are spinboxes: Minimum characters at line end/at line begin, max. consecutive hyphenated lines, min. compound constituent characters at line end, min. word length and hyphenation zone (the area at line end where hyphenation is skipped, if possible).
 

8.2 Developments

Extending the user interface of LibreOffice is not easy, because of the complexity of the code base and the lack of documentation. For example, the commit description mentions that, to create a visible icon for the previously added .uno:NoBreak dispatcher call, value 8 must also be changed to value 9 (a bitset in the XML) in one of the resource files.

Also, the first approach (attached to the issue in the bug tracker), i.e. adding new UNO calls for all controls, didn’t work correctly because of a similar undocumented setting. The committed patch is more compact, but it’s possible that it uses only resized versions of the 16×16 pixel icons instead of the 24×24 and 32×32 icons on high resolution displays.

Note: it’s possible to add new icons for arbitrary functions without UNO calls, using the links.txt files of the icon themes. However, instead of designing new icons based on the SVG source files of the existing icons, existing icons were reused, to avoid designing three icon sizes for each of the half dozen icon themes.

Commit

Description

98f7f540463c533da17d4e8595c091d9e98a6c83

tdf#162491 tdf#125032 add hyphenation settings to sidebar

Add .uno:NoBreak to the Character sidebar panel to disable automatic hyphenation for selected words. The icon is enabled only on hyphenated words and words with disabled hyphenation.

Add .uno:Hyphenate icon to the Character sidebar panel, with tooltip “Insert Soft Hyphen...”, which opens the dialog for (semi-)automatic insertion of soft hyphens.

Add paragraph-level hyphenation settings to the Paragraph sidebar panel. Only the toggle button icon "Hyphenation" is visible to enable hyphenation, if the paragraph is not hyphenated. If it's enabled, show all paragraph-level settings.

These new sidebar controls allow adjusting paragraph layout and hyphenation quickly, like DTP software do.

Note: to add icon to .uno:NoBreak, modify "Properties" of officecfg/registry/data/org/openoffice/Office/UI/WriterCommands.xcu.

Note: it's possible, that high resolution icon sizes will need extra dispatcher calls (the draft is attached to the issue in the bug tracker).

8.3 Manual testing

Daily builds of LibreOffice allow checking the new sidebar functions.

  1. In Writer, enable the sidebar, if needed, with Ctrl+F5 or View→Sidebar.

  2. Put the text cursor in a paragraph with working hyphenation (it depends on the language of the document and the installed hyphenation patterns).

  3. Click on the Hyphenation toggle button at the bottom right corner to enable and disable the hyphenation of the paragraph.

  4. Disable the toggle button Hyphenate Last Full Paragraph Line to remove hyphenation in the last full line of the paragraph (if it’s possible without increasing the number of lines of the paragraph).

  5. Increase the value of the spinbox At line end to remove some of the remaining hyphenations in the paragraph.

  6. To set or update the hyphenation settings of the paragraph style, click on the Update Selected Style icon on the Styles sidebar panel or press Ctrl+Shift+F11.

8.4 Possible future developments

To avoid crowding the sidebar toolbar with several Hyphenation Across icons, the four icons could be compressed into a popup icon menu.

Also, hiding the No Break icon may be better than graying it out.

It’s possible to extend Hyphenate CAPS with Hyphenate Caps, i.e. to avoid hyphenation of all words starting with a capital letter (as DTP software does).

Also, No Break has more functionality in DTP software than disabling hyphenation: applied to multiple words, it also disables the line break at spaces, i.e. between words.

It’s recommended to enhance the soft hyphen based hyphenation dialog window to allow removing soft hyphens (DTP and XSL-FO software offer similar functionality).

9 Inline heading

Inline heading is a DOCX feature, using so-called style separators to hide the paragraph mark of the heading, resulting in a paragraph with multiple parts formatted with different paragraph styles. According to Bruce Hatfield in Bug 131728, “Word style separators are perhaps the key feature lacking in Libreoffice that prevents widespread Law office/legislator usage of LO Writer.” He translates “patents and other documents for US Federal court (+ other jurisdictions) and first came aware of the use of style separators to minimize navigation pane content to essential content (instructions from judge’s clerk). Basically it means only the essential part of a heading can now be indexed in the navigation pane/pdf bookmarks (converted from word).”

The following screenshot shows the first result of the interoperability development, lost (left black text) and fixed (right black text) import of DOCX inline headings.

Composite screenshots showing the lost (left black text) and fixed (right black text) import of DOCX documents with inline heading. (Red color: MSO, black: LibreOffice Writer.) The remaining difference is not related to inline heading, only to the different paragraph spacing.

9.1 Developments

The import uses inline text frames to 1) keep the original paragraph with its heading style, also keeping the Table of Contents and PDF bookmark support, while 2) allowing it to be placed in the same line as a normal paragraph.

Commit

Description

5a3476284187

tdf#131728 sw inline heading: fix DOCX paragraph layout interoperability

Fix layout of paragraph – which contains two paragraph styles – by importing OOXML style separator using a text frame.

Inline headings – where there is no paragraph break after the heading, i.e. it's followed by the normal paragraph content – specified by w:specVanish in OOXML, i.e. a special paragraph with hidden paragraph mark. These headings were loaded as normal, separated paragraphs, breaking the paragraph layout with their paragraph breaks.

Map inline headings to inline ODF text frames to keep paragraph layout.

The frame contains the original inline paragraph, still keeping ODF ToC and PDF bookmark support.

9.2 Manual testing

Daily builds of LibreOffice allow checking the fixed DOCX import.

  1. In Writer, open the test document attached to Bug 131728. The inline headings are not in separate paragraphs, but in the same line as their continuations (see the right side of the previous screenshot).

10 New DOCX interoperability fixes for space shrinking

Releasing LibreOffice with the new space shrinking interoperability algorithm resulted in new bug reports for specific circumstances where the implementation did not work or worked poorly. All reported bugs were fixed.

The special circumstances were:

The list of the developments:

Commit

Description

6b857398a59d16308d6185d01e003e401439f060

tdf#162109 sw smart justify: fix overhanging last line

Last line of justified paragraphs is excluded from justification normally, but not in the case, where it fits only with shrinking spaces. This line was overhanging because of the missing justification and space shrinking.

22eac3145ca62d15b47d95f4df60ce38d4f5aa46

tdf#162220 sw smart justify: fix shrinking for single portion lines

Follow-up to commit 6b857398a59d16308d6185d01e003e401439f060 "tdf#162109 sw smart justify: fix overhanging last line".

1a87cd290282ea723c5d1a0d80c958b705b9d7ec

tdf#162109 tdf#162220 sw smart justify: add unit tests

Follow-up to commit 6b857398a59d16308d6185d01e003e401439f060 "tdf#162109 sw smart justify: fix overhanging last line" and commit 22eac3145ca62d15b47d95f4df60ce38d4f5aa46 "tdf#162220 sw smart justify: fix shrinking for single portion lines".

857dd6000c877f2c6d8bb73806a8557fa0baea73

tdf#161810 sw smart justify: fix overhanging lines containing tabs

Length of tabulator portions wasn't taken into account during calculating overhanging lines, resulting missing space shrinking.

2baaf66b71fd429479dddb41f6b06aa7bba61039

tdf#163042 sw smart justify: fix cursor of single portion lines

Fixed problems with cursor/pilcrow positions in a shrunk last line of a paragraph with a single portion:

  • clicking before the last or the last but one characters of the line, the cursor could be positioned at the end of the line, not before the last or the last but one characters (especially a line with more spaces and bigger space shrinking). 

  • when the text cursor was there at the end, the visible cursor position was inside the line instead of the end of the line; 

  • pilcrow symbol was inside the line instead of the end of the line. 

7a78be5090d2dabf03e18471ba718f2c4f25f740

tdf#163060 sw smart justify: fix unwanted line break inside words

End-of-line narrow line portion was broken into the following line, despite that it was inside a word, if the remaining free space for the line portion was negative in the line (GetLineWidth()), because of missing calculation with the extra available line width resulted by space shrinking.

10.1 Manual testing

Daily builds of LibreOffice allow checking the improvements:

  1. In Writer, open the test document hanging_punctuationx.fodt of Bug 162109. The last (second) line of the paragraph contains shrunk spaces, i.e. there is no overhanging line.

  2. Open the test document test240726.docx of Bug 162220. The last line of the paragraph contains shrunk spaces, i.e. there is no overhanging line.

  3. Open the test document Answers On A Postcard.docx of Bug 161810. The first lines of the paragraphs contain correctly shrunk spaces, i.e. there are no overhanging lines.

  4. Open the test document bad_cursor_and_pilcrow_positions.fodt of Bug 163042. 1) Clicking before the last or the last but one character of the line, the cursor is positioned there, not at the end of the line. 2) Enabling Formatting Marks, the paragraph mark is at the end of the line, not inside the line. 3) When the text cursor is at the end of the line, the visible cursor position is there, too, i.e. when typing or deleting text with backspace, the visible cursor position follows the real cursor position.

  5. Open the test document test.docx of Bug 163060. There is no line break inside the word “pumping” (it was broken after its first letter).

11 Inline heading: DOCX export

Adding DOCX export to the previous DOCX import/layout support solved the serious interoperability issue of the previous LibreOffice versions. Now inline headings stay inline instead of being converted to separate paragraphs during the DOCX round-trip.

11.1 Developments

DOCX round-trip of inline headings results in a document with the original OOXML style separators.

Commit

Description

d87cf67f8f3346a1e380383917a3a4552fd9248e

tdf#131728 sw inline heading: fix missing/broken DOCX export

Fix layout interoperability during DOCX round-trip by grab-bagging w:p/w:pPr/w:rPr/w:specVanish, i.e. the style separators.

Note: use FrameInteropGrabBag to select the text frames, which are inline headings, exporting only their text content (a single paragraph), and use also ParaInteropGrabBag to export w:specVanish.

Note: specVanish was lost completely originally, converting inline headings to normal paragraphs. After commit 56588663a0fddc005c12afaa7d3f8874d036875f, text frames (the workaround for inline heading/ToC/bookmark support) were exported instead of plain paragraphs, which were broken at least in LibreOffice.

Follow-up to commit 56588663a0fddc005c12afaa7d3f8874d036875f "tdf#131728 sw inline heading: fix DOCX paragraph layout interoperability".
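The style separator path named in the commit, w:p/w:pPr/w:rPr/w:specVanish, can be looked up in the word/document.xml of an exported file with a few lines of Python. The following is a minimal sketch, not part of the LibreOffice code: find_style_separators and the SAMPLE fragment are hypothetical, and the namespace assumes transitional OOXML.

```python
import xml.etree.ElementTree as ET

# Assumption: transitional WordprocessingML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"

# Hypothetical fragment: an inline heading followed by its body text.
SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:pPr><w:pStyle w:val="Heading4"/><w:rPr><w:vanish/><w:specVanish/></w:rPr></w:pPr>
<w:r><w:t>Inline heading.</w:t></w:r></w:p>
<w:p><w:r><w:t xml:space="preserve"> Body text.</w:t></w:r></w:p>
</w:body></w:document>"""


def find_style_separators(document_xml):
    """Return the text of each paragraph whose mark carries w:specVanish."""
    root = ET.fromstring(document_xml)
    hits = []
    for paragraph in root.iter(f"{W}p"):
        # The separator sits in the run properties of the paragraph mark.
        if paragraph.find(f"{W}pPr/{W}rPr/{W}specVanish") is not None:
            text = "".join(t.text or "" for t in paragraph.iter(f"{W}t"))
            hits.append(text)
    return hits


print(find_style_separators(SAMPLE))
```

Comparing the result for the original and the round-tripped document is a quick way to see whether the inline headings kept their style separators.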

11.2 Manual testing

Daily builds of LibreOffice allow checking the fixed DOCX export.

  1. In Writer, open the test document attached to Bug 131728. The inline headings are not in separate paragraphs, but in the same line as their continuations (see the right side of the previous screenshot).

  2. Save the test document as e.g. new_export.docx, and use File→Reload to reload the document.

  3. Inline headings are still inline headings, not separate paragraphs or lost in a broken text layout.

12 Inline heading: UX support

Inline headings are part of APA Style, the IEEE and MIL-STD-961E formats, US Federal court style guides, and other technical and legal formatting standards. With these developments, LibreOffice got initial UX support for creating inline headings: applying a style to the first selected words or sentence of a paragraph results in an inline heading.

Usage

For example, APA Style defines Level 4 heading as

        “Indented, Bold, Title Case Heading, Ending With a Period. Text begins on the same line and continues as a regular paragraph.” (https://apastyle.apa.org/style-grammar-guidelines/paper-format/headings).

A triple-click on the beginning of the paragraph selects the first sentence, and choosing Heading 4 from the Formatting Toolbar (or pressing Ctrl-4) converts the selected sentence to an inline heading with Heading 4 style:

 

1. Select the beginning of a paragraph and apply a heading style.

 

2. The result is an inline heading with the selected heading style.

12.1 Developments

The UX extension added support to use inline headings in new ODT files.

Commit

Description

7a35f3dc7419d833b8f47069c4df63e900ccb880

tdf#48459 sw inline heading: apply it on the selected words

Selected text at the beginning of a paragraph (<= 75 characters) becomes a text frame based inline heading when applying a paragraph style (using Formatting toolbar, context menu, Ctrl-1...Ctrl-5 or Styles sidebar panel).

If the whole paragraph is selected, or if no or multiple paragraphs are selected, formatting is still applied on the whole paragraphs.

Using text frames for inline heading is ODF 1.0 compliant and fully back-compatible with the older Writer versions.

The new inline heading frame contains direct formatting to zero the upper and bottom paragraph margin to solve interoperability issues: in MSO, margins of heading styles are zeroed by using the style separators.

Note: lack of inline heading was a showstopper for creating APA-, IEEE-, MIL-STD-961E-format, legal etc. documents.

Note: recent Formula frame style will be replaced by the planned Inline Heading, which will be used by the DOCX filter to export OOXML style separators instead of text frames.
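The selection rule in the commit description can be summarized as a small decision function. This is only a sketch of the described behavior, not the Writer implementation: the Selection type and becomes_inline_heading are hypothetical, and it is an assumption that a selection not starting at the paragraph start, or longer than 75 characters, falls back to formatting the whole paragraph.

```python
from dataclasses import dataclass

MAX_INLINE_HEADING_LENGTH = 75  # limit quoted in the commit description


@dataclass
class Selection:
    start: int               # character offset of the selection start
    end: int                 # character offset of the selection end
    paragraph_length: int    # length of the (first) selected paragraph
    paragraph_count: int = 1  # number of paragraphs touched by the selection


def becomes_inline_heading(sel):
    """Return True if applying a paragraph style should create an inline heading.

    False means that the style is applied to the whole paragraph(s).
    """
    if sel.paragraph_count != 1:
        return False  # multiple paragraphs selected
    length = sel.end - sel.start
    if length <= 0:
        return False  # no selection
    if sel.start != 0:
        return False  # assumption: selection must start the paragraph
    if length >= sel.paragraph_length:
        return False  # whole paragraph selected
    return length <= MAX_INLINE_HEADING_LENGTH


print(becomes_inline_heading(Selection(start=0, end=5, paragraph_length=200)))
```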

12.2 Manual testing

Daily builds of LibreOffice allow checking the enhanced Writer user interface.

  1. In Writer, write “lorem” and press F3 to insert a sample text in a new empty document.

  2. Select the first word of one of the paragraphs, e.g. by double-clicking on it.

  3. Press Ctrl-4 or choose Heading 4 in the Set Paragraph Style drop-down list on the Formatting toolbar to convert the selected text to an inline heading.

  4. Click on it to put the cursor inside the inline heading text frame to show that the inline heading is formatted with the selected heading style. Press Ctrl-F10 or click on the Toggle Formatting Marks icon on the Standard toolbar (also enable View→Text Boundaries, if needed) to show the (invisible) border of the inline heading text frame.

  5. Press Escape to select the text frame. Enable the Styles sidebar panel by pressing F11 or View→Styles to show the style of the text frame (currently “Formula”).

12.3 Planned developments

A possible continuation of the inline heading development includes several issues:

  1. Add DOCX export of the newly created inline headings, e.g. by creating and using a default Inline Heading frame style;

  2. Add HTML export to support web publishing in APA Style, and in other formatting standards;

  3. Fix ordering of the PDF bookmarks (where frame content is exported separately from the headings);

  4. Add Navigator support to modify the order of inline headings with their body text;

  5. Fix indentation issues;

  6. Support inline heading conversion of multiselection, e.g. after selecting the first word or sentence of multiple paragraphs by a regular expression based text search;

  7. Extend OpenDocument/UNO to support style-level inline heading (keeping also back-compatibility).

13 New DOCX interoperability fixes for space shrinking

Releasing LibreOffice with the new space shrinking interoperability algorithm resulted in new bug reports for specific circumstances where the implementation did not work or worked poorly. All reported bugs were fixed.

The special circumstances were:

The list of the developments:

Commit

Description

1fb6de02709a5f420f21ebd683915da50ce0d198

tdf#163149 sw smart justify: fix line shrinking at image wrapping

Limited line width at image wrap could result negative nSpaceAdd value, i.e. paragraph line with extra letter spacing instead of line shrinking.

Regression from commit 17eaebee279772b6062ae3448012133897fc71bb "tdf#119908 sw smart justify: fix justification by shrinking".

270c96e12c4a14c4f9e130d15310843da3a6af68

tdf#163575 sw smart justify: fix size resolution for SwBidiPortion

Negative space sizes (i.e. shrunk lines at image wrapping) stored over LONG_MAX/2, and these values had no resolution in SwBidiPortion, causing crash/assert in conversion of DOCX document containing e.g. Arabic text wrapping around images.

Note: apply the resolution in SwDoubleLinePortion, too.

Regression since commit 1fb6de02709a5f420f21ebd683915da50ce0d198 "tdf#163149 sw smart justify: fix line shrinking at image wrapping".
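The idea of storing negative space sizes above LONG_MAX/2, and the need to resolve them in every portion type, can be illustrated with a short sketch. The exact formula used in Writer is not given in the commit descriptions, so the offset encoding below is an assumption, as are the function names and the 64-bit LONG_MAX.

```python
LONG_MAX = 2**63 - 1  # assumption: 64-bit long
HALF = LONG_MAX // 2


def encode_space(value):
    """Store a space adjustment; negative (shrinking) values go above HALF.

    Assumed encoding for illustration: stored = HALF - value for value < 0.
    """
    return value if value >= 0 else HALF - value


def resolve_space(stored):
    """Recover the signed value from the stored one."""
    return HALF - stored if stored > HALF else stored


shrink = -120
stored = encode_space(shrink)
# A portion type that skips the resolution sees a huge positive number:
print(stored > HALF, resolve_space(stored))
```

Under this assumed encoding, code that uses the stored value without resolving it treats a shrunk line as an enormous expansion, which is the kind of failure the SwBidiPortion fix addresses.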

13.1 Manual testing

Daily builds of LibreOffice allow checking the improvements:

  1. In Writer, open the test document Image-square-tight-wrap_C15.docx of Bug 143934. The document doesn’t contain overhanging lines (these problems are visible in the old PDF export attached to Bug 163149).

  2. In Writer, open the test document forum-mso-en-6704.docx of Bug 163575. Save it in .odt format. No assert/crash occurs.

     

14 Inline heading UX, (X)HTML, PDF & Navigator support

According to the planned developments, the following improvements were added to the inline heading support:

  1. Only the imported DOCX inline headings had correct DOCX export, because of using “grab-bagging”, i.e. storing OOXML extra data temporarily for the round-trip. Now the inline headings newly created in Writer use the new Inline Heading frame style, and grab-bagging was replaced (mostly, see the commit description) by this frame style. So now it’s possible to create standard DOCX style separators in Writer, too, simply by selecting the first words or sentence of a paragraph, applying a heading style, and exporting the document as DOCX.

  2. There are two different HTML export filters in Writer, a C++ one (Save As HTML Document (Writer)), and an XSLT-based one (Export… → XHTML). Both were extended to export inline headings correctly, allowing the use of APA Style and other designs with inline headings on the web, too.

  3. Fixed the ordering of the PDF bookmarks (where frame content was exported separately from the headings), solving a long-standing problem for all headings stored in frames and tables.

  4. Added Navigator support to modify the order of inline headings with their body text, using the Move Chapter Up/Down icons of the Navigator.
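The PDF bookmark ordering of item 3 is described in the commit message of tdf#95239 as based on the page and vertical position of the headings. A minimal sketch of that idea follows; it is not the Writer code. The Heading type and build_outline are hypothetical, and the nesting by outline level is a simplification (the commit notes that the full hierarchy is not yet corrected in every case).

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    title: str
    level: int    # outline level, 1 = top
    page: int
    top: float    # vertical position on the page, smaller = higher
    children: list = field(default_factory=list)


def build_outline(headings):
    """Order headings by (page, vertical position), then nest them by level."""
    ordered = sorted(headings, key=lambda h: (h.page, h.top))
    roots, stack = [], []
    for heading in ordered:
        # Close every open entry that is not a parent of this heading.
        while stack and stack[-1].level >= heading.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(heading)
        stack.append(heading)
    return roots


# Collection order of the old export: frame headings first.
collected = [
    Heading("Heading 2 (frame)", 2, page=2, top=100.0),
    Heading("Heading 3 (frame)", 3, page=2, top=200.0),
    Heading("Heading 1", 1, page=1, top=50.0),
]
outline = build_outline(collected)
print([h.title for h in outline])
```

The sort key is the essential part: once frame headings are ordered together with normal headings, the usual level-based nesting yields the expected tree.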

Commit

Description

a1dcbd1d1ce6071d48bb5df26d7839aeb21b75a8

tdf#48459 sw inline heading: add Inline Heading frame style

Add the new frame style Inline Heading with default variable width and anchoring as character to support UX better – and later, – interoperability.

49765a9e7be41d4908729ff7d838755276b244cb

tdf#48459 tdf#131728 sw inline heading: new frame style: fix DOCX export

Export Inline Heading frame style as OOXML style separator, fixing interoperability of the newly created inline headings.

Remove frame grab-bagging, use the new Inline Heading style for round-trip of DOCX documents. Note: paragraph grab-bagging is used only for the case, where no frame created for the inline heading (second or more inline heading paragraphs in the same paragraph layout).

Now the DOCX import uses anchoring as character, fixing layout problems of short paragraphs with big size headings anchored *to character* previously.

3056c75db5b2a8a945621349fea9ee1183872989

tdf#95239 sw: fix wrong order of PDF ToC, if headings put in text frames

PDF outlines (called also as PDF bookmarks or ToC) contained headings in the wrong order if they were placed in a text frame:

Heading 2 (frame) ... 2

Heading 3 (frame) ... 2

Heading 1 ........... 1

Now PDF export lists text frame headings not at the start of the ToC, but in their correct position and hierarchy, based on the page and vertical position of the headings:

Heading 1 ................ 1

  Heading 2 (frame) ...... 2

     Heading 3 (frame) ... 2

This is useful for the recently implemented inline headings, where e.g. APA Style Heading 4 and Heading 5 are there in text frames anchored as characters, see tdf#48459.

Extend PDFium test environment for bookmarks, and add tdf#131728 DOCX and an APA Style .fodt unit tests.

Note: if the higher headings are only in text frames, but not the lower ones, only the order corrected, but not the full hierarchy, yet.

4f871a5a9aa1a81831bd525f31023b3915432dd7

tdf#48459 sw inline heading: fix bad NC of the new frame style

Which affected translation etc.

984f0e49d35ba87c105310f27d945147a23d1198

tdf#163874 sw inline heading: fix XHTML export

Text frames anchored as characters, containing inline headings were exported without display:inline using the XSLT XHTML filter (see File->Export As), resulting separated paragraphs for inline headings.

Note: both div and heading elements need display:inline.

This partially (and conditionally) reverts commit d2e8705c9cc503afdaed366b1f71ed012b0c568f "tdf#153839: add newline after certain tags", because pretty-printing is harmful here: with display:inline, the new line between the div elements was converted to a space, resulting in a double space together with the original space between the inline heading and the rest of the paragraph line.

bcee366d82392b02745b0070a312624e7baa29d3

tdf#163873 sw inline heading: use HTML text, not lo-res bitmap export

Text frames formatted with Inline Heading style and anchored as characters were converted to unacceptably low resolution images in the HTML export. Now the HTML export contains normal h1–h6 elements with display:inline; CSS setting to get text-based/searchable inline headings, fixing also the rendering quality.

32398232e925d18d2ac5a6d467b61e1a84a0df7c

tdf#164074 sw inline heading: add up/down outline moving

Move inline headings with their outline tree in the Navigator by clicking on the Move Heading Up/Down icons.

Instead of changing CompareSwOutlineNodes, which breaks the code at other places, add a new SwOutlineNodesInline and CompareSwOutlineNodesInline to sort inline headings (put in Inline Heading frames) with normal headings only for MoveOutlinePara and other parts of Navigator’s outline moving.

Reordering chapters and sections using the Navigator was limited to normal (root) headings; it was not available for headings in text frames and tables.

The recent implementation of inline headings uses text frames with Inline Heading frame style, anchored as characters to their paragraphs. Now these inline headings are movable with the Navigator, together with their outline tree, i.e. the paragraph where the inline heading is anchored as character, the following paragraphs without inline heading, or the following subsections.

Note: selecting the inline headings is possible in the Navigator content tree or by clicking inside the text of the inline heading in the document.

Note: according to the fix for tdf#143569, multiple headings in the same text frame or table are ordered alphabetically in the Navigator. This doesn't affect the inline headings, where there is only a single heading in an Inline Heading text frame.

20ee4ecfeb1c504f0bc8a3057ff30e2e61f4b300
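The design choice in the tdf#164074 commit message above (keep the existing comparator, add a second sorted container with its own comparator for outline moving only) can be sketched in isolation. Everything in this sketch is hypothetical: the `Heading` struct, both comparators and the index fields are stand-ins, not the Writer core types. It also assumes that a heading in a frame is stored away from the paragraph it is anchored to, so that comparing the nodes' own positions does not give the visual order.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for an outline node; not the Writer core type.
struct Heading
{
    std::string text;
    long nodeIndex;   // position of the heading's own node in the document model
    long anchorIndex; // position of the paragraph the frame is anchored to,
                      // or -1 if the heading is not in a frame
};

// Existing order, left untouched: compares the nodes' own positions.
struct CompareByNode
{
    bool operator()(const Heading* a, const Heading* b) const
    { return a->nodeIndex < b->nodeIndex; }
};

// Additional order, used for outline moving only: a framed heading is
// sorted at the position of its anchor paragraph.
struct CompareInline
{
    static long pos(const Heading* h)
    { return h->anchorIndex >= 0 ? h->anchorIndex : h->nodeIndex; }

    bool operator()(const Heading* a, const Heading* b) const
    {
        if (pos(a) != pos(b))
            return pos(a) < pos(b);
        return a->nodeIndex < b->nodeIndex; // tie-break keeps the order strict
    }
};

using Outline = std::set<const Heading*, CompareByNode>;
using OutlineInline = std::set<const Heading*, CompareInline>;
```

Code that relies on the first container keeps its behavior, while the moving code can look up the neighbors of an inline heading in the second one.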

tdf#164074 sw inline heading: clean-up outline moving

Fix bad condition with ++std::npos in the following !WithChildren inline heading branch:

else if( pOutlNdsInline && ++nEndPosInline < pOutlNdsInline->size() )

The problem was reported by Caolán McNamara.

Remove also an unused variable in docnum.cxx.
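Why pre-incrementing an npos value makes such a condition pass can be shown with a small stand-alone sketch. It assumes that the position is an unsigned `std::size_t` and that npos is its largest value, as with `std::string::npos`. The function names are hypothetical, and the guarded variant is only one possible way to avoid the problem, not necessarily the fix applied in the commit.

```cpp
#include <cstddef>
#include <string>

// Mirrors the shape of the bad condition: pre-increment, then compare.
// If pos is npos (the largest std::size_t value), ++pos wraps around to 0,
// so the check passes for every non-empty container.
bool buggyHasNext(std::size_t pos, std::size_t size)
{
    return ++pos < size;
}

// Guarded variant: test for npos before incrementing.
bool guardedHasNext(std::size_t pos, std::size_t size)
{
    return pos != std::string::npos && pos + 1 < size;
}
```

Unsigned wrap-around is well defined in C++, so the buggy form does not crash; it silently treats "not found" as "the position before the first element".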

14.1 Manual testing

Daily builds of LibreOffice allow checking the improvements:

DOCX export of the newly created inline headings

  1. Create sample paragraphs (e.g. writing “lorem” and pressing F3), and create inline headings in Writer by selecting the first word(s) or sentence of them, and applying a heading style, e.g. by pressing Ctrl-1 – Ctrl-5 or using the style selector of the Formatting toolbar. Select an inline heading, press Escape to select its frame. In the Style (F11) sidebar, the selection shows the new frame style “Inline Heading”.

  2. Save and reload the document in DOCX format. The result is similar: it uses the same Inline Heading frame style as before. (Note: previously the newly created inline headings were destroyed during the DOCX round-trip.)

PDF bookmark export

  1. In Writer, open the test document tdf95239.fodt of Bug 95239. The document contains inline headings in text frames.

  2. Export it as PDF, with the General→Structure→Export outlines setting enabled (the default setting for PDF export).

  3. Check the outline tree (PDF bookmarks) in a PDF viewer, e.g. in GNOME Document Viewer/Mozilla Firefox/Google Chrome, with the sidebar enabled and the Outline pane chosen. The inline headings are in the outline tree, not before it.

Export XHTML

  1. Open the previous test document.

  2. Choose Export… → XHTML (.html;.xhtml) format. The exported HTML document contains “display:inline” style settings.

  3. Check the HTML export in Mozilla Firefox/Google Chrome. The inline headings are inline in the browsers, too.

Save As HTML Document (Writer)

  1. In Writer, open the test document attached to Bug 131728. The inline headings are not in separate paragraphs, but on the same line as the text that continues them.

  2. Open https://bugs.documentfoundation.org/attachment.cgi?id=162196 (test document of Bug 131728).

  3. Using File→Save As, save the document in HTML Document (Writer) (.html) format, and open the HTML file in a browser. The inline headings are shown as inline headings with normal text in the browser, not as low-resolution images.

Navigator Move Up/Down support

  1. In Writer, open tdf164074.fodt attached to Bug 164074. The inline headings are on Heading 4 and Heading 5 levels.

  2. Click inside the text of the first heading, “Lorem”.

  3. Open the Navigator (F5). The selected outline item is “1.1.1.1. Lorem”.

  4. Click on the Move Heading Down icon of the Navigator. The 1.1.1.1 and 1.1.1.2 sections are swapped:

    1.1.1.2. Aliquam… 

    1.1.1.2.1. Praesent… 

    1.1.1.2.2. Donec… 

    1.1.1.1. Lorem… 

    1.1.1.1.1. Vestibulum… 

    1.1.1.1.2. Integer… 

 

László Németh

2024-12-02

 

This project was funded through the NGI0 Entrust Fund, a fund established by NLnet Foundation with financial support from the European Commission's Next Generation Internet programme. More information: https://nlnet.nl/project/LO-Typography/