LibreOffice/Collabora Online Typography
Line break interoperability and state-of-the-art ISO OpenDocument/web typography in open source office suites
LibreOffice, the open source office suite, is the reference implementation of the ISO OpenDocument (ODF) format. As part of LibreOffice Technology, Collabora Online is key to the digital sovereignty of open source online document editing. Metrically equivalent fonts can no longer guarantee MS Word interoperability, because the line break algorithm of MS Word changed without documentation after the ODF and OOXML standardization. The planned project solves this fundamental problem, and also adds the following interoperability and CSS4 web typography features and fixes to the LibreOffice line break algorithm and layout:
• tdf#119908 Fix line break interoperability of LibreOffice and Collabora Online in the form of a new default paragraph layout, which is equivalent to the new and undocumented algorithm of MS Word 2013 and later;
• tdf#132599 Implement the ODF attribute fo:hyphenation-keep, adding the documented interoperability and state-of-the-art CSS4 web typography feature “stop words hyphenating across pages” to LibreOffice and Collabora Online;
• tdf#106733 Implement the ODF attribute fo:hyphenate to exclude a portion of text from hyphenation;
• tdf#149421 Fix non-implemented details of the hyphenation zone interoperability.
All the developments are libre and open source, and will be integrated with the main development branch of LibreOffice, and released with its next stable release.
Layout difference of MS Word (upper) and LibreOffice (bottom): line break in justified paragraphs supports shrinking of spaces between words since MS Word 2013. Only disabling the justification results in the same layout, moving the word “Antarctica” to the second line in MS Word, too. The bottom paragraphs contain hundreds of adjacent spaces shrinking only in MS Word.
MS Word: in the first justified paragraph, the first shrunk line contains “Antarctica”.
LibreOffice Writer: lost layout in the first (and bottom) justified paragraphs.
With 0.1 pt resolution, up to 2% shrinking was measured for plain justified lines (no direct character formatting, no hyphenation). The following picture shows how MSO implements the shrinking by using the available spaces (the maximal shrinking is independent of the number of spaces, at least in a line with enough spaces):
Black text: MS Word, red text: LibreOffice Writer. The black text contains extra character spacing after the text “3 Au” to give the same position after the last space.
Commit 7d08767b890e723cd502b1c61d250924f695eb98 “tdf#130088 tdf#119908 smart justify: fix DOCX line count + compat opt.” is the fix for the line count/page count differences and the initial fix for the new justification:
• It recognizes DOCX documents with the new default MSO 2013+ justification;
• It fixes the line and page count in the import of DOCX documents with MSO 2013+ justification: lines of plain text break at the same positions, so LibreOffice/Collabora Online no longer renders extra lines and pages when importing DOCX files with the new default justification, although the affected lines still exceed the text area at this stage;
• It adds the new LibreOffice compatibility option “JustifyLinesWithShrinking” to enable line break interoperability for DOCX documents, and stores this setting in ISO OpenDocument, the native document format of LibreOffice, too;
• It adds unit tests for these.
A simple lorem.docx document (attached to tdf#119908) was created in MSO 2016 to test and show the improved line break, which solved the line/page count interoperability in DOCX import of LibreOffice Writer for plain text:
MSO, Writer before the fix, Writer after the fix. Blue marks: correct line break positions in MSO and in the improved Writer; red marks: bad line breaks in Writer before the fix.
Shrinking has been added by commit 17eaebee279772b6062ae3448012133897fc71bb “tdf#119908 sw smart justify: fix justification by shrinking” and commit c1803de8a093739d189be54b2d9bd5634e9e79ee “tdf#119908 sw smart justify: add unit test”, fixing the temporarily exceeding lines by justifying them.
The following composite picture shows the previous (red) and the shrunk (black) lines in Writer. With this commit, the result is the same as in MS Word.
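The idea of justifying a line by shrinking its spaces can be sketched in a few lines of Python. This is only an illustration, not the Writer code: the function name and the sample widths are hypothetical, and the 20% limit on space shrinking is taken from the measurements reported later in this article.

```python
def space_adjustment(word_widths, space_width, line_width, max_shrink=0.2):
    """Return the width change per space (in pt) that justifies the line.

    A negative result means the spaces are shrunk, a positive one that they
    are expanded. None means the words do not fit even with maximally
    shrunk spaces, so the last word has to move to the next line.
    max_shrink=0.2 is an assumption based on the measured ~20% limit.
    """
    gaps = len(word_widths) - 1
    if gaps < 1:
        # A single word has no space to adjust.
        return 0.0 if sum(word_widths) <= line_width else None
    natural_width = sum(word_widths) + gaps * space_width
    delta = (line_width - natural_width) / gaps
    if delta < -max_shrink * space_width:
        return None
    return delta


# Hypothetical line: three words, 3 pt spaces, 1 pt too wide for the line.
print(space_adjustment([100.0, 100.0, 97.5], 3.0, 302.5))
# The same words in a slightly narrower line no longer fit.
print(space_adjustment([100.0, 100.0, 97.5], 3.0, 302.0))
```

The sketch treats the line as a whole. As described below, the real layout code also has to handle several text portions per line, hyphenation and underflow.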
The following composite screenshots of MS Word (red) and Writer (black) show the fix for handling multiple text portions (middle) and the shrinking algorithm (right), which resulted in the same line breaks as in MS Word (test document: lorempage.docx of Bug 158333, generated PDFs: Word, Writer).
The related commits in LibreOffice code base:
Commit | Description |
tdf#158333 sw smart justify: fix multiple text portions | Multiple text portions, e.g. when some part of a line contains direct character formatting, broke DOCX interoperability of justified paragraphs. |
tdf#119908 tdf#158419 sw smart justify: fix cursor position | The text cursor didn't follow the new word positions yet, because of an unsigned cast of the negative shrinking value. |
tdf#119908 tdf#158436 sw smart justify: fix freezing with NBSP | Stop shrinking during underflow, because it resulted in an endless layout loop, e.g. when a very short word was followed by a no-break space. The problem was reported by Miklós Vajna (Collabora Productivity). |
tdf#119908 tdf#158776 sw smart justify: shrink only spaces | For interoperability, shrink only the spaces, by up to 20%, instead of the lines by up to 2%. |
tdf#159102 sw smart justify: fix automatic hyphenation | As before with soft hyphens, automatic hyphenation could result in too much shrinking, because of calculating with an extra non-existing space in the line. Also try to shrink the line only if a space will likely be available in it. |
(Note: commit 93ab1bc6be7b46226874810ec4fc3c61d5d0fc7c restored the unit test testOfz64109, which the second commit had removed by accident. Reported by Miklós Vajna.)
Text portions (spans or runs) result from direct character formatting in the paragraph, or simply from editing the text without any formatting difference.
The first implementation calculated only with the last text portion of the line. This caused no problem when the line consisted of a single text portion, but otherwise resulted in different line breaks (less or missing shrinking). The first screenshots show that the test file contained multiple bad line breaks caused by multiple text portions in the lines. These three bad line breaks were fixed by the first commit.
Selected text and the text cursor were in the wrong position: over the exceeding, i.e. not yet shrunk line, instead of the visible shrunk line. The second commit adjusted cursor movement and text selection to the shrunk lines.
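The commit description above attributes the cursor problem to an unsigned cast of the negative shrinking value. The following Python sketch shows the general effect of such a cast. The 16-bit width and the sample value are assumptions chosen for illustration, not the actual types used in Writer.

```python
import ctypes


def as_unsigned16(value):
    """Reinterpret an integer as an unsigned 16-bit value, as a C-style cast would."""
    return ctypes.c_uint16(value).value


shrink = -12  # hypothetical negative space adjustment in layout units
print(as_unsigned16(shrink))  # a small negative value wraps around to a large positive one
```

A position computed from the wrapped value lands far away from the shrunk word, which matches the kind of misplaced cursor described above.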
Extending the text layout code with shrinking resulted in an infinite loop, e.g. when a paragraph ending with a no-break space initiated the recalculation of the line with underflow. This was solved by disabling shrinking for underflow.
Removing spaces instead of adding or replacing them allowed a more precise comparison of MSO and LibreOffice. The following data, measured with the previous test file, show up to 20% shrinking of the spaces instead of up to 2% shrinking of the lines.
paragraph width | max spaces in line | space width |
347.53 pt | 104 | 3.34 pt |
spaces in line | allowed extra character spacing in the line, MS Word (pt) | allowed extra character spacing in the line, Writer (pt) | extra spacing by shrinking: difference (pt) | shrinking of the line | shrinking of the spaces |
6 | 5.98 | 1.95 | 4.03 | 1.2% | 20.1% |
5 | 8.08 | 4.65 | 3.43 | 1.0% | 20.5% |
4 | 10.18 | 7.45 | 2.73 | 0.8% | 20.4% |
3 | 12.23 | 10.25 | 1.98 | 0.6% | 19.8% |
2 | 14.33 | 13.05 | 1.28 | 0.4% | 19.2% |
1 | 16.43 | 15.75 | 0.68 | 0.2% | 20.3% |
Low precision (approximated recalculation of multiple text portions?) |
9 | 11.15 | 4.15 | 7 | 2.0% | 23.3% |
12 | 9.63 | 1.35 | 8.28 | 2.4% | 20.6% |
21 | 16.83 | 5.85 | 10.98 | 3.2% | 15.6% |
The last commit changed the algorithm to shrink only the spaces, by up to 20%, resulting in the same line breaks not only for the previous test file, but also for the originally reported test document of tdf#119908:
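The percentages in the measurement table can be recomputed from its other columns. The following Python sketch assumes that "shrinking of the line" is the difference divided by the paragraph width, and "shrinking of the spaces" is the difference divided by the total width of the spaces in the line. With the rounded values published in the table, this reproduces the percentages to within about 0.1 percentage point.

```python
PARAGRAPH_WIDTH_PT = 347.53  # paragraph width from the table
SPACE_WIDTH_PT = 3.34        # width of a single space from the table


def shrinking(spaces, msword_pt, writer_pt):
    """Return (difference in pt, line shrinking in %, space shrinking in %)."""
    diff = msword_pt - writer_pt
    line_pct = 100 * diff / PARAGRAPH_WIDTH_PT
    spaces_pct = 100 * diff / (spaces * SPACE_WIDTH_PT)
    return diff, line_pct, spaces_pct


# Three rows of the table: (spaces, MS Word, Writer)
for spaces, msword, writer in [(6, 5.98, 1.95), (3, 12.23, 10.25), (1, 16.43, 15.75)]:
    diff, line_pct, spaces_pct = shrinking(spaces, msword, writer)
    print(f"{spaces} spaces: {diff:.2f} pt, line {line_pct:.1f}%, spaces {spaces_pct:.1f}%")
```

The shrinking relative to the line varies with the number of spaces, while the shrinking relative to the spaces stays near 20%, which is why the limit is better expressed per space.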
Hyphenation was only a paragraph-level feature in LibreOffice, while the OpenDocument standard also allows disabling hyphenation in character formatting. The following commits in the LibreOffice code base solved the problem, making it possible to exclude words from hyphenation.
Commit | Description |
tdf#106733 xmloff: keep fo:hyphenate in character formatting | In the case of character formatting, map fo:hyphenate to the unused CharNoHyphenation character property to keep it during ODF import/export instead of losing it completely. This is the first step to disable hyphenation for single words or text spans in paragraphs with automatic hyphenation. Note: using fo:hyphenate as a character property is part of the ODF standard. Note: the old workaround to disable hyphenation, changing the language of the text to None, had serious drawbacks: losing spell checking and losing language-dependent text layout (supported by both the OpenType and Graphite font engines in LibreOffice). |
tdf#106733 sw: implement CharNoHyphenation | Implement the CharNoHyphenation character property to disable automatic hyphenation of words in paragraphs with enabled hyphenation. Also fix the mapping of fo:hyphenate to CharNoHyphenation by using the automatic inversion of their boolean values defined by xmloff's XML_TYPE_NBOOL, as suggested by Michael Stahl. |
tdf#106733 sw: fix bad downcast in SwTextNode::GetLang() | Fix bad cast of SvxNoHyphenItem to SvxLanguageItem reported by <https://ci.libreoffice.org/job/lo_ubsan/3049/>. |
tdf#106733 sw cui: add CharNoHyphenation checkbox | Add the new checkbox "Exclude from hyphenation" to the Position tab of the Character formatting dialog window (UX design by Heiko Tietze). With this, it is possible to disable hyphenation with direct character formatting (e.g. combined with Find All), or by using character styles with “Exclude from hyphenation” set in them. This feature conforms to the OpenDocument standard and, unlike the previous locale=None workaround, it keeps spell checking and locale-dependent text layout. |
(Note: commit 1b83ebf42c535528b73baac2407b347f19070d07 disabled the unit test of tdf#159102 temporarily, because of lack of hyphenation on some test builds. Reported by Noel Grandin and Miklós Vajna.)
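The inverted boolean mapping between fo:hyphenate and CharNoHyphenation mentioned in the commit descriptions can be illustrated with a minimal Python sketch. The function names are hypothetical, and the code only mirrors the idea of a negated boolean; it is not the xmloff implementation.

```python
def char_no_hyphenation_from_odf(fo_hyphenate):
    """Map the ODF attribute value fo:hyphenate to CharNoHyphenation.

    The two booleans are inverses: fo:hyphenate="false" means that
    hyphenation is excluded, i.e. CharNoHyphenation is True.
    """
    if fo_hyphenate not in ("true", "false"):
        raise ValueError(f"not an ODF boolean: {fo_hyphenate!r}")
    return fo_hyphenate == "false"


def odf_from_char_no_hyphenation(char_no_hyphenation):
    """Map CharNoHyphenation back to the fo:hyphenate attribute value."""
    return "false" if char_no_hyphenation else "true"


print(char_no_hyphenation_from_odf("false"))
print(odf_from_char_no_hyphenation(True))
```

Import followed by export returns the original attribute value, so the setting survives an ODF roundtrip.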
The following screenshot shows the new option in Character formatting dialog window:
This development adds new typographical features standardized by OpenDocument, XSL and CSS 4 to LibreOffice Writer to guarantee MSO interoperability and more: following typographical rules and web standards.
Not only justified text, but also left-aligned and other text has lost its interoperability since MSO 2013, depending on hyphenation. The default page layout algorithm of MSO moves the last hyphenated word (before MSO 2013) or line (MSO 2013 and newer) of a page to the next page, following English typographical traditions and rules.
The OpenDocument standard contains the same feature, fo:hyphenation-keep, but it was not implemented before, and LibreOffice did not even keep this setting during the import/export of an ODF document.
For more control over hyphenation, and for better conformance with standard typographical rules, LibreOffice Writer also implements the “hyphenation across” values of XSL and CSS 4 that are not covered by OpenDocument yet (loext:hyphenation-keep-type).
As a new development, LibreOffice not only keeps the value of fo:hyphenation-keep, but the page layout moves the last hyphenated line of the page (or column) to the next page (or column), similar to MSO 2016 and newer.
Commit | Description |
tdf#132599 cui offapi sw xmloff: implement hyphenate-keep | Both parts of a hyphenated word shall lie within a single page with ODF paragraph setting fo:hyphenation-keep="page". The implementation follows the default page layout of MSO 2016 and newer by shifting the bottom hyphenated line to the next page (and to the next column). Note: this is an MSO DOCX interoperability feature, used also in DTP software, XSL and CSS.
* Add checkbox/combobox to Text Flow in paragraph dialog
* Store property in paragraph model (com::sun::star::style::ParagraphProperties::ParaHyphenationKeep)
* Add ODF import/export
* Add ODF unit tests
New constants of com::sun::star::text::ParagraphHyphenationKeepType, containing ODF AUTO and PAGE (borrowed from XSL), and for the planned extension ParaHyphenationKeepType of ParagraphProperties:
– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).
Note: the implementation shifts only a single hyphenated line, like MSO does: pages can still end in hyphenated lines (i.e. in the case of consecutive hyphenated lines), but less often than before.
Clean up the hyphenation dialog by collecting the "Don't hyphenate" options at the end of the hyphenation settings and negating them (similar to MSO and DTP), also adding the new option "Hyphenate across column and page":
[x] Hyphenate words in CAPS
[x] Hyphenate last word
[x] Hyphenate across column and page
Note: ODF fo:hyphenation-keep has only the values "auto" and "page", while XSL also defines "column". For interoperability with MSO and DTP, fo:hyphenation-keep="page" is interpreted as XSL "column", avoiding hyphenation at the end of a column, not only at the end of a page. |
The following screenshot shows the new “Hyphenate across column and page” option on “Text Flow” pane of Paragraph formatting dialog window: the last hyphenated line “except that it has at-” was shifted to the next page. Also the clean-up of “Hyphenate words in CAPS/last word” options is visible.
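The page layout rule described above, shifting a single hyphenated bottom line to the next page, can be sketched as follows. This is a simplified model with a fixed number of lines per page and hypothetical names; it ignores columns, widow and orphan control, and everything else the real layout considers.

```python
def paginate(lines, lines_per_page, keep=True):
    """Split lines into pages, shifting a hyphenated bottom line if keep is set.

    A line counts as hyphenated when it ends with "-". Only a single line
    is shifted per page, so a page can still end in a hyphenated line when
    hyphenated lines follow each other.
    """
    pages = []
    start = 0
    while start < len(lines):
        end = min(start + lines_per_page, len(lines))
        bottom_is_hyphenated = lines[end - 1].endswith("-")
        # Shift only if text follows on the next page and the page stays non-empty.
        if keep and end < len(lines) and end - start > 1 and bottom_is_hyphenated:
            end -= 1
        pages.append(lines[start:end])
        start = end
    return pages


text = ["first line", "except that it has at-", "mosphere", "last line"]
print(paginate(text, 2, keep=False))
print(paginate(text, 2, keep=True))
```

In this toy example the shifted line pushes the text onto a third page, the same kind of page count difference that the interoperability fixes in this article are about.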
The current OpenDocument standard has not fully adopted the values of the XSL attribute hyphenation-keep: the value “column” is missing (https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep). Also, according to e.g. New Hart's Rules for English (OUP, 2005, “Line Endings”, p. 40, cited by R. Green in tdf#132599), it is a typographical requirement to avoid hyphenation in the last line of a spread, i.e. a visible page pair. That is why CSS 4 defines “spread”, not only “page” and “column”, to control hyphenation.
The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page” and “Spread” on Text Flow pane of the Paragraph settings:
The following screenshots show the layout of the test documents in LibreOffice Writer 24.8.
LEFT: Hyphenation across everywhere (hyphenation-keep="auto"). RIGHT: No hyphenation across, resulting in a shifted hyphenated line (hyphenation-keep="page", which means the default loext:hyphenation-keep-type="column" for interoperability reasons).
LEFT: Hyphenation across column and page, but not spread (hyphenation-keep="page", loext:hyphenation-keep-type="spread"). Shifted hyphenated line on the first (right-hand) page. RIGHT: same settings, but a page break inserted at the start of the document results in no shifting, because the bottom hyphenated line is on the second (left-hand) page.
LEFT: No hyphenation across (hyphenation-keep="page", loext:hyphenation-keep-type="column"). Shifted hyphenated line in the first column of the multi-column page. RIGHT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page").
LEFT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page"). RIGHT: same settings, but the last hyphenated line is shifted in the last column, because that line is the last line of the page, too.
Details of the core and DOCX filter developments, also extending LibreOffice help with the new paragraph settings:
Commit | Description |
tdf#132599 cui offapi sw xmloff: add hyphenation-keep-type
Support the XSL value "column" and the CSS 4 value "spread", stored in loext:hyphenation-keep-type, to give better control over hyphenation-keep. E.g. spread: both parts of a hyphenated word shall lie within a single spread, i.e. the line is shifted only when the next page is not visible at the same time (e.g. the next page is not the right page of a book).
– css::style::ParaHyphenationKeep is a boolean property now, importing hyphenation-keep = "page" as true.
– type of ParaHyphenationKeep, including the new non-ODF types is stored in the new ParagraphProperties::ParaHyphenationKeepType.
– default value of ParaHyphenationKeepType is COLUMN for interoperability.
– Add checkboxes to Text Flow -> Hyphenation Across in paragraph dialog:
* Column (previously: Hyphenate across column and page) * Page * Spread
– enabling/disabling them follows XSL/CSS 4/loext, i.e. possible combinations:
* No Hyphenation across (hyphenation-keep = "page" and loext:hyphenation-keep-type = "column")
* Hyphenation across [x] Column (hyphenation-keep = "page" and loext:hyphenation-keep-type = "page")
* Hyphenation across [x] Column [x] Page (hyphenation-keep = "page" and loext:hyphenation-keep-type = "spread")
* Hyphenation across [x] Column [x] Page [x] Spread (hyphenation-keep = "auto")
– Add ODF import/export
– Update DOCX import
– Add ODF unit tests
Note: the current implementation depends on widow settings: disabling widow handling allows hyphenation across columns and pages, not only in table cells.
Note: bPageEnd, which is set only by the RTF import but not used, has been renamed to bKeep. Depending on the RTF test results, it will likely be necessary to disable the layout change for it, e.g. with GetKeepType()=ParagraphHyphenationKeepType::AUTO, if PageEnd uses the obsolete hyphenation rule, i.e. shifting only the hyphenated word to the next page, not the full line.
More information:
– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).
| |
tdf160518 DOCX: import hyphenation-keep to fix layout
To fix layout interoperability, import DOCX compatSettings allowHyphenationAtTrackBottom and useWord2013TrackBottomHyphenation as hyphenation-keep setting "COLUMN", shifting last hyphenated lines of pages and columns, like MSO does. | |
tdf#132599 add "Hyphenation across" options | Document the new options of LO 24.8 to control hyphenation in the last line of a column, page or spread. |
The code behind “Hyphenation across” has been generalized for all possible page changes, including linked frames, also columns in tables and linked frames.
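The relation between the "Hyphenation across" checkboxes, the stored attribute values and the layout decision can be summarized in a short Python sketch. The function names and the ranking of the boundaries are illustrative assumptions derived from the combinations listed in the commit description above. The sketch does not cover the value "always", and it is not the Writer implementation.

```python
# Boundaries ordered from the smallest to the largest unit.
RANK = {"column": 1, "page": 2, "spread": 3}


def checkboxes_to_odf(column, page, spread):
    """Map the checkboxes to (fo:hyphenation-keep, loext:hyphenation-keep-type)."""
    combos = {
        (False, False, False): ("page", "column"),
        (True, False, False): ("page", "page"),
        (True, True, False): ("page", "spread"),
        (True, True, True): ("auto", None),
    }
    try:
        return combos[(column, page, spread)]
    except KeyError:
        raise ValueError("combination not offered by the dialog") from None


def shifts_line(keep, keep_type, boundary):
    """Decide whether a hyphenated bottom line is shifted.

    boundary is the largest unit that changes after the line:
    "column", "page" or "spread".
    """
    if keep == "auto":
        return False
    return RANK[boundary] >= RANK[keep_type]


print(checkboxes_to_odf(True, False, False))
print(shifts_line("page", "page", "column"))  # hyphenation across a column is allowed
print(shifts_line("page", "page", "page"))    # but not across a page
```

Under this model, a line at the bottom of the last column is shifted with keep-type "page", because the largest boundary after it is a page change, as in the multi-column screenshots above.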
The DOCX export broke the layout of documents created in Writer: for example, hyphenated paragraphs resulted in more pages in MS Word. This problem was fixed by adding the missing allowHyphenationAtTrackBottom DOCX compatibility setting.
The following composite screenshots show the DOCX export in MSO (red text), which had 3 pages before the fix (top row). After the fix, the result has 2 pages in MSO (bottom row), as in Writer (black text). Test document:
Hyphenation was lost, if it was enabled only in “Text body” instead of the default paragraph style. Now Writer exports hyphenation in this case, too, which is more common for documents created in Writer.
CSS 4 “always” was implemented as Hyphenate across → Last full line of paragraph. The hyphenated word of the last full line of the paragraph moves to the last line (if there is enough room for it). This results in longer last lines and removes hyphenation from the bottom right-hand corner of the paragraph.
LEFT: missing recognition of hyphenation-keep-type="always" in the last paragraph. RIGHT: correct layout: the hyphenated word of the last full line was shifted to the last line of the paragraph.
The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page”, “Spread” and the newest “Last full line of paragraph” on Text Flow pane of the Paragraph settings. The test document and its screenshot show that the hyphenated line was shifted to page 2, according to the Hyphenation across → Page setting:
Now “Hyphenation across” works in tables, too, removing the widow setting dependency of the previous implementation:
LEFT: missing shifting between the split table cell, with hyphenation-keep-type="spread". RIGHT: correct layout.
Linked frames are like columns on the same pages, now with correct layout:
LEFT: bad shifting between the linked frames on a single page, with hyphenation-keep-type="page". RIGHT: correct layout.
With hyphenation-keep-type="spread", blank left pages were not handled correctly, nor were linked frames anchored only on different right pages.
LEFT: missing shifting between linked frames on right pages, with hyphenation-keep-type="spread". RIGHT: correct layout.
Spread is still recognized with linked frames on left and right pages:
Details of the core and DOCX filter developments, also extending LibreOffice help with the new paragraph settings:
Commit | Description |
tdf#132599 sw schema xmloff: add hyphenation-keep-type='always' | Add a new hyphenation option to limit hyphenation of the last full line of the hyphenated paragraph. Also move loext:hyphenation-keep-type to paragraph-properties, following the associated hyphenation-keep. Note: the value "always" is defined by CSS 4 hyphenate-limit-last, see https://www.w3.org/TR/css-text-4/#hyphenate-line-limits. |
tdf132599 sw: fix hyphenation-keep for linked frames, also for spreads | Linked text frames are hyphenated as columns on the same page, i.e. do not shift the hyphenated line if hyphenation-keep-type="page" or "spread". For "spread", also check that the hyphenated line is on the previous left page, because checking only the right page wasn't enough for linked text frames and blank left pages. |
tdf#132599 sw: fix hyphenation-keep for tables and no widow | Now hyphenation-keep works without widow settings, too, e.g. in tables (where, despite the existing widow settings, widow handling is always disabled). |
tdf#132599 sw: fix test of "fix hyphenation-keep for tables and no widow" | The problem was reported by Miklós Vajna. |
tdf#132599 sw: fix unit tests for hyphenation-keep with frames | Fix the en_US language of the test documents to be consistent with the hyphenator condition in the related unit tests of commit d4304cd0a4fedd0117fea3625dff1fca2945a0e6 "tdf132599 sw: fix hyphenation-keep for linked frames, also for spreads". The problem was reported by René Engelhard. |
tdf#160518 sw: fix DOCX import/export of hyphenation-keep
– export the hyphenation-keep="page" setting of native ODF documents, if hyphenation is enabled in the default paragraph style or in the text body style with this setting. It's lossless for hyphenation-keep-type="column", while the other values are converted to hyphenation-keep-type="column", which is the default layout of MSO 2013 and later.
– fix the LO roundtrip of DOCX documents which were created in MSO originally: while the roundtrip kept useWord2013TrackBottomHyphenation and allowHyphenationAtTrackBottom, the exported redundant suppressAutoHyphen = "false" settings of the paragraphs resulted in broken layout in Writer, because the repeated import overwrote every paragraph with a bad hyphenation setting (hyphenation-keep = "auto" instead of hyphenation-keep = "page").
– also export the "Hyphenate CAPS" and "Hyphenation zone" settings, if hyphenation is enabled in the text body style with these settings, and not in the default paragraph style. Setting hyphenation only in "Text Body" is more common in documents created in LibreOffice. | |
help: tdf#132599 add "Hyphenation across" -> Last full line of paragraph | Document the new option of LO 24.8 to control hyphenation in the last full line of a paragraph. Also fix the changed IDs of the other "Hyphenation across" options. |
The patches contain several unit tests. The next manual tests list only the most important bug fixes:
1. Open tdf160518_auto_in_default_paragraph_style.fodt (attached to Bug 160518, as “2-page flat ODF” document). The document is 2 pages.
2. Save it in the format “Word 2010–365 Document (.docx)”.
3. Open the result in MS Word: the document is 2-page long, as in Writer. (The old export was 3-page long.)
1. Open tdf132599_always.fodt (attached to Bug 132599, as “flat ODF test document for "Last full line of paragraph"”). The last full line of the last paragraph is not hyphenated. The previously hyphenated word (“celestial”) is shifted to the last line. (This feature wasn’t supported before.)
2. Click on the paragraph settings of the last paragraph, and enable “Last full line of paragraph” in Text Flow → Hyphenate Across. The word “celestial” is hyphenated.
1. Open tdf132599_page_in_table.fodt (attached to Bug 132599 as “test document: In tables, do not hyphenate across spread (3-page document)”). The document is 3-page long. (This was 2-page long because of missing support of Hyphenation across in tables.)
2. Click on the paragraph settings of the paragraph, and enable “Spread” in Text Flow → Hyphenate Across. The document is 2-page long, because the hyphenated line “except that it has an at-” is allowed to be bottom of page 1, which is on the right in its spread.
1. Open tdf132599_frames_on_same_page_hyphenation.fodt (attached to Bug 132599 as “test document: Hyphenation across column in linked frames”). Bottom of the left frame is the hyphenated line “space, ex-”. (Previously this line was shifted to the next frame.)
2. Click on the paragraph settings of the paragraph, and disable “Column” in Text Flow → Hyphenate Across. The hyphenated line “space, ex-” is shifted to the next frame.
1. Open tdf132599_frames_on_right_pages_no_hyphenation.fodt (attached to Bug 132599 as “test document: In linked frames on right pages, do not hyphenate across spread”). The second frame on page 3 starts with the shifted hyphenated line “space, ex-”, according to the disabled Hyphenation across → Spread setting, because the first frame is on page 1 (a different spread). (This was broken before, because Writer did not check whether the page before the right page with the text content is a left page, i.e. on the same spread.)
2. Click on the paragraph settings of the paragraph, and enable “Spread” in Text Flow → Hyphenate Across. The hyphenated line “space, ex-” is shifted to the bottom of the first frame.
Hyphenated words got a new context menu item “No Break” to disable their hyphenation using the new “Exclude from hyphenation” character formatting. The context menu item remains available for the words with disabled hyphenation to enable their hyphenation again.
The other usability problem was the incomplete user interface of the new character formatting “Exclude from hyphenation”: it was impossible or very hard to notice the words that were excluded from hyphenation. Now these words get a light gray dotted underline when Show Formatting Marks mode is enabled.
New “No Break” context menu of hyphenated words, and light gray dotted underline visualization of words with disabled hyphenation. (Note: no visualization for the previous workaround, the word with language setting “None” in the second paragraph.)
Added a new dispatcher call .uno:NoBreak for the context menus. The menu item “No Break” is visible only if there is a hyphenated word or a word with No Break formatting under the text cursor (with or without selecting the word). The light gray underline is conditional: it is not visible in the PDF export, nor when Show Formatting Marks is disabled.
Commit | Description |
tdf#161563 tdf#161565 sw: add No Break to word context menu & visualize | Add the No Break option to the context menu of automatically hyphenated words, giving access to fixing the paragraph layout that is as easy as the context menu of misspelled words, like DTP software does. Also add this option to the context menu of words with enabled "No Break" to disable it. To avoid unwanted paragraph layout during further text editing or formatting, visualize words excluded from hyphenation with a light gray dotted underline when Formatting Marks is enabled. Follow-up to commit b5e275f47a54bd7fee39dad516a433fde5be872d "tdf#106733 sw: implement CharNoHyphenation" and commit 73bd04a71e741788a2f2f3b26cc46ddb6a361372 "tdf#106733 xmloff: keep fo:hyphenate in character formatting". |
tdf#161563 sw: show "No Break" context menu only on a whole word | It's possible to set CharNoHyphenation on character sequences shorter than a word, but the result is not correct (use soft hyphens for alternative hyphenation within words), so limit the "No Break" menu item to selected words. (Not completely, because only Point() is checked for the word boundary yet, not Mark().) If there is no selection, the cursor position must be within the hyphenated word (where "No Break" is applied to the whole word automatically). This also fixes the assert in SwTextFrame::IsInHyphenatedWord() when multiple nodes were selected. |
tdf#161563 sw: fix invisible light gray underline for No Break | The light gray underline visualization depended on IsShowHiddenChar() instead of the correct IsViewMetaChar() (Show Formatting Marks). |
1. Open the test documents tdf106733.fodt or tdf106733_LinuxLibertineDisplayG.fodt (attached to Bug 106733). The words with enabled “Exclude from hyphenation” got a light gray underline.
2. Click on the Show Formatting Marks (paragraph mark) icon to disable and enable the light gray underline.
3. Open the context menu of the hyphenated word in the first paragraph. Choose the first item “No Break” to disable its hyphenation. The word is not hyphenated any more and got a light gray underline.
4. Open the context menu of the word with the light gray underline, and choose No Break again. The word is hyphenated again, and the light gray underline is gone.
Value “Maximum consecutive hyphenated lines” wasn’t imported from DOCX files, and the associated ParaHyphenationMaxHyphens wasn’t exported to the OOXML document setting consecutiveHyphenLimit, losing layout interoperability.
Note: OpenDocument interoperability is possible here, contrary to the information on page 61 of Eckert et al.: Document Interoperability – Open Document Format and Office Open XML, Fraunhofer Verlag, 2009.
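The import side of this fix can be pictured with a small Python sketch that reads w:consecutiveHyphenLimit from a settings fragment of a DOCX file. The sample XML, the function name and the choice of 0 for a missing element are assumptions made for illustration; the real import filter is part of the LibreOffice C++ code base.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical fragment of word/settings.xml
SETTINGS = f"""<w:settings xmlns:w="{W}">
  <w:autoHyphenation w:val="true"/>
  <w:consecutiveHyphenLimit w:val="2"/>
</w:settings>"""


def max_consecutive_hyphens(settings_xml):
    """Return the value to store as ParaHyphenationMaxHyphens."""
    root = ET.fromstring(settings_xml)
    node = root.find(f"{{{W}}}consecutiveHyphenLimit")
    if node is None:
        return 0  # assumption: 0 stands for "no limit"
    return int(node.get(f"{{{W}}}val"))


print(max_consecutive_hyphens(SETTINGS))
```

The export direction writes the same element back from the paragraph property, so the limit survives a DOCX roundtrip.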
As a regression, smart justify, i.e. space shrinking, resulted in overshrunk lines, i.e. lines with removed spaces and overlapping words, when the line was hyphenated only in the first call of SwTextGuess::Guess(). (The first call calculates the available spaces, the second call makes the final line break.):
After the fix, skipping hyphenation completely:
The problem was solved by re-instantiating the SwTextGuess object for the optional second call.
Note: also Caolán McNamara (Collabora Productivity) made a SwTextGuess fix related to a compiler-based code analysis, which was an alternative solution for the reported test document.
The default hyphenation zone is not zero in OOXML, but ¼ inch, according to the standard (see w:hyphenationZone, ECMA–376 – Office Open XML 1st Edition). Because the DOCX export of Writer didn’t contain its default zero hyphenation zone, MSO imported the document with a non-zero hyphenation zone, potentially changing the text layout. For example, with a normal 11 pt font size, the hyphenations “al-legory” or “fi-nal” are disabled with a ¼ inch hyphenation zone.
As a continuation of the implementation of the hyphenation zone in Bug 149421, the default hyphenation zone ¼ inch was added to the DOCX import. Also the zero hyphenation zone is always exported, i.e. in case of documents created in Writer, solving the text layout difference.
Note: it seems that MSO doesn’t follow its own standard, because it uses unknown default values in some languages, e.g. larger than ¼ inch, depending on the language of the operating system. For example, the LibreOffice_tracked-changes_bug.docx of Bug 161628 got 425 twips (~0.75 cm) from Office 365 instead of the standardized 360 twips (¼ inch) on a Hungarian operating system. The Microsoft “Open specification” mentions the difference, but without the details: https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oe376/660d0f16-dffb-48ea-a25d-7210fb2f2a7a.
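The twips values above can be checked with a short conversion, using 1440 twips per inch and 2.54 cm per inch. The helper names below are hypothetical, and the fallback function only mirrors the rule described above (use 360 twips when the document does not define w:hyphenationZone).

```python
TWIPS_PER_INCH = 1440
CM_PER_INCH = 2.54
DEFAULT_HYPHENATION_ZONE_TWIPS = 360  # 1/4 inch, the OOXML default cited above


def twips_to_cm(twips):
    """Convert twips (1/1440 inch) to centimetres."""
    return twips / TWIPS_PER_INCH * CM_PER_INCH


def effective_hyphenation_zone(zone_twips=None):
    """Return the zone to apply on import; None means 'not defined in the file'."""
    return DEFAULT_HYPHENATION_ZONE_TWIPS if zone_twips is None else zone_twips


print(f"{twips_to_cm(360):.3f} cm")  # standard default, 1/4 inch
print(f"{twips_to_cm(425):.3f} cm")  # value found in the Hungarian test document
print(effective_hyphenation_zone())
```

An explicit zero is kept as zero, which is why Writer now always exports its zero hyphenation zone instead of omitting it.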
Test document hyphenation_zone.docx (Bug 149421) shows different line break (közvet=lenül/közvetle=nül), while the hyphenated word “közvetlenül” has the same possible hyphenation points in Writer and MSO (köz=vet=le=nül):
Composite image: red – MSO, black – Writer
Disabling hyphenation zone (in MSO, setting 0.01 cm) didn’t modify the hyphenation, so the difference is not related to the hyphenation zone. There is no justification and smart justify here (the test document is created in MSO 2010, so it doesn’t use smart justify for justified lines). Choosing sub hyphenation seems to be a bug in MSO, and it needs more investigation.
Commit | Description |
tdf#161643 sw DOCX import/export of maximum consecutive hyphenated lines Fix line break interoperability by importing w:consecutiveHyphenLimit to ParaHyphenationMaxHyphens, and exporting ParaHyphenationMaxHyphens to w:consecutiveHyphenLimit in OOXML import/export filters. | |
tdf#160170 sw: fix overshrunk justified lines at hyphenation Smart justify uses 2 SwTextGuess::Guess() calls to break a line, but using the same SwTextGuess object resulted overshrunk lines, if the first call resulted hyphenation, because of the bad state of the object for the second call. If we need a second call, now instantiate a new object for it. Regression from commit 36bfc86e27fa03ee16f87819549ab126c5a68cac "tdf#119908 tdf#158776 sw smart justify: shrink only spaces". Note: the reported test document was already fixed by commit f050103c3324d878b310f37429ea3580a8230905 "stale hyphenation data after skipping blanks". | |
tdf#160170 sw: test for fix overshrank lines with hyphenation Follow-up to commit ca540209a8c20a2734f180d4706d5153bdf64523 "tdf#160170 sw: fix overshrunk justified lines at hyphenation". | |
tdf#161628 DOCX import: set default hyphenation zone (1/4 inch) Default value of hyphenationZone is 360 twips (0.25"). Apply this value, if hyphenationZone is not defined, according to the OOXML standard. Follow-up to commit 5a079652c1b1f968a851f47995b0a65b84d2d192 "tdf#149421 DOCX: import/export hyphenation zone". | |
tdf#161628 sw DOCX: export zero hyphenation zone, if it's not defined To keep the layout of the document, export zero hyphenation zone instead of nothing, otherwise it would be 360 twips after importing the document with the default hyphenation zone. |
1.Open 2007351228.docx (test document of Bug 76163). Check its paragraph setting Maximum consecutive hyphenated lines on Text Flow page in Format → Paragraph → Paragraph… dialog window. Value of the setting is 1 (not zero).
2.Save the document in a different place in DOCX format, and reload it. The value is still 1.
1.Open the test document tdf160170.fodt (Bug 160170), and check the first line: it contains spaces.
Create a new document with hyphenation, and with zero hyphenation zone (default in Writer):
1.Put the cursor in a paragraph in Default Paragraph Style.
2.Choose Format→Paragraph… Text Flow, and enable Hyphenation. The default hyphenation zone is zero.
3.Export the document in DOCX 2010–365 format.
4.Reload it, and check hyphenation zone: it is still zero.
5.Load the exported document in MSO: search hyphenation settings in the search bar, and check hyphenation zone: it is zero (it was 360, 425 etc. before fixing the export).
1.Open the test document LibreOffice_tracked-changes_bug.docx (Bug 161628), which doesn’t contain a w:hyphenationZone definition.
2.Check the value of the hyphenation zone in Format→Paragraph… Text Flow: it’s ¼ inch (0,63 cm), not zero, like in MSO.
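The exported values can also be checked without opening the document in an office suite. The following sketch assumes that both settings are stored as children of w:settings in word/settings.xml, as in the WordprocessingML schema; the function name is hypothetical, and the file name in the usage line is only an example.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_hyphenation_settings(docx):
    """Return (hyphenationZone, consecutiveHyphenLimit); None where undefined.

    `docx` is a path or a file-like object of a DOCX package.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/settings.xml"))

    def value_of(name):
        element = root.find(f"{{{W}}}{name}")
        if element is None:
            return None
        return int(element.get(f"{{{W}}}val"))

    return value_of("hyphenationZone"), value_of("consecutiveHyphenLimit")


# Usage (example file name):
# print(read_hyphenation_settings("exported.docx"))
```

A result of (None, ...) for the zone means that the importing application will apply its own default, for example 360 twips.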
On the attached screenshot, bottom of the sidebar paragraph panel shows the disabled paragraph-level hyphenation controls with the single visible Hyphenation toggle button to enable automatic paragraph hyphenation.
In the middle of the sidebar, the Character panel shows two new icons. The first is the No Break icon, which is enabled only on words hyphenated by the automatic paragraph-level hyphenation. The second new icon is the Insert Soft Hyphen icon, which opens the soft hyphen based Hyphenation dialog window, allowing manual insertion and adjustment of the hyphenation break points, replacing or overwriting the result of the automatic paragraph-level hyphenation. (This can be useful for languages without good hyphenation patterns, e.g. previously for German, when its hyphenation dictionary hadn’t yet used libhyphen’s compound word based functionality.)
Extending the user interface of LibreOffice is not an easy thing, because of the complexity of the code base and the lack of documentation. For example, the commit description mentions that to create a visible icon from the previously added .uno:NoBreak dispatcher call, value 8 must also be changed to value 9 (a bitset in the XML) in one of the resource files.
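Read as a bitset, the change from 8 to 9 switches on the lowest bit and keeps the existing one. The sketch below shows only this arithmetic; the name LOWEST_BIT is a hypothetical label, because the meaning of the individual bits is not described here.

```python
def set_flags(properties: int, flags: int) -> int:
    """Return `properties` with the given bit flags switched on."""
    return properties | flags


LOWEST_BIT = 0b0001  # hypothetical label; the bit is not named in the text

old_value = 0b1000  # 8, the value before the change
new_value = set_flags(old_value, LOWEST_BIT)
print(new_value, bin(new_value))  # prints "9 0b1001"
```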
Also, the first approach (attached to the issue in the bug tracker), i.e. adding new UNO calls for all controls, didn’t work correctly because of a similar undocumented setting. The committed patch is more compact, but it’s possible that it uses only resized versions of the 16×16 pixel icons instead of the 24×24 and 32×32 icons on high resolution displays.
Note: it’s possible to add new icons for arbitrary functions without UNO calls, using the links.txt files of the icon themes, but instead of designing new icons based on the SVG source files of the existing icons, existing icons were reused to avoid designing three icon sizes for each of the half dozen icon themes.
Commit | Description |
tdf#162491 tdf#125032 add hyphenation settings to sidebar Add .uno:NoBreak to the Character sidebar panel to disable automatic hyphenation for selected words. The icon is enabled only on hyphenated words and words with disabled hyphenation. Add .uno:Hyphenate icon to the Character sidebar panel, with tooltip “Insert Soft Hyphen...”, which opens the dialog for (semi-)automatic insertion of soft hyphens. Add paragraph-level hyphenation settings to the Paragraph sidebar panel. Only the toggle button icon "Hyphenation" is visible to enable hyphenation, if the paragraph is not hyphenated. If it's enabled, show all paragraph-level settings. These new sidebar controls allow adjusting paragraph layout and hyphenation quickly, like DTP software do. Note: to add icon to .uno:NoBreak, modify "Properties" of officecfg/registry/data/org/openoffice/Office/UI/WriterCommands.xcu. Note: it's possible, that high resolution icon sizes will need extra dispatcher calls (the draft is attached to the issue in the bug tracker). |
Daily builds of LibreOffice allow checking the new sidebar functions.
1.In Writer, enable the sidebar, if needed, with Ctrl+F5 or View→Sidebar, and
2.put the text cursor in a paragraph with working hyphenation (it depends on language of the document and the installed hyphenation patterns).
3.Click on the Hyphenation toggle button at the right bottom corner to enable and disable the hyphenation of the paragraph.
4.Disable the toggle button Hyphenate Last Full Paragraph Line to remove hyphenation in the last full line of the paragraph (if it’s possible without increasing the number of lines of the paragraph).
5.Increase the value of the spinbox At line end to remove some of the remaining hyphenations in the paragraph.
6.To set or update the hyphenation settings of the paragraph style, click on the Update Selected Style icon on the Style sidebar panel or press Ctrl+Shift+F11.
To avoid crowding the sidebar toolbar with several Hyphenation Across icons, it’s possible to compress the four icons into a popup icon menu.
Also, hiding the No Break icon may be better than graying it out.
It’s possible to extend Hyphenate CAPS with Hyphenate Caps, i.e. to avoid hyphenation of all the words starting with a capital letter (like DTP software does).
Also, No Break has more functionality in DTP software than disabling hyphenation: applied to multiple words, it also disables the line break at spaces, i.e. between words.
It’s recommended to enhance the soft hyphen based hyphenation dialog window to allow removing soft hyphens (DTP and XSL-FO software offer similar functionality).
Inline heading is a DOCX feature, using so-called style separators to hide the paragraph mark of the heading, resulting in a paragraph with multiple parts formatted with different paragraph styles. According to Bruce Hatfield in Bug 131728, “Word style separators are perhaps the key feature lacking in Libreoffice that prevents widespread Law office/legislator usage of LO Writer.” He translates “patents and other documents for US Federal court (+ other jurisdictions) and first came aware of the use of style separators to minimize navigation pane content to essential content (instructions from judge’s clerk). Basically it means only the essential part of a heading can now be indexed in the navigation pane/pdf bookmarks (converted from word).”
The following screenshot shows the first result of the interoperability development, lost (left black text) and fixed (right black text) import of DOCX inline headings.
Composite screenshots showing the lost (left black text) and fixed (right black text) import of DOCX documents with inline heading. (Red color: MSO, black: LibreOffice Writer.) The remaining difference is not related to inline heading, only to the different paragraph spacing.
The import uses inline text frames 1) to keep the original paragraph with its heading style, also keeping the Table of Contents and PDF bookmark support, 2) while allowing it to be put in the same line as a normal paragraph.
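In the DOCX source, a style separator is a paragraph whose paragraph mark carries w:specVanish (w:p/w:pPr/w:rPr/w:specVanish). The following sketch shows how such paragraphs could be found in word/document.xml; it is only an illustration, not the import filter, the function name is hypothetical, and an explicit w:val="0" on w:specVanish is ignored for simplicity.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def inline_heading_flags(document_xml):
    """For every w:p, tell whether its paragraph mark carries w:specVanish."""
    root = ET.fromstring(document_xml)
    path = f"{{{W}}}pPr/{{{W}}}rPr/{{{W}}}specVanish"
    return [para.find(path) is not None for para in root.iter(f"{{{W}}}p")]


sample = f"""
<w:document xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:rPr><w:specVanish/></w:rPr></w:pPr>
       <w:r><w:t>Heading.</w:t></w:r></w:p>
  <w:p><w:r><w:t>Text continuing in the same line.</w:t></w:r></w:p>
</w:body></w:document>
"""
print(inline_heading_flags(sample))
```

A paragraph flagged this way is the inline heading, and the paragraph following it continues in the same line.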
Commit | Description |
tdf#131728 sw inline heading: fix DOCX paragraph layout interoperability Fix layout of paragraph – which contains two paragraph styles – by importing OOXML style separator using a text frame. Inline headings – where there is no paragraph break after the heading, i.e. it's followed by the normal paragraph content – specified by w:specVanish in OOXML, i.e. a special paragraph with hidden paragraph mark. These headings were loaded as normal, separated paragraphs, breaking the paragraph layout with their paragraph breaks. Map inline headings to inline ODF text frames to keep paragraph layout. The frame contains the original inline paragraph, still keeping ODF ToC and PDF bookmark support. |
Daily builds of LibreOffice allow checking the fixed DOCX import.
1.In Writer, open the test document attached to Bug 131728. The inline headings are not in separate paragraphs, but in the same line as their continuations (see the right side of the previous screenshot).
Releasing LibreOffice with the new space shrinking interoperability algorithm resulted in new bug reports for specific circumstances where the implementation did not work or worked poorly. All reported bugs were fixed.
The special circumstances were:
–missing shrinking in the last line of paragraphs, resulting in an overhanging line;
–bad cursor and pilcrow position, and bad selection, in the last line with a single text portion;
–a tabulator in the line, resulting in an overhanging line;
–a very narrow text portion at the end of the line, resulting in an unwanted line and word break.
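The common point of these cases is the test of whether a line still fits when its spaces are shrunk. The following sketch shows this test under stated assumptions: the function names and the 20% shrink limit (max_shrink) are placeholders, not values taken from the implementation, and widths are plain numbers in an arbitrary unit.

```python
def line_fits(portion_widths, space_count, space_width, line_width,
              max_shrink=0.2):
    """True if the line fits when every space may shrink by `max_shrink`.

    `portion_widths` must contain every non-space portion of the line,
    including tabulator portions.
    """
    natural = sum(portion_widths) + space_count * space_width
    shrinkable = space_count * space_width * max_shrink
    return natural - shrinkable <= line_width


def justified_space_width(portion_widths, space_count, space_width, line_width):
    """Space width which makes the portions fill `line_width` exactly."""
    if space_count == 0:
        return space_width
    return (line_width - sum(portion_widths)) / space_count


words = [40, 30, 25]
print(line_fits(words, 2, 5, 103, max_shrink=0))   # False without shrinking
print(line_fits(words, 2, 5, 103))                 # True with shrinking
print(line_fits(words + [10], 2, 5, 103))          # False: tab portion counted
```

Leaving a tabulator portion out of `portion_widths` makes an overhanging line look as if it fits, which corresponds to the tabulator case in the list above.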
The list of the developments:
Commit | Description |
tdf#162109 sw smart justify: fix overhanging last line Last line of justified paragraphs is excluded from justification normally, but not in the case, where it fits only with shrinking spaces. This line was overhanging because of the missing justification and space shrinking. | |
tdf#162220 sw smart justify: fix shrinking for single portion lines Follow-up to commit 6b857398a59d16308d6185d01e003e401439f060 "tdf#162109 sw smart justify: fix overhanging last line". | |
tdf#162109 tdf#162220 sw smart justify: add unit tests Follow-up to commit 6b857398a59d16308d6185d01e003e401439f060 "tdf#162109 sw smart justify: fix overhanging last line" and commit 22eac3145ca62d15b47d95f4df60ce38d4f5aa46 "tdf#162220 sw smart justify: fix shrinking for single portion lines". | |
tdf#161810 sw smart justify: fix overhanging lines containing tabs Length of tabulator portions wasn't taken into account during calculating overhanging lines, resulting missing space shrinking. | |
tdf#163042 sw smart justify: fix cursor of single portion lines Fixed problems with cursor/pilcrow positions in a shrunk last line of a paragraph with a single portion: | |
tdf#163060 sw smart justify: fix unwanted line break inside words End-of-line narrow line portion was broken into the following line, despite that it was inside a word, if the remaining free space for the line portion was negative in the line (GetLineWidth()), because of missing calculation with the extra available line width resulted by space shrinking. |
Daily builds of LibreOffice allow checking the improvements:
1.In Writer, open the test document hanging_punctuationx.fodt of Bug 162109. The last (second) line of the paragraph contains shrunk spaces, i.e. there is no overhanging line.
2.Open the test document test240726.docx of Bug 162220. The last line of the paragraph contains shrunk spaces, i.e. there is no overhanging line.
3.Open the test document Answers On A Postcard.docx of Bug 161810. The first lines of the paragraphs contain correctly shrunk spaces, i.e. there are no overhanging lines.
4.Open the test document bad_cursor_and_pilcrow_positions.fodt of Bug 163042. 1) When clicking before the last or the last but one character of the line, the cursor is positioned there, not at the end of the line. 2) With Formatting Marks enabled, the paragraph mark is at the end of the line, not inside the line. 3) When the text cursor is at the end of the line, the visible cursor position is there, too, i.e. when typing or deleting text with Backspace, the visible cursor position follows the real cursor position.
5.Open the test document test.docx of Bug 163060. There is no line break inside the word “pumping” (it was broken after its first letter).
Adding DOCX export to the previous DOCX import/layout support solved a serious interoperability issue of the previous LibreOffice versions. Now inline headings stay inline instead of being converted to separate paragraphs during the DOCX round-trip.
DOCX round-trip of inline headings results in a document with the original OOXML style separators.
Commit | Description |
tdf#131728 sw inline heading: fix missing/broken DOCX export Fix layout interoperability during DOCX round-trip by grab-bagging w:p/w:pPr/w:rPr/w:specVanish, i.e. the style separators. Note: use FrameInteropGrabBag to select the text frames, which are inline headings, exporting only their text content (a single paragraph), and use also ParaInteropGrabBag to export w:specVanish. Note: specVanish lost completely originally, converting inline headings to normal paragraphs. After commit 56588663a0fddc005c12afaa7d3f8874d036875f, text frames (the workaround for inline heading/ToC/bookmark support) were exported instead of plain paragraphs, which were broken at least in LibreOffice. Follow-up to commit 56588663a0fddc005c12afaa7d3f8874d036875f "tdf#131728 sw inline heading: fix DOCX paragraph layout interoperability". |
Daily builds of LibreOffice allow checking the fixed DOCX export.
1.In Writer, open the test document attached to Bug 131728. The inline headings are not in separate paragraphs, but in the same line as their continuations (see the right side of the previous screenshot).
2.Save the test document as e.g. new_export.docx, and use File→Reload to reload the document.
3.Inline headings are still inline headings, not separate paragraphs or lost in a broken text layout.
Inline headings are part of APA Style, IEEE and MIL-STD-961E formats, US Federal court style guides, and other technical and legal formatting standards. With these developments, LibreOffice got initial UX support to create inline headings: applying a style to the first selected words or sentence of a paragraph results in an inline heading.
For example, APA Style defines Level 4 heading as
“Indented, Bold, Title Case Heading, Ending With a Period. Text begins on the same line and continues as a regular paragraph.” (https://apastyle.apa.org/style-grammar-guidelines/paper-format/headings).
A triple-click on the beginning of the paragraph selects the first sentence, and choosing Heading 4 from the Formatting Toolbar (or pressing Ctrl-4) converts the selected sentence to an inline heading with Heading 4 style:
1. Select the beginning of a paragraph and apply a heading style.
2. The result is an inline heading with the selected heading style.
The UX extension added support to use inline headings in new ODT files.
Commit | Description |
tdf#48459 sw inline heading: apply it on the selected words Selected text at the beginning of a paragraph (<= 75 characters) become text frame based inline heading at applying a paragraph style (using Formatting toolbar, context menu, Ctrl-1...Ctrl-5 or Styles sidebar panel). If the whole paragraph is selected, or if no or multiple paragraphs are selected, formatting is still applied on the whole paragraphs. Using text frames for inline heading is ODF 1.0 compliant and fully back-compatible with the older Writer versions. The new inline heading frame contains direct formatting to zero the upper and bottom paragraph margin to solve interoperability issues: in MSO, margins of heading styles are zeroed by using the style separators. Note: lack of inline heading was a showstopper for creating APA-, IEEE-, MIL-STD-961E-format, legal etc. documents. Note: recent Formula frame style will be replaced by the planned Inline Heading, which will be used by the DOCX filter to export OOXML style separators instead of text frames. |
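The rule from the commit description can be written as a small decision function. This is a sketch only: the function name, the use of character offsets and the exact handling of the boundary cases are assumptions; the 75 character limit and the conditions come from the description above.

```python
MAX_INLINE_HEADING_CHARS = 75  # limit named in the commit description


def becomes_inline_heading(paragraph, sel_start, sel_end, paragraphs_selected=1):
    """Decide whether applying a paragraph style creates an inline heading.

    `sel_start` and `sel_end` are character offsets of the selection
    inside `paragraph`.
    """
    length = sel_end - sel_start
    return (
        paragraphs_selected == 1          # not a multi-paragraph selection
        and sel_start == 0                # selection starts the paragraph
        and 0 < length <= MAX_INLINE_HEADING_CHARS
        and sel_end < len(paragraph)      # not the whole paragraph
    )


text = "Lorem ipsum dolor sit amet."
print(becomes_inline_heading(text, 0, 5))          # first word selected
print(becomes_inline_heading(text, 0, len(text)))  # whole paragraph selected
```

In every other case the style is applied to the whole paragraph or paragraphs, as before.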
Daily builds of LibreOffice allow checking the enhanced Writer user interface.
1.In Writer, write “lorem” and press F3 to insert a sample text in a new empty document.
2.Select the first word of one of the paragraphs, e.g. by double-clicking on it.
3.Press Ctrl-4 or choose Heading 4 in the Set Paragraph Style drop-down list on the Formatting toolbar to convert the selected text to an inline heading.
4.Click on it to put the cursor inside the inline heading text frame to show that the inline heading is formatted with the selected heading style. Press Ctrl-F10 or click on the Toggle Formatting Mark icon on the Standard toolbar (also enable View→Text Boundaries, if needed) to show the (invisible) border of the inline heading text frame.
5.Press Escape to select the text frame. Enable the Styles sidebar panel by pressing F11 or View→Styles to show the style of the text frame (currently “Formula”).
A possible continuation of the inline heading development could address several issues:
1.Add DOCX export of the newly created inline headings, e.g. by creating and using a default Inline Heading frame style;
2.Add HTML export to support web publishing in APA Style, and in other formatting standards;
3.Fix ordering of the PDF bookmarks (where frame content is exported separately from the headings);
4.Add Navigator support to modify order of inline headings with their body text;
5.Fix indentation issues;
6.Support inline heading conversion of multiselection, e.g. after selecting first word or sentence of multiple paragraphs by a regular expression based text search;
7.Extend OpenDocument/UNO to support style-level inline heading (keeping also back-compatibility).
Releasing LibreOffice with the new space shrinking interoperability algorithm resulted in new bug reports for specific circumstances where the implementation did not work or worked poorly. All reported bugs were fixed.
The special circumstances were:
–overhanging lines when wrapping text around images;
–and, after fixing this issue, a regression with documents that also contain right-to-left writing.
The list of the developments:
Commit | Description |
tdf#163149 sw smart justify: fix line shrinking at image wrapping Limited line width at image wrap could result negative nSpaceAdd value, i.e. paragraph line with extra letter spacing instead of line shrinking. Regression from commit 17eaebee279772b6062ae3448012133897fc71bb "tdf#119908 sw smart justify: fix justification by shrinking". | |
tdf#163575 sw smart justify: fix size resolution for SwBidiPortion Negative space sizes (i.e. shrunk lines at image wrapping) stored over LONG_MAX/2, and these values had no resolution in SwBidiPortion, causing crash/assert in conversion of DOCX document containing e.g. Arabic text wrapping around images. Note: apply the resolution in SwDoubleLinePortion, too. Regression since commit 1fb6de02709a5f420f21ebd683915da50ce0d198 "tdf#163149 sw smart justify: fix line shrinking at image wrapping". |
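The second commit describes negative space sizes stored as values over LONG_MAX/2, which must be resolved back before use. The sketch below shows one plausible form of such an encoding; the formula, the function names and the 64-bit LONG_MAX are assumptions for illustration, not a copy of the Writer code.

```python
LONG_MAX = 2**63 - 1  # assumption: 64-bit long
HALF = LONG_MAX // 2


def store_space(value):
    """Store a space size; negative sizes are mapped above LONG_MAX/2."""
    return value if value >= 0 else HALF - value


def resolve_space(stored):
    """Get the real space size back from its stored form."""
    return stored if stored <= HALF else HALF - stored


stored = store_space(-30)           # a shrunk line at image wrapping
print(stored > HALF)                # the stored value is a huge positive number
print(resolve_space(stored))        # resolving gives the negative size back
```

A code path that uses the stored value without resolving it sees a huge positive size instead of a small negative one, which is consistent with the assert/crash described in the commit.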
Daily builds of LibreOffice allow checking the improvements:
1.In Writer, open the test document Image-square-tight-wrap_C15.docx of Bug 143934. The document doesn’t contain overhanging lines (these problems are visible in the old PDF export attached to Bug 163149).
2.In Writer, open the test document forum-mso-en-6704.docx of Bug 163575. Save it in .odt format. No assert/crash occurs.
This project was funded through the NGI0 Entrust Fund, a fund established by NLnet Foundation with financial support from the European Commission's Next Generation Internet programme. More information: https://nlnet.nl/project/LO-Typography/