LibreOffice/Collabora Online Typography
Line break interoperability and state-of-the-art ISO OpenDocument/web typography in open source office suites
LibreOffice, the open source office suite, is the reference implementation of the ISO OpenDocument Format (ODF). As part of LibreOffice Technology, Collabora Online is key to the digital sovereignty of open source online document editing. Metrically equivalent fonts can no longer guarantee MS Word interoperability, because the line break algorithm of MS Word changed without documentation after the standardization of ODF and OOXML. The planned project solves this fundamental problem, and also adds the following interoperability and CSS4 web typography features and fixes to the LibreOffice line break algorithm and layout:
• tdf#119908 Fix line break interoperability of LibreOffice and Collabora Online with a new default paragraph layout that is equivalent to the new, undocumented algorithm of MS Word 2013 and later;
• tdf#132599 Implement the ODF attribute fo:hyphenation-keep, adding a documented interoperability and state-of-the-art CSS4 web typography feature, “stop words hyphenating across pages”, to LibreOffice and Collabora Online;
• tdf#106733 Implement the ODF attribute fo:hyphenate to exclude a portion of text from hyphenation;
• tdf#149421 Fix the unimplemented details of hyphenation zone interoperability.
All the developments are libre and open source, and will be integrated with the main development branch of LibreOffice, and released with its next stable release.
Layout difference between MS Word (top) and LibreOffice (bottom): since MS Word 2013, line breaking in justified paragraphs supports shrinking the spaces between words. Only disabling justification gives the same layout, moving the word “Antarctica” to the second line in MS Word, too. The bottom paragraphs contain hundreds of adjacent spaces, which shrink only in MS Word.
With 0.1 pt resolution, up to 2% shrinking was measured for plain justified lines (no direct character formatting, no hyphenation). The following picture shows how MSO implements the shrinking by using the available spaces (but the maximum shrinking is independent of the number of spaces, at least in a line with enough spaces):
(Black text: MS Word, red text: LibreOffice Writer. The black text contains extra character spacing after the text “3 Au” to give the same position after the last space.)
Commit 7d08767b890e723cd502b1c61d250924f695eb98 “tdf#130088 tdf#119908 smart justify: fix DOCX line count + compat opt.” is the fix for the line count/page count differences and the initial fix for the new justification:
• recognize DOCX documents with the new default MSO 2013+ justification;
• fix the line and page count when importing DOCX documents with MSO 2013+ justification: lines of plain text break at the same positions, so LibreOffice/Collabora Online renders these files consistently (no extra lines and pages), although the lines are not shrunk yet and still exceed the paragraph width at this stage;
• add the new LibreOffice compatibility option “JustifyLinesWithShrinking” to enable line break interoperability for DOCX documents, and store this setting in ISO OpenDocument, the native document format of LibreOffice, too;
• add a unit test for these.
(MSO, Writer before the fix, Writer after the fix. Blue marks: correct line break positions in MSO and improved Writer, red marks: bad line breaks in Writer before the fix.)
Shrinking has been added by commit 17eaebee279772b6062ae3448012133897fc71bb “tdf#119908 sw smart justify: fix justification by shrinking” and commit c1803de8a093739d189be54b2d9bd5634e9e79ee “tdf#119908 sw smart justify: add unit test”, fixing the temporarily exceeding lines by justifying them.
The following composite picture shows the previous (red) and the shrunk (black) lines in Writer. With this commit, the result is the same as in MS Word.
The following composite screenshots of MS Word (red) and Writer (black) show the fix for handling multiple text portions (middle) and the shrinking algorithm (right), which result in the same line breaks as in MS Word (test document: lorempage.docx of Bug 158333, generated PDFs: Word, Writer).
The related commits in LibreOffice code base:
Commit | Description |
tdf#158333 sw smart justify: fix multiple text portions Multiple text portions, e.g. when part of a line contains direct character formatting, broke DOCX interoperability of justified paragraphs. | |
tdf#119908 tdf#158419 sw smart justify: fix cursor position The text cursor didn't follow the new word positions yet, because of an unsigned cast of the negative shrinking value. | |
tdf#119908 tdf#158436 sw smart justify: fix freezing with NBSP Stop shrinking during underflow, because it resulted in an endless layout loop, e.g. when a very short word was followed by a no-break space. The problem was reported by Miklós Vajna (Collabora Productivity). | |
tdf#119908 tdf#158776 sw smart justify: shrink only spaces For interoperability, shrink only the spaces by up to 20%, not the lines by up to 2%. | |
tdf#159102 sw smart justify: fix automatic hyphenation As before with soft hyphens, automatic hyphenation could result in too much shrinking, because an extra, non-existing space in the line was counted. Also try to shrink the line only if a space will likely be available in it. |
(Note: commit 93ab1bc6be7b46226874810ec4fc3c61d5d0fc7c reverted the removal of unit test testOfz64109, caused by the second commit by accident. Reported by Miklós Vajna.)
Text portions (spans or runs) result from direct character formatting in the paragraph, or simply from editing the text without any formatting difference.
The first implementation calculated only with the last text portion of the line. This caused no problem with a single text portion, but otherwise resulted in different line breaks (less or missing shrinking). The first screenshots show that the test file contained multiple bad line breaks caused by multiple text portions in the lines. These three bad line breaks were fixed by the first commit.
Selected text and the text cursor were in the wrong position: over the exceeding, i.e. unshrunk, line instead of the visible shrunk line. The second commit adjusted cursor movement and text selection to the shrunk lines.
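The cursor problem is a classic pitfall: a negative shrinking value that passes through an unsigned type wraps around to a huge positive number. The following is a minimal sketch of that effect only; the 16-bit width and all function names are assumptions made for illustration, not the actual Writer types or code.

```python
def to_unsigned16(value: int) -> int:
    """Emulate a C/C++ cast of a signed integer to a 16-bit unsigned type."""
    return value & 0xFFFF


def cursor_x_buggy(base_x: int, space_adjust: int, spaces_before: int) -> int:
    # Bug: the negative per-space adjustment is cast to unsigned before use,
    # so the cursor jumps far to the right instead of moving slightly left.
    return base_x + to_unsigned16(space_adjust) * spaces_before


def cursor_x_fixed(base_x: int, space_adjust: int, spaces_before: int) -> int:
    # Fix: keep the adjustment signed, so shrinking moves the cursor left.
    return base_x + space_adjust * spaces_before


shrink = -5  # hypothetical per-space shrinking in layout units
buggy = cursor_x_buggy(1000, shrink, 3)
fixed = cursor_x_fixed(1000, shrink, 3)
print(buggy, fixed)
```

With a signed adjustment the cursor follows the shrunk word positions; with the wrapped value it ends up far outside the line.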
Extending the text layout code with shrinking resulted in an infinite loop, e.g. when a paragraph ending with a no-break space initiated the recalculation of the line with underflow. This was solved by disabling shrinking for underflow.
Removing spaces instead of adding or replacing them allowed a more precise comparison of MSO and LibreOffice. See the following data measured with the previous test file, which show up to 20% shrinking of the spaces instead of up to 2% shrinking of the lines.
paragraph width | max spaces in line | space width |
347.53 pt | 104 | 3.34 pt |

spaces | allowed extra character spacing in the line (pt), MS Word | allowed extra character spacing in the line (pt), Writer | extra spacing by shrinking: difference (pt) | shrinking of the line | shrinking of the spaces |
6 | 5.98 | 1.95 | 4.03 | 1.2% | 20.1% |
5 | 8.08 | 4.65 | 3.43 | 1.0% | 20.5% |
4 | 10.18 | 7.45 | 2.73 | 0.8% | 20.4% |
3 | 12.23 | 10.25 | 1.98 | 0.6% | 19.8% |
2 | 14.33 | 13.05 | 1.28 | 0.4% | 19.2% |
1 | 16.43 | 15.75 | 0.68 | 0.2% | 20.3% |
Low precision (approximated recalculation of multiple text portions?) |
9 | 11.15 | 4.15 | 7 | 2.0% | 23.3% |
12 | 9.63 | 1.35 | 8.28 | 2.4% | 20.6% |
21 | 16.83 | 5.85 | 10.98 | 3.2% | 15.6% |
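The percentages in the table can be re-derived from the measured values. The sketch below assumes that the line shrinking is the MS Word/Writer difference divided by the paragraph width, and that the space shrinking is the same difference divided by the total width of the spaces in the line; these formulas are inferred from the numbers, not taken from the Writer source code.

```python
PARAGRAPH_WIDTH_PT = 347.53
SPACE_WIDTH_PT = 3.34

# (spaces in line, allowed extra spacing in MS Word, in Writer), in points
MEASUREMENTS = [
    (6, 5.98, 1.95),
    (5, 8.08, 4.65),
    (4, 10.18, 7.45),
    (3, 12.23, 10.25),
    (2, 14.33, 13.05),
    (1, 16.43, 15.75),
]


def shrinking(spaces, word_pt, writer_pt):
    """Return (difference in pt, line shrinking in %, space shrinking in %)."""
    diff = word_pt - writer_pt
    line_pct = 100 * diff / PARAGRAPH_WIDTH_PT
    space_pct = 100 * diff / (spaces * SPACE_WIDTH_PT)
    return diff, line_pct, space_pct


for row in MEASUREMENTS:
    diff, line_pct, space_pct = shrinking(*row)
    print(f"{row[0]} spaces: {diff:.2f} pt, line {line_pct:.1f}%, spaces {space_pct:.1f}%")
```

For the six plain rows, the space shrinking stays close to 20% regardless of the number of spaces, while the line shrinking varies with it.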
The last commit changed the algorithm accordingly, resulting in the same line breaks not only for the previous test file, but also for the originally reported test document of tdf#119908.
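As an illustration of the idea, a line breaker can accept one more word on a justified line if the overflow is covered by shrinking the spaces of that line. The following is a simplified sketch, assuming a fixed space width and a 20% limit taken from the measurements; the function name and the greedy strategy are hypothetical and much simpler than the real layout code.

```python
MAX_SPACE_SHRINK = 0.20  # assumed limit, based on the measured ~20%


def words_on_line(word_widths, space_width, available_width, shrink=True):
    """Return how many leading words fit on one justified line."""
    count = 0
    natural = 0.0
    for width in word_widths:
        # Natural width of the line if this word is added as well
        candidate = natural + (space_width if count else 0.0) + width
        spaces = count  # adding a word to `count` words gives `count` spaces
        allowance = MAX_SPACE_SHRINK * space_width * spaces if shrink else 0.0
        # The first word is always accepted, even if it is too long
        if count > 0 and candidate - allowance > available_width:
            break
        natural = candidate
        count += 1
    return count


widths = [30, 30, 30, 12]
print(words_on_line(widths, 5, 115))                # with shrinking
print(words_on_line(widths, 5, 115, shrink=False))  # without shrinking
```

In this example the natural width of four words is 117 units; the three spaces may give up 3 units together, so the fourth word still fits into 115 units with shrinking, but moves to the next line without it.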
Hyphenation was only a paragraph-level feature in LibreOffice, while the OpenDocument standard also allows disabling hyphenation in character formatting. The following commits in the LibreOffice code base solved the problem, allowing words to be excluded from hyphenation.
Commit | Description |
tdf#106733 xmloff: keep fo:hyphenate in character formatting In the case of character formatting, map fo:hyphenate to the unused CharNoHyphenation character property to keep it during ODF import/export instead of losing it completely. This is the first step to disable hyphenation for single words or text spans in paragraphs with automatic hyphenation. Note: using fo:hyphenate as a character property is part of the ODF standard. Note: the old workaround to disable hyphenation, changing the language of the text to None, had serious drawbacks: losing spell checking and losing language-dependent text layout (supported by both OpenType and Graphite font engines in LibreOffice). | |
tdf#106733 sw: implement CharNoHyphenation Implement CharNoHyphenation character property to disable automatic hyphenation of words in paragraphs with enabled hyphenation. Fix also fo:hyphenate mapping to CharNoHyphenation using automatic inversion of their boolean values defined by xmloff's XML_TYPE_NBOOL, as suggested by Michael Stahl. | |
tdf#106733 sw: fix bad downcast in SwTextNode::GetLang() Fix bad cast of SvxNoHyphenItem to SvxLanguageItem reported by <https://ci.libreoffice.org/job/lo_ubsan/3049/>. | |
tdf#106733 sw cui: add CharNoHyphenation checkbox On Position tab of Character formatting dialog window as a new checkbox "Exclude from hyphenation" (UX design by Heiko Tietze). With this, it's possible to disable hyphenation with direct character formatting (e.g. combined with Find All), or using character styles, and setting “Exclude from hyphenation” in them. This feature is conformant to the OpenDocument standard, and unlike the previous locale=None workaround, it keeps spell checking and locale dependent text layout. |
(Note: commit 1b83ebf42c535528b73baac2407b347f19070d07 disabled the unit test of tdf#159102 temporarily, because of lack of hyphenation on some test builds. Reported by Noel Grandin and Miklós Vajna.)
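To make the mapping concrete, the sketch below parses a small ODF character style and derives an inverted boolean from fo:hyphenate, in the spirit of the CharNoHyphenation mapping described above. The style name "NoHyphenation" and the helper function are hypothetical; only the namespaces and the fo:hyphenate attribute come from the OpenDocument format.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Hypothetical character style that excludes its text from hyphenation
ODF_FRAGMENT = f"""
<style:style xmlns:style="{STYLE_NS}" xmlns:fo="{FO_NS}"
             style:name="NoHyphenation" style:family="text">
  <style:text-properties fo:hyphenate="false"/>
</style:style>
""".strip()


def char_no_hyphenation(style_xml: str) -> bool:
    """Map fo:hyphenate to an inverted boolean ("no hyphenation")."""
    root = ET.fromstring(style_xml)
    props = root.find(f"{{{STYLE_NS}}}text-properties")
    if props is None:
        return False
    hyphenate = props.get(f"{{{FO_NS}}}hyphenate")
    if hyphenate is None:
        return False  # attribute absent: the text is not excluded
    return hyphenate != "true"  # inversion: hyphenate=false -> excluded


print(char_no_hyphenation(ODF_FRAGMENT))
```

The inversion is the point of the example: the ODF attribute says whether to hyphenate, while the internal property says whether not to.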
The following screenshot shows the new option in Character formatting dialog window:
This development adds new typographical features standardized by OpenDocument, XSL and CSS 4 to LibreOffice Writer to guarantee MSO interoperability and more: following typographical rules and web standards.
Not only justified text, but also left-aligned and other text has lost its interoperability since MSO 2013, depending on hyphenation. The new default page layout algorithm of MSO removes the last hyphenated word (before MSO 2013) or line (MSO 2013 and newer) from the bottom of the page, following English typographical traditions and rules.
The OpenDocument standard contains the same feature, fo:hyphenation-keep, but it wasn’t implemented before, and LibreOffice didn’t even keep this setting during import/export of an ODF document.
For more control over hyphenation and better conformance with standard typographical rules, LibreOffice Writer also implements the “hyphenation across” values of XSL and CSS 4 which haven’t been covered by OpenDocument yet (loext:hyphenation-keep-type).
As a new development, LibreOffice not only keeps the value of fo:hyphenation-keep, but its page layout also moves the last hyphenated line of the page (or column) to the next page (or column), similar to MSO 2016 and newer.
Commit | Description |
tdf#132599 cui offapi sw xmloff: implement hyphenate-keep Both parts of a hyphenated word shall lie within a single page with ODF paragraph setting fo:hyphenation-keep="page". The implementation follows the default page layout of MSO 2016 and newer by shifting the bottom hyphenated line to the next page (and to the next column). Note: this is a MSO DOCX interoperability feature, used also in DTP software, XSL and CSS.
* Add checkbox/combobox to Text Flow in paragraph dialog * Store property in paragraph model (com::sun::star::style::ParagraphProperties::ParaHyphenationKeep) * Add ODF import/export * Add ODF unit tests
New constants of com::sun::star::text::ParagraphHyphenationKeepType, containing ODF AUTO and PAGE (borrowed from XSL), and for the planned extension ParaHyphenationKeepType of ParagraphProperties:
– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).
Note: the implementation truncates only a single hyphenated line, like MSO does: the pages can end in hyphenated lines (i.e. in the case of consecutive hyphenated lines), but less often, than before.
Clean-up hyphenation dialog by collecting "Don't hyphenate" options at the end of the hyphenation settings, and negating them (similar to MSO and DTP), adding also the new option "Hyphenate across column and page":
[x] Hyphenate words in CAPS [x] Hyphenate last word [x] Hyphenate across column and page
Note: ODF fo:hyphenation-keep has only the values "auto" and "page", while XSL also defines "column". For interoperability with MSO and DTP, fo:hyphenation-keep="page" is interpreted as XSL "column", avoiding hyphenation at the end of a column, not only at the end of a page. |
The following screenshot shows the new “Hyphenate across column and page” option on “Text Flow” pane of Paragraph formatting dialog window: the last hyphenated line “except that it has at-” was shifted to the next page. Also the clean-up of “Hyphenate words in CAPS/last word” options is visible.
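The layout rule can be pictured with a toy paginator: when a page would end in a hyphenated line, that single line moves to the next page. This is only a sketch under simplifying assumptions (fixed number of lines per page, hyphenation detected by a trailing hyphen, hypothetical function name), not the Writer implementation.

```python
def paginate(lines, lines_per_page, keep=True):
    """Split lines into pages; with keep, shift a bottom hyphenated line."""
    pages = []
    start = 0
    while start < len(lines):
        end = min(start + lines_per_page, len(lines))
        # Shift only a single hyphenated line, only if text continues after
        # it, and never empty the page completely.
        if keep and end < len(lines) and end - start > 1 and lines[end - 1].endswith("-"):
            end -= 1
        pages.append(lines[start:end])
        start = end
    return pages


text = ["It is like any other space,", "except that it has at-", "tached meaning."]
print(paginate(text, 2, keep=False))
print(paginate(text, 2, keep=True))
```

Because only one line is shifted, a page can still end in a hyphenated line when consecutive lines are hyphenated, which matches the note in the commit message above.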
The current OpenDocument standard has not fully adopted the values of the XSL attribute hyphenation-keep: the value “column” is missing (https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep). Also, according to e.g. New Hart’s Rules for English (OUP, 2005, “Line Endings”, p. 40, cited by R. Green in tdf#132599), it’s a typographical requirement to avoid hyphenation in the last line of a spread, i.e. a visible page pair. That is why CSS 4 defines “spread”, not only “page” and “column”, to control hyphenation.
The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page” and “Spread” on Text Flow pane of the Paragraph settings:
Following screenshots show layout of the test documents in LibreOffice Writer 24.8.
LEFT: Hyphenation across everywhere (hyphenation-keep="auto"). RIGHT: No hyphenation across, resulting in a shifted hyphenated line (hyphenation-keep="page", which means the default loext:hyphenation-keep-type="column" for interoperability reasons).
LEFT: Hyphenation across column and page, but not spread (hyphenation-keep="page", loext:hyphenation-keep-type="spread"). Shifted hyphenated line on the first (right-hand) page. RIGHT: same settings, but after inserting a page break at the start of the document there is no shifting, because the bottom hyphenated line is on the second (left-hand) page.
LEFT: No hyphenation across (hyphenation-keep="page", loext:hyphenation-keep-type="column"). Shifted hyphenated line in the first column of the multi-column page. RIGHT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page").
LEFT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page"). RIGHT: same settings, but the last hyphenated line is shifted in the last column, because that line is the last line of the page, too.
Details of the core and DOCX filter developments, also extending LibreOffice help with the new paragraph settings:
Commit | Description |
tdf#132599 cui offapi sw xmloff: add hyphenation-keep-type
Support XSL attribute "column" and CSS 4 attribute "spread", stored in loext:hyphenation-keep-type, to give better control over hyphenation-keep. E.g. spread: both parts of a hyphenated word shall lie within a single spread, i.e. when the next page is not visible at the same time (e.g. the next page is not a right page of a book).
– css::style::ParaHyphenationKeep is a boolean property now, importing hyphenation-keep = "page" as true.
– type of ParaHyphenationKeep, including the new non-ODF types is stored in the new ParagraphProperties::ParaHyphenationKeepType.
– default value of ParaHyphenationKeepType is COLUMN for interoperability.
– Add checkboxes to Text Flow -> Hyphenation Across in paragraph dialog:
* Column (previously: Hyphenate across column and page) * Page * Spread
– enabling/disabling them follows XSL/CSS 4/loext, i.e. possible combinations:
* No Hyphenation across (hyphenation-keep = "page" and loext:hyphenation-keep-type = "column")
* Hyphenation across [x] Column (hyphenation-keep = "page" and loext:hyphenation-keep-type = "page")
* Hyphenation across [x] Column [x] Page (hyphenation-keep = "page" and loext:hyphenation-keep-type = "spread")
* Hyphenation across [x] Column [x] Page [x] Spread (hyphenation-keep = "auto")
– Add ODF import/export
– Update DOCX import
– Add ODF unit tests
Note: the current implementation depends on widow settings: disabling widow handling allows hyphenation across columns and pages not only in table cells.
Note: bPageEnd, which was set only by the RTF import but not used, has been renamed to bKeep. Depending on the RTF test results, it will likely be necessary to disable the layout change (e.g. GetKeepType()=ParagraphHyphenationKeepType::AUTO) if PageEnd uses the obsolete hyphenation rule, i.e. shifting only the hyphenated word to the next page, not the full line.
More information:
– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).
| |
tdf160518 DOCX: import hyphenation-keep to fix layout
To fix layout interoperability, import DOCX compatSettings allowHyphenationAtTrackBottom and useWord2013TrackBottomHyphenation as hyphenation-keep setting "COLUMN", shifting last hyphenated lines of pages and columns, like MSO does. | |
tdf#132599 add "Hyphenation across" options Document new options of LO 24.8 to control hyphenation in last line of a column, page or spread. |
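The combinations listed in the commit message can be read as a small decision table. The sketch below encodes them and adds a helper that decides whether a hyphenated line must be shifted at a given boundary. The function and constant names are hypothetical, and the caller is assumed to classify each boundary by its largest unit (a page end that is also a spread end counts as "spread").

```python
# UI checkboxes (Column, Page, Spread) ->
# (fo:hyphenation-keep, loext:hyphenation-keep-type)
CHECKBOXES_TO_ODF = {
    (False, False, False): ("page", "column"),
    (True, False, False): ("page", "page"),
    (True, True, False): ("page", "spread"),
    (True, True, True): ("auto", None),
}

# Boundaries ordered from the smallest to the largest unit
BOUNDARIES = ["column", "page", "spread"]


def must_shift(keep, keep_type, boundary):
    """True if a hyphenated line right before `boundary` has to move past it."""
    if keep == "auto":
        return False  # hyphenation across everything is allowed
    # "column" protects every boundary, "page" only page and spread
    # boundaries, "spread" only spread boundaries. COLUMN is the default.
    limit = keep_type or "column"
    return BOUNDARIES.index(boundary) >= BOUNDARIES.index(limit)


keep, keep_type = CHECKBOXES_TO_ODF[(True, False, False)]
print(must_shift(keep, keep_type, "column"), must_shift(keep, keep_type, "page"))
```

Ordering the boundaries makes the nesting explicit: allowing hyphenation across a larger unit implies allowing it across the smaller ones, which is why only four checkbox combinations are possible.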
The code behind “Hyphenation across” has been generalized for all possible page changes, including linked frames, also columns in tables and linked frames.
The DOCX export broke the layout of documents created in Writer, for example resulting in more pages in MS Word in the case of hyphenated paragraphs. This problem was fixed by adding the missing allowHyphenationAtTrackBottom DOCX compatibility setting.
The following composite screenshots show the DOCX export in MSO (red text), which was 3-page before the fix (top row). After the fix, the result is 2-page in MSO (bottom row), as in Writer (black text). Test document:
Hyphenation was lost, if it was enabled only in “Text body” instead of the default paragraph style. Now Writer exports hyphenation in this case, too, which is more common for documents created in Writer.
CSS 4 “always” was implemented as Hyphenate across → Last full line of paragraph. The hyphenated word of the last full line of the paragraph moves to the last line (if there is enough space for it). This results in longer last lines, and removes hyphenation from the bottom right-hand corner of the paragraph.
LEFT: missing recognition of hyphenation-keep-type="always" in the last paragraph. RIGHT: correct layout: the hyphenated word of the last full line of the paragraph was shifted to the last line of the paragraph.
The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page”, “Spread” and the newest “Last full line of paragraph” on Text Flow pane of the Paragraph settings. The test document and its screenshot show that the hyphenated line was shifted to page 2, according to the Hyphenation across → Page setting:
Now “Hyphenation across” works in tables, too, removing the widow setting dependency of the previous implementation:
LEFT: missing shifting between the split table cell, with hyphenation-keep-type="spread". RIGHT: correct layout.
Linked frames are like columns on the same pages, now with correct layout:
LEFT: bad shifting between the linked frames on a single page, with hyphenation-keep-type="page". RIGHT: correct layout.
With hyphenation-keep-type="spread", blank left pages weren’t handled correctly, nor were linked frames anchored only on different right pages.
LEFT: missing shifting between linked frames on right pages, with hyphenation-keep-type="spread". RIGHT: correct layout.
Spread is still recognized with linked frames on left and right pages:
Details of the core and DOCX filter developments, also extending LibreOffice help with the new paragraph settings:
Commit | Description |
tdf#132599 sw schema xmloff: add hyphenation-keep-type='always' Add new hyphenation option to limit hyphenation of the last full line of the hyphenated paragraph. Move also loext:hyphenation-keep-type to paragraph-properties, following the associated hyphenation-keep. Note: value "always" is defined by CSS 4 hyphenate-limit-last, see https://www.w3.org/TR/css-text-4/#hyphenate-line-limits. | |
tdf132599 sw: fix hyphenation-keep for linked frames, also for spreads Linked text frames are hyphenated as columns on the same page, i.e. do not shift the hyphenated line, if hyphenation-keep-type="page" or "spread". For "spread", check also that the hyphenated line is on the previous left page, because checking only right page wasn't enough for linked text frames and blank left pages. | |
tdf#132599 sw: fix hyphenation-keep for tables and no widow Now hyphenation-keep works without widow settings, too, e.g. in tables (where despite the existing widow settings, widow handling is always disabled). | |
tdf#132599 sw: fix test of "fix hyphenation-keep for tables and no widow" The problem was reported by Miklós Vajna. | |
tdf#132599 sw: fix unit tests for hyphenation-keep with frames Fix en_US language of the test documents to be consistent with the hyphenator condition in the related unit tests of commit d4304cd0a4fedd0117fea3625dff1fca2945a0e6 "tdf132599 sw: fix hyphenation-keep for linked frames, also for spreads". The problem was reported by René Engelhard. | |
tdf#160518 sw: fix DOCX import/export of hyphenation-keep – export the hyphenation-keep="page" setting of native ODF documents, if hyphenation is enabled in the default paragraph style or in the text body style with this setting. It's lossless for hyphenation-keep-type="column", while the other values are converted to hyphenation-keep-type="column", which is the default layout of MSO 2013 and later. – fix LO roundtrip of DOCX documents which were created in MSO originally: while the roundtrip kept useWord2013TrackBottomHyphenation and allowHyphenationAtTrackBottom, the exported redundant suppressAutoHyphen = "false" settings of the paragraphs resulted in a broken layout in Writer, because the repeated import overwrote every paragraph with a bad hyphenation setting (hyphenation-keep = "auto" instead of hyphenation-keep = "page"). – export also "Hyphenate CAPS" and "Hyphenation zone" settings, if hyphenation is enabled in the text body style with these settings, and not in the default paragraph style. Setting hyphenation only in "Text Body" is more common in documents created in LibreOffice. | |
help: tdf#132599 add "Hyphenation across" -> Last full line of paragraph Document new option of LO 24.8 to control hyphenation in last full line of a paragraph. Fix also the changed IDs of the other "Hyphenation across" options. |
The patches contain several unit tests. The following manual tests cover only the most important bug fixes:
1. Open tdf160518_auto_in_default_paragraph_style.fodt (attached to Bug 160518, as “2-page flat ODF” document). The document is 2 pages.
2. Save it in the format “Word 2010–365 Document (.docx)”.
3. Open the result in MS Word: the document is 2-page long, as in Writer. (The old export was 3-page long.)
1. Open tdf132599_always.fodt (attached to Bug 132599, as “flat ODF test document for "Last full line of paragraph"”). The last full line of the last paragraph is not hyphenated. The previously hyphenated word (“celestial”) is shifted to the last line. (This feature wasn’t supported before.)
2. Click on the paragraph settings of the last paragraph, and enable “Last full line of paragraph” in Text Flow → Hyphenate Across. The word “celestial” is hyphenated.
1. Open tdf132599_page_in_table.fodt (attached to Bug 132599 as “test document: In tables, do not hyphenate across spread (3-page document)”). The document is 3-page long. (This was 2-page long because of missing support of Hyphenation across in tables.)
2. Click on the paragraph settings of the paragraph, and enable “Spread” in Text Flow → Hyphenate Across. The document is 2-page long, because the hyphenated line “except that it has an at-” is allowed to be bottom of page 1, which is on the right in its spread.
1. Open tdf132599_frames_on_same_page_hyphenation.fodt (attached to Bug 132599 as “test document: Hyphenation across column in linked frames”). Bottom of the left frame is the hyphenated line “space, ex-”. (Previously this line was shifted to the next frame.)
2. Click on the paragraph settings of the paragraph, and disable “Column” in Text Flow → Hyphenate Across. The hyphenated line “space, ex-” is shifted to the next frame.
1. Open tdf132599_frames_on_right_pages_no_hyphenation.fodt (attached to Bug 132599 as “test document: In linked frames on right pages, do not hyphenate across spread”). The second frame on page 3 starts with the shifted hyphenation line “space, ex-”, according to the disabled Hyphenation across → Spread setting, because the first frame is on page 1 (a different spread). (This was broken before, because Writer didn’t check, that the page before the right page text content is a left page, i.e. on the same spread, or not).
2. Click on the paragraph settings of the paragraph, and enable “Spread” in Text Flow → Hyphenate Across. The hyphenated line “space, ex-” is shifted to the bottom of the first frame.
Hyphenated words got a new context menu item “No Break” to disable their hyphenation using the new “Exclude from hyphenation” character formatting. The context menu item remains available for the words with disabled hyphenation to enable their hyphenation again.
The other usability problem was the incomplete user interface of the new character formatting “Exclude from hyphenation”: it was impossible or very hard to notice the words which had been excluded from hyphenation. Now these words get a light gray dotted underline when Show Formatting Marks mode is enabled.
New “No Break” context menu of hyphenated words, and light gray dotted underline visualization of words with disabled hyphenation. (Note: no visualization for the previous workaround, the word with language setting “None” in the second paragraph.)
A new dispatcher call, .uno:NoBreak, was added for the context menus. The menu item “No Break” is visible only if there is a hyphenated word or a word with No Break formatting under the text cursor (with or without selecting the word). The light gray underline is conditional: it is not visible in the PDF export, nor when Show Formatting Marks is disabled.
Commit | Description |
tdf#161563 tdf#161565 sw: add No Break to word context menu & visualize Add No Break option to context menu of words hyphenated automatically, giving as easy access to fix paragraph layout, as context menu of misspelled words – like DTP software do. Also add this option to context menu of words with enabled "No Break" to disable it. To avoid unwanted paragraph layout during further text editing or formatting, visualize words excluded from hyphenation with a light gray dotted underline, when Formatting Marks is enabled. Follow-up to commit b5e275f47a54bd7fee39dad516a433fde5be872d "tdf#106733 sw: implement CharNoHyphenation" and commit 73bd04a71e741788a2f2f3b26cc46ddb6a361372 "tdf#106733 xmloff: keep fo:hyphenate in character formatting". | |
tdf#161563 sw: show "No Break" context menu only on a whole word It's possible to set CharNoHyphenation on shorter character sequences, than a word, but the result is not correct (use soft hyphens for alternative hyphenation within words), so limit "No Break" menu item only for selected words. (Not completely, because only Point() is checked for word boundary yet, not also Mark().) If no selection, cursor position must be within the hyphenated word (where "No Break" applied for the whole word automatically). This fixes also the assert in SwTextFrame::IsInHyphenatedWord(), when multiple nodes were selected. | |
tdf#161563 sw: fix invisible light gray underline for No Break Light gray underline visualization depended on IsShowHiddenChar() instead of the correct IsViewMetaChar() (Show Formatting Marks). |
1. Open the test documents tdf106733.fodt or tdf106733_LinuxLibertineDisplayG.fodt (attached to Bug 106733). The words with enabled “Exclude from hyphenation” got a light gray underline.
2. Click on the Show Formatting Marks (paragraph mark) icon to disable and enable the light gray underline.
3. Open the context menu of the hyphenated word in the first paragraph. Choose the first item “No Break” to disable its hyphenation. The word is not hyphenated any more and got a light gray underline.
4. Open the context menu of the word with the light gray underline, and choose No Break again. The word is hyphenated again, and has no light gray underline any more.
Value “Maximum consecutive hyphenated lines” wasn’t imported from DOCX files, and the associated ParaHyphenationMaxHyphens wasn’t exported to the OOXML document setting consecutiveHyphenLimit, losing layout interoperability.
Note: OpenDocument interoperability is possible here, contrary to the false information on page 61 of Eckert et al.: Document Interoperability – Open Document Format and Office Open XML, Fraunhofer Verlag, 2009.
As a regression, smart justify (i.e. space shrinking) resulted in overshrunk lines, i.e. lines with removed spaces and overlapping words, when the line was hyphenated only in the first call of SwTextGuess::Guess(). (The first call calculates the available spaces, the second call makes the final line break.):
After the fix, skipping hyphenation completely:
The problem was solved by re-instantiating the SwTextGuess object for the optional second call.
Note: Caolán McNamara (Collabora Productivity) also made a SwTextGuess fix based on compiler-based code analysis, which was an alternative solution for the reported test document.
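The general pattern behind this kind of regression is a helper object that keeps state between two calls. The toy class below is not SwTextGuess and does not reproduce its logic; it is a hypothetical stand-in that only shows why a fresh instance for the second call avoids stale state.

```python
class LineGuess:
    """Toy line-break helper that keeps state between calls."""

    def __init__(self):
        self.hyphenated = False
        self.break_pos = 0

    def guess(self, text, width, allow_hyphenation):
        # Pitfall: self.hyphenated is never reset at the start of a call.
        if len(text) <= width:
            self.break_pos = len(text)
        elif allow_hyphenation:
            self.break_pos = width - 1  # leave room for the hyphen
            self.hyphenated = True
        else:
            self.break_pos = text.rfind(" ", 0, width + 1)
        return self.break_pos


text = "smart justification"

# Reusing one object: the second call still sees the hyphenation flag
reused = LineGuess()
reused.guess(text, 10, allow_hyphenation=True)
reused.guess(text, 10, allow_hyphenation=False)

# Fresh object for the second call: no stale flag
first = LineGuess()
first.guess(text, 10, allow_hyphenation=True)
fresh = LineGuess()
fresh.guess(text, 10, allow_hyphenation=False)

print(reused.hyphenated, fresh.hyphenated)
```

Both second calls compute the same break position here, but only the fresh object reports a state that matches the final line break.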
The default hyphenation zone is not zero in OOXML, but ¼ inch, according to the standard (see w:hyphenationZone, ECMA-376 – Office Open XML 1st Edition). Because the DOCX export of Writer didn’t contain its default zero hyphenation zone, MSO imported the document with a non-zero hyphenation zone, potentially changing the text layout. For example, with normal 11 pt font size, the hyphenations “al-legory” or “fi-nal” are disabled with a ¼ inch hyphenation zone.
As a continuation of the implementation of the hyphenation zone in Bug 149421, the default hyphenation zone ¼ inch was added to the DOCX import. Also the zero hyphenation zone is always exported, i.e. in case of documents created in Writer, solving the text layout difference.
Note: it seems that MSO doesn’t follow its own standard, because it uses unknown default values in some languages, e.g. larger than ¼ inch, depending on the language of the operating system. For example, the LibreOffice_tracked-changes_bug.docx of Bug 161628 got 425 twips (~0.75 cm) from Office 365 instead of the standardized 360 twips (¼ inch) on a Hungarian operating system. The Microsoft “Open specification” mentions the difference, but without the details: https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oe376/660d0f16-dffb-48ea-a25d-7210fb2f2a7a.
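The default handling and the unit conversions can be checked with a short sketch. It assumes a minimal settings document with w:hyphenationZone given as an integer number of twips; the helper names and the generated XML are hypothetical and only illustrate the rule "missing value means 360 twips".

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DEFAULT_ZONE_TWIPS = 360  # 1/4 inch, the OOXML default named in the text
TWIPS_PER_INCH = 1440
CM_PER_INCH = 2.54


def hyphenation_zone_twips(settings_xml: str) -> int:
    """Read w:hyphenationZone, falling back to the default if it is missing."""
    root = ET.fromstring(settings_xml)
    zone = root.find(f"{{{W_NS}}}hyphenationZone")
    if zone is None:
        return DEFAULT_ZONE_TWIPS
    return int(zone.get(f"{{{W_NS}}}val"))


def twips_to_cm(twips: int) -> float:
    return twips / TWIPS_PER_INCH * CM_PER_INCH


def make_settings(zone=None) -> str:
    # Hypothetical minimal settings part, for illustration only
    body = "" if zone is None else f'<w:hyphenationZone w:val="{zone}"/>'
    return f'<w:settings xmlns:w="{W_NS}">{body}</w:settings>'


for zone in (None, 0, 425):
    twips = hyphenation_zone_twips(make_settings(zone))
    print(zone, twips, round(twips_to_cm(twips), 2))
```

This also shows why the export has to write an explicit zero: a missing element and a zero element are read back as different values.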
Test document hyphenation_zone.docx (Bug 149421) shows different line break (közvet=lenül/közvetle=nül), while the hyphenated word “közvetlenül” has the same possible hyphenation points in Writer and MSO (köz=vet=le=nül):
Composite image: red – MSO, black – Writer
Disabling hyphenation zone (in MSO, setting 0.01 cm) didn’t modify the hyphenation, so the difference is not related to the hyphenation zone. There is no justification and smart justify here (the test document is created in MSO 2010, so it doesn’t use smart justify for justified lines). Choosing sub hyphenation seems to be a bug in MSO, and it needs more investigation.
Commit | Description |
tdf#161643 sw DOCX import/export of maximum consecutive hyphenated lines Fix line break interoperability by importing w:consecutiveHyphenLimit to ParaHyphenationMaxHyphens, and exporting ParaHyphenationMaxHyphens to w:consecutiveHyphenLimit in OOXML import/export filters. | |
tdf#160170 sw: fix overshrunk justified lines at hyphenation Smart justify uses 2 SwTextGuess::Guess() calls to break a line, but using the same SwTextGuess object resulted overshrunk lines, if the first call resulted hyphenation, because of the bad state of the object for the second call. If we need a second call, now instantiate a new object for it. Regression from commit 36bfc86e27fa03ee16f87819549ab126c5a68cac "tdf#119908 tdf#158776 sw smart justify: shrink only spaces". Note: the reported test document was already fixed by commit f050103c3324d878b310f37429ea3580a8230905 "stale hyphenation data after skipping blanks". | |
tdf#160170 sw: test for fix overshrank lines with hyphenation Follow-up to commit ca540209a8c20a2734f180d4706d5153bdf64523 "tdf#160170 sw: fix overshrunk justified lines at hyphenation". | |
tdf#161628 DOCX import: set default hyphenation zone (1/4 inch) Default value of hyphenationZone is 360 twips (0.25"). Apply this value, if hyphenationZone is not defined, according to the OOXML standard. Follow-up to commit 5a079652c1b1f968a851f47995b0a65b84d2d192 "tdf#149421 DOCX: import/export hyphenation zone". | |
tdf#161628 sw DOCX: export zero hyphenation zone, if it's not defined To keep the layout of the document, export zero hyphenation zone instead of nothing, otherwise it would be 360 twips after importing the document with the default hyphenation zone. |
1. Open 2007351228.docx (test document of Bug 76163). Check its paragraph setting Maximum consecutive hyphenated lines on Text Flow page in Format → Paragraph → Paragraph… dialog window. Value of the setting is 1 (not zero).
2. Save the document in a different place in DOCX format, and reload it. The value is still 1.
1. Open the test document tdf160170.fodt (Bug 160170), and check the first line: it contains spaces.
Create a new document with hyphenation, and with zero hyphenation zone (default in Writer):
1. Put the cursor in a paragraph in Default Paragraph Style.
2. Choose Format→Paragraph… Text Flow, and enable Hyphenation. The default hyphenation zone is zero.
3. Export the document in DOCX 2010–365 format.
4. Reload it, and check hyphenation zone: it is still zero.
5. Load the exported document in MSO: search hyphenation settings in the search bar, and check hyphenation zone: it is zero (it was 360, 425 etc. before fixing the export).
1. Open the test document LibreOffice_tracked-changes_bug.docx (Bug 161628), which doesn’t contain a w:hyphenationZone definition.
2. Check the value of the hyphenation zone in Format→Paragraph… Text Flow: it’s ¼ inch (0.63 cm), not zero, like in MSO.
This project was funded through the NGI0 Entrust Fund, a fund established by NLnet Foundation with financial support from the European Commission's Next Generation Internet programme. More information: https://nlnet.nl/project/LO-Typography/