Old 08-11-2020, 05:08 AM   #1
robert.swita
[Editor Plugin] Reformat plugin

This plugin simplifies, corrects, and reformats books in EPUB format. It removes nested <span> and <div> tags, dummy paragraph classes, and much more. The desired result is an ebook in the unified format described below. Default CSS classes are defined in a special file, default.css; no reference to this file is added to the document, as it serves only as a guideline and a source of class definitions. The goal of this "Game of Books" is to use only the CSS classes contained in default.css without changing the book's intended appearance.

Installation
Download the attached .zip file and install it via Preferences->Advanced->Plugins->Load Plugin, or as described in the Introduction to plugins thread. The Reformat plugin will then be available in the Plugins menu of Calibre’s e-Book Editor.

Main Features & EPUB ebook formatting guidelines
  1. An ebook should preferably consist only of:
    1. Cover (generated automatically by Calibre)
    2. Title, Author, Publisher (<h1-h4> tags in <intro> block)
    3. Dedication, Maps, Introduction (like in chapters)
    4. Chapters (<h1-h3> headers, <center>-<left>-<right> blocks, <diary>-<letter>-<stanza> for italicized blocks, <p> for paragraphs, <h4> for keywords)
    5. Glossary, Index (<dl>,<dt>,<dd> tags)
    6. Copyright note (<center> with <br> tags)
  2. Information about the author, more books by this author, and publisher details should be kept outside of the book, preferably in a separate ebook, ‘About.epub’.
  3. Ebook text will be divided into chapter files based on <section> tags, on <h1-h2> headers (when there are no <section> tags yet), on the TOC, or on a paragraph's bold style class. Remove a <section> tag to merge files, or add extra </section><section> tags to split them.
  4. Remove inline TOC chapter
  5. By default, every <div> tag will be replaced with a <center> tag (if it contains text or an image) or reduced (removed, while keeping its children). Blocks should be precisely defined with <diary>-<letter>-<stanza> tags and/or with the formatting tags <center>-<left>-<right>. Blocks always add extra top and bottom margins. A <br/> or <p> tag inside a <p> paragraph is not allowed and will split the paragraph in two. A paragraph starting with a lowercase letter is joined with the previous paragraph.
  6. Attributes allowed in tags:
    1. (inline)<b,u,br,sub,sup,small,em,strong>: none
    2. <span>: any
    3. <i>: ‘lang’
    4. <a>: ‘id’, ‘href’
    5. <img>: ‘src’
    6. <td>,<th>: ‘rowspan’
    7. <ol>: ‘type’
    8. other: ‘class’
  7. Regular expressions will try to correct the formatting and remove page numbers and unnecessary whitespace characters (spaces, tabs, line breaks).
  8. Common style classes are included in a special file, ‘default.css’. You can use classes from this file, and the plugin will automatically insert their definitions into the CSS file.
  9. The TOC is generated automatically from <h1-h3> headers, except for those in <intro> blocks.
  10. References in the text can be implemented using anchors, e.g.
    <a id="ref0" href="#ref1">
    with a list of reference notes at the end of the book in the form:
    <dt><a id="ref1" href="#ref0">[1]</a></dt><dd>note</dd>

    The href attributes will be automatically corrected to include the proper file reference. Broken anchor links will be reduced (removed, while keeping their children). Reference names can be changed using the CSS property ‘content’ of the class ‘a’ selector.
  11. This plugin is meant to be run several times (after manual corrections) until the desired effect is achieved. The stylesheet is updated on every run: unused styles are removed, and new classes from the default.css file are added if needed. You can undo the reformatting from the Edit menu.
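
The attribute restrictions in guideline 6 can be illustrated with a small Python sketch. This is not the plugin's actual code; the names INLINE_TAGS, ALLOWED, and filter_attributes are hypothetical and were made up for this illustration. It assumes attributes arrive as a plain dict and that any tag not listed in guideline 6 falls under "other" and may keep only ‘class’.

```python
# Illustrative sketch only; all names here are hypothetical, not the plugin's API.

# Inline tags that may carry no attributes at all (guideline 6.1).
INLINE_TAGS = {"b", "u", "br", "sub", "sup", "small", "em", "strong"}

# Tags with their own whitelist (guidelines 6.3 to 6.7).
ALLOWED = {
    "i": {"lang"},
    "a": {"id", "href"},
    "img": {"src"},
    "td": {"rowspan"},
    "th": {"rowspan"},
    "ol": {"type"},
}


def filter_attributes(tag, attrs):
    """Return a copy of attrs keeping only what guideline 6 allows for tag."""
    tag = tag.lower()
    if tag == "span":
        # Guideline 6.2: <span> may keep any attribute.
        return dict(attrs)
    if tag in INLINE_TAGS:
        return {}
    # Guideline 6.8: every other tag may keep only 'class'.
    allowed = ALLOWED.get(tag, {"class"})
    return {name: value for name, value in attrs.items() if name.lower() in allowed}


print(filter_attributes("p", {"class": "left", "style": "color:red"}))
print(filter_attributes("a", {"id": "ref0", "href": "#ref1", "class": "note"}))
```

With this sketch, a <p> keeps only its ‘class’, while an <a> keeps ‘id’ and ‘href’ and loses everything else.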

Example of using default CSS classes:

<section>
<intro>
<h2 class="author">author</h2>
<h1 class="title">title</h1>
</intro>
</section>

<section>
<h1>Part I<h5>PartTitle</h5></h1>
<center><img src="image.jpg"/></center>
</section>

<section>
<h2>Chapter 1<h5>ChapterTitle</h5></h2>
<p class="left">paragraph</p>
<p>paragraph</p>
...
<center>
<stanza>
To be, or not to be,<br/>
That is the question
</stanza>
</center>
...
<letter>
<left>Dear readers</left>
<p> paragraph </p>
<right>author</right>
</letter>
...
</section>

...
<h4>THE END</h4>
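
For markup like the example above, the TOC rule from guideline 9 (<h1-h3> headers, skipping <intro> blocks) could be sketched in Python roughly as follows. This is an assumption-laden illustration using only the standard library, not the plugin's implementation: the class name TocCollector is hypothetical, and whether the plugin includes a nested <h5> subtitle in the TOC entry is a guess made here for simplicity.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3")


class TocCollector(HTMLParser):
    """Collect (level, text) pairs for <h1>-<h3>, ignoring <intro> blocks."""

    def __init__(self):
        super().__init__()
        self.entries = []      # collected TOC entries
        self._intro_depth = 0  # > 0 while inside an <intro> block
        self._level = None     # level of the heading currently open, if any
        self._text = []        # text pieces of the open heading

    def handle_starttag(self, tag, attrs):
        if tag == "intro":
            self._intro_depth += 1
        elif self._level is not None:
            # Nested tag inside a heading (e.g. <h5>): separate its text.
            self._text.append(" ")
        elif tag in HEADINGS and self._intro_depth == 0:
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if tag == "intro" and self._intro_depth:
            self._intro_depth -= 1
        elif self._level is not None and tag == f"h{self._level}":
            title = " ".join("".join(self._text).split())
            self.entries.append((self._level, title))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)


sample = """
<section><intro><h2 class="author">author</h2><h1 class="title">title</h1></intro></section>
<section><h1>Part I<h5>PartTitle</h5></h1></section>
<section><h2>Chapter 1<h5>ChapterTitle</h5></h2><p>paragraph</p></section>
<h4>THE END</h4>
"""

collector = TocCollector()
collector.feed(sample)
collector.close()
for level, title in collector.entries:
    print("  " * (level - 1) + title)
```

The headers inside <intro> and the <h4> keyword are skipped, so only "Part I PartTitle" and "Chapter 1 ChapterTitle" end up in the list.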

Version History

2.3.5
Arranging files into default folders

2.2.0
Few minor patches
Stable version

2.1.0
Initial splitting includes original splits

2.0.5
Attributes restrictions for tags
Leaves CSS font rules intact

2.0.0
Removes severed links (e.g. after inline TOC removal)
Deals with paragraphs inside paragraphs

1.9.0
Handling of the <br/> tag

1.8.0
Clears comments and namespaces
Minor corrections in anchors

1.7.0
Works with repeating IDs in different files
Better recognition of subchapters
More picky about tag attributes

1.6.0
Rewritten merging
Major changes in anchor handling

1.0
Rewritten splitting
Attached Files
File Type: docx Formatting guidelines.docx (18.5 KB, 366 views)
File Type: zip Reformat.zip (11.3 KB, 45614 views)

Last edited by robert.swita; 11-06-2021 at 10:29 AM. Reason: update to v2.4.0