When it comes down to working with text documents collaboratively, I’m a happy user of the track changes and comment functionalities because they allow people working together on a document to easily communicate about the text and see each other’s changes.
Over the last few days, a colleague and I have been working on a long document in LibreOffice. The document contains a number of tables and we’ve been making extensive use of comments as well as tracking our changes in this 40+ pages document. Last night, I was working on the document and accepted changes to a table as well as deleted some comments while the changes were still being recorded. I worked away at the document saving frequently. I left the document open over night since I wanted to work on it some more this morning. All went well until I closed the document.
Before sharing it with my colleagues, I wanted to double-check something in the document, but could not open it. Instead, I was presented with the following error message: Read-Error. Format error discovered in the file in sub-document content.xml at 2,4879(row,col).
Before going into panic mode, I tried the obvious: Closing the document and trying to open it again: No luck. I then sent it to a colleague to check if he could open it: Still no luck. Since I don’t know XML well, I went in search of someone who might be able to provide help.
That’s when my luck turned around since the first person I asked for help (actually rather if he knew who might have some more knowledge) was Grant McLean. He offered to take a look and if he could find the cause quickly, fix it for me. We conducted a quick search that revealed that it might be a problem due to an attribute being used twice in the style definition. The instructions didn’t prove too helpful, but Grant whipped up something very quickly which I’m going to share here for others to use if they are in need and for me to refer back to if I ever come across the problem again.
The instructions are for a Linux-based environment, in my case Ubuntu 14.04.
- Open the LibreOffice ODT file in the Archive Manager.
- Drag the content.xml file out onto the desktop to extract it.
At this stage, the xml file displays everything in one long line which is not really helpful for debugging. So Grant used a little Perl script to make the file more readable. Since we were only interested in parts of the file, this script was sufficient and put in some line breaks. - The script finds every instance of “/>” in the document and replaces it with a line break before “/>”. That preserves the syntax of the document itself, but allows us to get at least some line breaks in there making the document more readable. In the terminal at the prompt, type the following:
perl -i -p -e ‘s{/>}{\n/>}g’ content.xml - If you don’t have XMLLint installed, you can get it via:
sudo apt-get install libxml2-utils - Run the XML file through XMLLint to check where the error is. Note down the line number:
xmllint –noout content.xml - Open the file in your favorite text editor where line numbering is turned on:
vim content.xml - Navigate to the line with the problem. In my case, we saw the “office:name” attribute twice for one content item. Once it was removed, I could open the document again and continue editing.
But that’s not where the story ends. That would have been too easy. 😉 Since we saw that the problem was close by to / in a table, we assumed that the problem might lie there. So I located the table and removed all formatting, saved the file and could not open it again. Next I deleted the table from the file that Grant had sent me, re-created it manually, but still no luck. After some more tries with Chris Cormack who could open the file and edit and save it, we updated my LibreOffice to the latest version hoping that the bug might have been fixed. No such luck. So I went back to Grant to get the above instructions from him on how to fix this issue as it seemed that I would not be able to erase the problem quickly, but would need work on the document and after each closing run through the procedure to open it again. For the sake of being able to continue working, I was willing to use that workaround and leave the document open as much as possible.
Grant explained the steps and we jotted them down. While we looked at the document, he also showed me a neat trick:
If you open the document in vim, navigate to an attribute and press Shift+*, you are taken to the next place in the document where the same attribute is used. That’s how we found that the issue was actually with the annotations / comments and not the tables. Somehow working with annotations (possibly in a table) and deleting them while the track changes functionality was turned on, caused a problem.
Now I could get rid of the issue entirely: I deleted all comments in the document, and I could open it correctly again without dabbling in XML.
We assume that this is a LibreOffice bug, and I am happy to report it. Unfortunately, I have not yet been able to reproduce it in a small test document. It works fine there. I’ll probably need to create a longer document similar to the one that I’m working on and see if I can re-create the problem.
I’m thankful to Grant and Chris for their help, and also to LibreOffice whose files are actually archives and include an XML file that you can open in a simple editor and manipulate when you can’t open it in LibreOffice itself.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.