PDF is where OER’s go to die

I really dislike PDF. No, really. Dislike is too measured a term. I hate it. I want to declare war on the tyranny of PDF for content that has been licensed for modification and remixing.

We WANT to reuse!

As part of the BC Open Textbook project, we want to start from the point of building on what others have done before; to realize the promise and potential of OER reuse. It only makes sense that we try to reuse what already exists in the commons and avoid reinventing the wheel. Why create a Calculus textbook when a half dozen already exist that can be modified?

There is no shortage of OER material

It doesn’t take a deep immersion into the world of OER to see that there is no lack of OER material. The OER movement has been around for over a decade, and in that time vast amounts of openly licensed content have been created and collected in repositories (including BC’s own SOLR, where 10 years’ worth of Online Program Development Fund projects are stored). Developing resources from scratch isn’t an issue. But reusing those objects or improving those resources? Um, well….

Okay, we have pretty well solved the legal issues around modification

In the early days of the web, there were very few mechanisms that would allow people to legally copy, reuse and modify material found on the web. Copyright was rigid and worked against the principle of reuse. Well, that environment has changed immensely in the past decade with the rise of licenses like Creative Commons, which allow content authors to specify up front how their content can be used. We now have a large body of work licensed in a way that allows for reuse. That legal impediment to reuse has been dealt with, and we have content that authors have legally given others the right to remix and modify. We have crossed that hurdle.

Now it is a technical hurdle

And this is a massive problem (at least for me right now), as anyone who has spent any amount of time trying to convert documents from one format to another knows. I am finding some great CC-BY licensed resources locked away in technical formats that, for all practical purposes, make reuse near impossible (yes, I am looking squarely at you, PDF. Flash, you are not far behind).

In other words, I can legally modify and reuse this material because the license says I can, but in practical real terms, I cannot because the content is locked in an unmixable format.

Content that is made available in a legal format for modifying, but is not made technically available for modifying seems so self-defeating. Sure, go ahead and use my content, you have my permission, here is my PDF file <insert pin into balloon>.

It feels like we are crossing one cultural hurdle around reuse (ownership, licensing, etc) only to be faced with another in that we cannot technically modify what we have been given the legal right to.

It can be done, but…

Content can be liberated from PDF documents, but it is a difficult, expensive, nit-picky process that requires a lot of manual work by people with some tech chops. Expecting an average user to liberate the content in a PDF and turn it into a reusable format that can then be output cleanly to a number of different formats is just not realistic.
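
To give a sense of how nit-picky even the simplest cleanup is, here is a minimal Python sketch, assuming the text has already been copied or extracted out of a PDF. The function name unwrap_pdf_text and its heuristics are my own illustrative assumptions, not part of any tool used on the project. It rejoins words hyphenated across line breaks and merges hard-wrapped lines back into paragraphs.

```python
import re


def unwrap_pdf_text(raw: str) -> str:
    """Rejoin hard-wrapped lines copied out of a PDF into paragraphs.

    Hypothetical helper for illustration only; the heuristics are assumptions.
    """
    # Normalise Windows and old Mac line endings to plain newlines.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at the end of a line ("text-\nbooks").
    # Caveat: this also fuses genuinely hyphenated words that happen to wrap.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat blank lines as paragraph breaks; single newlines are just wrapping.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Even this small function has to guess: a word like "well-known" that happens to break at the hyphen comes out as "wellknown", and headings, tables, footnotes and page numbers are not handled at all. That guesswork is where the manual work comes in.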

It doesn’t have to be this way

So, I make a bit of a plea. If you are creating content and have made the decision to license with a CC license that allows others to modify (and good on ya!), please consider making that content available in technical formats that can be remixed and modified. Even Word documents are preferable to PDF. Make the source files available.

A deliverable for the open textbook project I hope to achieve

This is my own personal goal for our project: to make open textbooks available in as many remixable formats as we can, so that what we create can be modified not only legally, but technically. I want to make our source files available so that others can use what we have done.

And I want to take that a step further. If we convert a locked PDF document that is released under a CC-BY license into another format for reuse, then I want to make those source files available. It seems like we should be able to do that as it will be part of our normal workflow anyway. So, hopefully, in addition to creating new content as part of this project, we will be able to make other existing open textbook materials that are locked away in PDF documents available for others to reuse. If we could do that as a natural byproduct of our work, that would be a victory in the ongoing battle to end the tyranny of PDF.

PDF is where OER’s go to die by Clint Lalonde, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 4.0 International License.

29 thoughts on “PDF is where OER’s go to die”

  1. Clint, while I don't do my content as CC-BY (long story having to do with my employment situation), I do make free PDFs available, and most of the content in those PDFs is public domain anyway (I'm an anthologizer really), so no licensing issues arise, and it's easy for teachers to copy-and-paste the content from my PDFs. They get a few extra carriage returns, but that's not a big deal one way or the other. So while I agree that PDFs are not ideal, they are not (always) as bad as you say here. Admittedly, PDFs that are just page scans can be frustrating, but at least for the very simple, text-driven PDFs that I create, it's a decent option, esp. for someone like me who has about zero time to spend on the tech side, since I barely have enough time to write the content as it is. Here's an example – Mille Fabulae et Una, 1001 Aesop's Fables in Latin: http://millefabulae.blogspot.com – I use the blog to accumulate content over time, and then write the book with OpenOffice which exports a PDF. Fast and easy, which has an appeal to someone like me, and also to the teachers I work with who feel confident in working with PDF documents.

  11. PDF does have its place if you are looking to create a document strictly for distribution (as your PDF book is). In that case, PDF is perfectly fine. But as a format that can be easily remixed, once you get past the fairly simple text-based layout and into more complicated bits, then things get a bit more tricky. I think what you have done with your site is a great example of the point I am trying to make, as you have your content available in numerous formats – you have it in HTML, in txt files, in PDF and in a print-on-demand format. If I was looking to modify and reuse your content, I would have some choices in what format I wanted to use. In fact, I think you could – if you were interested in having people reuse your content – go one step further and also make the OpenOffice document available, but I won't push my luck :). It's really fantastic that you have made your book available for free. It looks like a great resource. And I don't mean to look a gift horse in the mouth, as ANY sharing is better than NO sharing, and your resource is clearly valuable to your network, as the comments on your blog suggest.

  12. I expect you've looked at these already, but there are libraries like http://pdftohtml.sourceforge.net/ and https://github.com/CrossRef/pdfextract that can automate a great deal of extraction/conversion from PDF to a more open format like text or html, as well as other tools like http://poppler.freedesktop.org/. It might be an idea to offer an instance of one of these as an ancillary tool for people to extract content locked away in PDFs, or even to script links in certain systems (like SOL*R) so that there's an option for any link with a .pdf extension to also get passed through one of these convertors on the fly and be offered as text.

    You are of course correct that simply avoiding technical formats that inhibit reuse is the better practice, but barring that, there may be some approaches like these that help people more easily reuse PDF-based content.
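
As a rough sketch of the on-the-fly idea, assuming Poppler's pdftotext command-line tool is installed and on the PATH, something like the following Python could sit behind such a link. The function names (is_pdf_link, pdftotext_command, pdf_to_text) are hypothetical and invented for illustration; nothing here reflects how SOL*R or any existing system actually works.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def is_pdf_link(url: str) -> bool:
    """Return True when the URL path ends in .pdf (case-insensitive)."""
    return urlparse(url).path.lower().endswith(".pdf")


def pdftotext_command(pdf_path: Path, txt_path: Path) -> list[str]:
    """Build the pdftotext call; -layout tries to preserve the page layout."""
    return ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path: Path) -> Path:
    """Convert one PDF to a .txt file next to it (assumes pdftotext exists)."""
    txt_path = pdf_path.with_suffix(".txt")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path
```

This only gets plain text out. Scanned pages would need OCR first, and structure such as headings, tables and images is not recovered.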

  18. And listen, if you need a guinea pig for testing out your product, put me on your mailing list. I'm a prolific content producer but also sooooooooo bad about technology innovation because I have so little time. I'm working on a project that is totally separate from anything having to do with my school, something that I really do want to make available more as a CC-BY / remix thing (they are simple folktales retold – http://mythfolklore.blogspot.com/ ) – as usual, I am accumulating it as a blog but the potential audience here is not just my regular blog readers. If you can help me figure out better ways to share this content when I finish with the project in a year or so, man, I would be so grateful!
    For me the bugbear of multiple formats is having to update in more than one place. It sounds like your software would minimize that problem, which would be a godsend. Just having a PDF and a blog to update is enough to drive me bonkers. :-)

  19. PDFs are better than nothing, but not by much.

    They can be pried open in many ways, but it's not always pretty or useful.

    One way to think about it is that, in terms of information/content, a PDF ends up a monolithic whole whose parts are not readily accessible. Even copy-paste turns into a carriage return nightmare, not to mention squirrelly punctuation.

    It puts into question the reusable part, making the reuse limited to print, reading, and linking.

    Oh maybe OERs are not needing to be reusable

    • Depends on what the content is, but releasing the content in an editable format is a good goal. Or, if you have the time and ability, making multiple formats available so people can pick and choose what works for them. Whatever you make the final product in, make all the pieces available.
