Neeraj's Blog

There is always an open source solution.

PDF to HTML

I am now working on a new project by Michel Tu.

You can get the source code from https://github.com/neumino/PDF-to-unusual-HTML/commit/6c28fd52962e68b17a5142db5bc5a7dc4b00cdc2.

I cloned the project to my local system. (You can use either Mercurial or Git for cloning, or download a zip file of the project from the link above.) After that I built a Java package and tried to run it with the command java -jar, but it always ended with an error. Instead of fixing the package build issue, I decided to run the source code directly in the NetBeans IDE. (Before trying to run the project, please make sure that you have ImageMagick installed on your system.) This time, too, I got an error.

So I checked the code and found that the system call

command = pathToImagemagick+" -density " +density+" "+pathToPdf+" "+pathToDirectory+imageName;

in the file Pdf2Json was not working properly. To confirm this, I ran the command in my terminal as

convert -density 108 /home/neeraj/NetBeansProjects/icresume.pdf /home/neeraj/NetBeansProjects/icresume-0.png 

It did not work, and I got error messages.
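To see the same error messages from inside Java rather than from the terminal, the command can be run through ProcessBuilder with its output captured. This is only a minimal sketch and not the project's actual code: the class name ConvertRunner and the methods buildCommand and run are hypothetical, and the paths are the ones from the terminal test above. It passes the arguments as a list instead of one concatenated string, so a path that contains spaces stays a single argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of the original project.
class ConvertRunner {

    // Builds the argument list: <tool> -density <n> <pdf> <image>.
    static List<String> buildCommand(String pathToImagemagick, int density,
                                     String pathToPdf, String outputImage) {
        return List.of(pathToImagemagick, "-density",
                String.valueOf(density), pathToPdf, outputImage);
    }

    // Runs the command, prints the tool's output if it fails,
    // and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        String output = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            System.err.println("Command failed with exit code " + exitCode + ":");
            System.err.println(output);
        }
        return exitCode;
    }

    public static void main(String[] args) throws InterruptedException {
        // Paths taken from the terminal test in this post; adjust for your system.
        List<String> command = buildCommand("convert", 108,
                "/home/neeraj/NetBeansProjects/icresume.pdf",
                "/home/neeraj/NetBeansProjects/icresume-0.png");
        try {
            System.out.println("Exit code: " + run(command));
        } catch (IOException e) {
            // Thrown when the executable itself cannot be started.
            System.err.println("Could not start " + command.get(0) + ": " + e.getMessage());
        }
    }
}
```

If the executable cannot be started at all, start() throws an IOException, which is why main catches it separately from a non-zero exit code.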

Then I ran the same command with a small change:

/usr/bin/convert -density 108 /home/neeraj/NetBeansProjects/icresume.pdf /home/neeraj/NetBeansProjects/icresume-0.png 

Then it worked properly. So I changed the pathToImagemagick variable in the class ConvertPdf as follows:

static String pathToImagemagick = "/usr/bin/convert";
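Hardcoding /usr/bin/convert works on my system, but the binary may live somewhere else on yours. As a rough sketch of an alternative, the path could be looked up at startup by searching the directories in the PATH environment variable. The class ExecutableLocator and the method findOnPath are hypothetical names used for illustration and are not part of the project. The hardcoded path from above is kept as a fallback.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original project.
class ExecutableLocator {

    // Returns the first executable file called `name` found in the
    // directories listed in `pathVariable` (same format as $PATH).
    static Optional<Path> findOnPath(String name, String pathVariable) {
        if (pathVariable == null || pathVariable.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathVariable.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as "a::b"
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate.toAbsolutePath());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Fall back to the path that worked in this post if nothing is found.
        String pathToImagemagick = findOnPath("convert", System.getenv("PATH"))
                .map(Path::toString)
                .orElse("/usr/bin/convert");
        System.out.println("pathToImagemagick = " + pathToImagemagick);
    }
}
```

The lookup only checks that the file exists and is executable. It does not verify that the file really is ImageMagick's convert.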

After doing this, I could run the project successfully. Running it produced the following files:

pdffile-x.png image files, one for each page of the PDF file

and a pdfile_words.txt file, which contains all the words of the PDF file with their positions in JSON format.


3 thoughts on “PDF to HTML”

  1. preSsist@678

  2. Great Going!!!

  3. Somebody necessarily lend a hand to make seriously posts I’d state. That is the first time I frequented your website page and up to now? I amazed with the research you made to create this actual publish incredible. Fantastic process!
