PDF to HTML
I am now working on a new project by Michel Tu.
You can get the source code from https://github.com/neumino/PDF-to-unusual-HTML/commit/6c28fd52962e68b17a5142db5bc5a7dc4b00cdc2.
I cloned the project to my local system. (You can clone it with either Mercurial or Git, or download a zip file of the project from the link above.) After that I created a Java package and tried to run it with the command java -jar, but it always ended with an error. Instead of fixing the package build issue, I decided to take the easier option and run the source code directly in the NetBeans IDE. (Before trying to run the project, please make sure that ImageMagick is installed on your system.) This time I got an error as well.
So I checked the code and found that the system call
command = pathToImagemagick + " -density " + density + " " + pathToPdf + " " + pathToDirectory + imageName;
in the file Pdf2Json was not working properly. To confirm this, I ran the command in my terminal as
convert -density 108 /home/neeraj/NetBeansProjects/icresume.pdf /home/neeraj/NetBeansProjects/icresume-0.png
It did not work there either, and I got error messages.
Then I ran the same command with one small change, using the full path to convert:
/usr/bin/convert -density 108 /home/neeraj/NetBeansProjects/icresume.pdf /home/neeraj/NetBeansProjects/icresume-0.png
Then it worked properly. So I changed the pathToImagemagick variable in the class ConvertPdf as follows
static String pathToImagemagick = "/usr/bin/convert";
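Hard-coding /usr/bin/convert works on my machine, but the location may differ on other systems. Here is a minimal sketch of an alternative, not the project's actual code: it looks for convert in the directories listed in PATH and builds the command as a list of arguments instead of one concatenated string, so paths containing spaces are not split. The class ConvertCommandSketch and the helpers findOnPath and buildCommand are hypothetical names I made up for illustration, and I am assuming density is an integer.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the original project.
final class ConvertCommandSketch {

    // Search each directory of a PATH-style string for an executable file called `name`.
    static Optional<Path> findOnPath(String name, String pathVariable) {
        if (pathVariable == null || pathVariable.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathVariable.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate.toAbsolutePath());
            }
        }
        return Optional.empty();
    }

    // Build the same convert call as the original string, but as separate arguments.
    // Assumption: density is an int; the original code does not show its type.
    static List<String> buildCommand(String pathToImagemagick, int density,
                                     String pathToPdf, String pathToDirectory,
                                     String imageName) {
        List<String> command = new ArrayList<>();
        command.add(pathToImagemagick);
        command.add("-density");
        command.add(String.valueOf(density));
        command.add(pathToPdf);
        command.add(pathToDirectory + imageName);
        return command;
    }

    public static void main(String[] args) {
        // Fall back to the path that worked on my system if convert is not found on PATH.
        String convert = findOnPath("convert", System.getenv("PATH"))
                .map(Path::toString)
                .orElse("/usr/bin/convert");
        List<String> command = buildCommand(convert, 108,
                "/home/neeraj/NetBeansProjects/icresume.pdf",
                "/home/neeraj/NetBeansProjects/", "icresume-0.png");
        // Only print the command here; it could be passed to new ProcessBuilder(command) to run it.
        System.out.println(String.join(" ", command));
    }
}
```

The argument list can be handed to ProcessBuilder as is, which avoids relying on how a single command string gets split into words.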
After this change, I could run the project successfully. Running it produced the following files:
pdffile-x.png: image files, one for each page of the PDF file
pdfile_words.txt: a text file containing all the words of the PDF file with their positions, in JSON format
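To check that every page was converted, the generated images can be listed in page order. The following is a rough sketch under the assumption that the x in pdffile-x.png is the page index, as in icresume-0.png above. The class PageImagesSketch and its methods pageIndex and listPageImages are hypothetical names of my own, not part of the project.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; assumes page images are named <baseName>-<pageIndex>.png.
final class PageImagesSketch {

    // Return the page index encoded in a name such as "icresume-0.png",
    // or -1 if the name does not follow the assumed pattern.
    static int pageIndex(String fileName, String baseName) {
        Pattern pattern = Pattern.compile(Pattern.quote(baseName) + "-(\\d{1,9})\\.png");
        Matcher matcher = pattern.matcher(fileName);
        return matcher.matches() ? Integer.parseInt(matcher.group(1)) : -1;
    }

    // Collect the page images found in `directory`, sorted by numeric page index
    // (so that page 10 comes after page 9, not after page 1).
    static List<Path> listPageImages(Path directory, String baseName) throws IOException {
        List<Path> pages = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory)) {
            for (Path file : stream) {
                if (pageIndex(file.getFileName().toString(), baseName) >= 0) {
                    pages.add(file);
                }
            }
        }
        pages.sort(Comparator.comparingInt(
                (Path file) -> pageIndex(file.getFileName().toString(), baseName)));
        return pages;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PageImagesSketch <directory> <baseName>");
            return;
        }
        for (Path page : listPageImages(Paths.get(args[0]), args[1])) {
            System.out.println(page);
        }
    }
}
```

For my test file, this would be run with the NetBeansProjects directory and the base name icresume.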