Thursday, March 20, 2014

Manipulating PDF Form Fields with Python

Hello world! I figured that I would start my blog off with a topic near and dear to my heart. I'm talking, of course, about adding/changing/clearing form fields in PDFs! The process, like quite a few things in Python, is surprisingly painless. The steps are as follows:

1. Install the PDF Toolkit (pdftk). This is a fantastic command-line tool for manipulating PDF files in quite a few different ways.

2. Install the fdfgen library. Just clone the repository and copy the fdfgen folder into your Python site-packages folder. The library is also on PyPI, but that release is not as up to date and will not work with the script that I use below.

3. Technically, you now have all of the necessary tools. However, I would also suggest using this script. The two functions in it take care of quite a bit of behind-the-scenes work involving the tools in steps 1 and 2. Feel free to put the functions wherever makes the most sense in the context of your project. The remaining steps assume that you are making use of this script, which I will refer to using the namespace funcs.

4. Call fields = funcs.get_fields(pdf_path). This reads the PDF at the path you provide and initializes the variable fields as a dictionary that maps each field name in the PDF to its current value. Note that the only way to actually see where each named form field sits on the page, to the best of my knowledge, is with Adobe Acrobat.

5. Modify the values in fields to whatever you wish them to be in the output PDF.

6. Call funcs.write_pdf(original, fields, output), where original is the path of your original PDF and output is the path of the PDF that you would like to create. You may also set the flatten keyword argument to True if you wish to flatten the output PDF so that its fields can no longer be edited.

7. Done! You now have a filled-out PDF at the location that you specified in step 6.
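For a sense of what step 4 involves under the hood, here is a minimal sketch of the reading side. It is not the script from step 3: the names parse_dump_data_fields and read_form_fields are hypothetical, and the parser assumes the usual pdftk output layout, where records are separated by lines of dashes, each record carries a FieldName line, and the FieldValue line is left out when a field is empty. Multi-line values are not handled.

```python
import subprocess


def parse_dump_data_fields(text):
    """Turn pdftk dump_data_fields output into {field name: current value}."""
    fields = {}
    name, value = None, ""
    for line in text.splitlines():
        if line.startswith("---"):
            # A dashed line starts a new record; store the previous one first.
            if name is not None:
                fields[name] = value
            name, value = None, ""
        elif line.startswith("FieldName:"):
            name = line[len("FieldName:"):].strip()
        elif line.startswith("FieldValue:"):
            value = line[len("FieldValue:"):].strip()
    # Store the last record, which has no dashed line after it.
    if name is not None:
        fields[name] = value
    return fields


def read_form_fields(pdf_path):
    """Run pdftk (assumed to be on the PATH) and parse what it prints."""
    output = subprocess.check_output(["pdftk", pdf_path, "dump_data_fields"])
    return parse_dump_data_fields(output.decode("utf-8", errors="replace"))
```

Fields that have no value yet come back as empty strings, so the dictionary always has one key per field.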

Here is the script that I created in the process of writing this post.



Note: For those of you who don't wish to use the script in step 3, the nitty-gritty process goes like this. Use pdftk's dump_data_fields operation to get the field names and values. Parse that output and create or modify a dictionary, or a list of two-element lists, corresponding to these fields, then pass it to fdfgen.forge_fdf(). This function returns a string that you then need to write into an .fdf file. Finally, use pdftk's fill_form operation to merge the original PDF with your .fdf file and create a new PDF. But seriously, just use the script.
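If you do go the manual route, the writing half might look roughly like the sketch below. Again, this is not the script from step 3. The helper names fill_form_command and fill_pdf are made up for illustration, and the sketch assumes Python 3 with a recent fdfgen release, where forge_fdf returns bytes (hence the binary write), plus pdftk on your PATH.

```python
import os
import subprocess
import tempfile


def fill_form_command(original, fdf_path, output, flatten=False):
    """Build the pdftk argument list for a fill_form run."""
    cmd = ["pdftk", original, "fill_form", fdf_path, "output", output]
    if flatten:
        cmd.append("flatten")  # pdftk merges the fields into the page content
    return cmd


def fill_pdf(original, fields, output, flatten=False):
    """Write fields (a dict of name -> value) into a copy of original."""
    # Third-party import kept inside the function so that the command
    # builder above can be used even where fdfgen is not installed.
    from fdfgen import forge_fdf

    # forge_fdf takes (name, value) pairs; bytes output is an assumption
    # about recent fdfgen releases.
    fdf_bytes = forge_fdf("", list(fields.items()), [], [], [])

    with tempfile.NamedTemporaryFile(suffix=".fdf", delete=False) as handle:
        handle.write(fdf_bytes)
        fdf_path = handle.name
    try:
        subprocess.check_call(
            fill_form_command(original, fdf_path, output, flatten)
        )
    finally:
        os.remove(fdf_path)  # the temporary .fdf file is no longer needed
```

Keeping the command builder separate from the subprocess call makes it easy to print the exact pdftk invocation when something goes wrong.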