A Real-World Solution to Escape Embedded Double Quotes in JSON

The problem: you need to decode a JSON string, but at some point in a process you don’t control, unescaped double quotes are inserted into your string values. How can you sanitize the string into valid JSON that you can actually decode?

⚠️ The first answer I saw on Stack Overflow was “Go make the upstream service give you valid JSON, [duh you stupid idiot]”. Helpful! Sometimes you’re stuck in crappy situations and need a way forward, even if it’s not ideal. Hopefully this article acts as the answer I wish I had found.

Why

Why am I bothering to do this? JSON with unescaped double quotes in it isn’t valid JSON, so shouldn’t it be getting sanitized before reaching me? Unfortunately, no. As a part of building Workflow Buddy, an open-source tool to extend Slack’s no-code workspace automation, I need to handle untrusted variables being inserted into plain-text JSON strings before I have a chance to decode them.

An actual example of this in action looks like:

then becomes

This makes the standard JSON parser go 💥, because how is it supposed to know you didn’t want key: "value with " and just forgot to add another key after?
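To see the failure concretely, here is a minimal reproduction. The template and the {{value}} placeholder are hypothetical stand-ins, not Workflow Buddy’s actual variable syntax; the point is only that naive text substitution leaves the inserted quote unescaped.

```python
import json

# Hypothetical template: an untrusted variable gets pasted into a JSON string value.
template = '{"key": "{{value}}"}'
doc = template.replace("{{value}}", 'value with " quote')
print(doc)  # {"key": "value with " quote"}

try:
    json.loads(doc)
except json.JSONDecodeError as err:
    # The string value closes at the stray quote, so the decoder expects
    # a ',' or '}' next and fails on the word "quote" instead.
    print(err.msg, "at char", err.pos)  # Expecting ',' delimiter at char 22
```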

Why do I let people build their own JSON strings in plain-text input blocks?

It provides the most flexibility for end users. I don’t want my arbitrary decisions to block them from the critical last-mile of what they’re building. Whether you think that’s a stupid reason or not, this article is moving on without you.

Potential approaches

As the intrepid developer I am, I checked Stack Overflow first and came up with squat that felt useful. As all heroes do, after crying in the bathroom, I closed all my SO tabs and started frantically coming up with ideas. Gotta dig through some garbage to find the gold, right?

🤮 Randomly change a quote to escaped, then see if decode works (slow and inefficient).
😖 Walk through the string’s chars, keeping a STATE to track whether we are in a key or a value (but how do we actually tell whether we are in a key/value vs. just finding JSON special characters, e.g. { } " or , inside a string?).
🤔🥇 Make an educated guess for which quote we could escape, then try to decode again and see what happens (Not the worst idea to explore…).
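For comparison, the first idea can be sketched using nothing from the decoder except success or failure. The hypothetical brute_force_escape_one below tries each quote in order (rather than at random) and stops at the first escape that decodes. It only copes with a single stray quote and may need one decode attempt per quote in the document; fixing several stray quotes this way would mean trying combinations, which is where “slow and inefficient” comes from.

```python
import json


def brute_force_escape_one(s: str):
    """Hypothetical sketch of approach 1: escape one quote at a time and retry.

    Only handles a single stray quote; costs up to one json.loads call per quote.
    """
    try:
        return json.loads(s)  # already valid, nothing to fix
    except json.JSONDecodeError:
        pass
    for i, ch in enumerate(s):
        # Skip non-quotes and quotes already preceded by a backslash
        # (a simplification: it ignores an escaped backslash before a quote).
        if ch != '"' or (i > 0 and s[i - 1] == "\\"):
            continue
        candidate = s[:i] + "\\" + s[i:]
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue  # this guess didn't help; try the next quote
    raise ValueError("no single escaped quote produces valid JSON")
```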

Escape double quotes in JSON with Python

This Python version follows the principle of “Ask forgiveness, not permission”. When we get a decoding failure, we make an educated guess about which quote we can tweak (i.e. convert it to an escaped \"), then attempt to decode again. If the decoder made deeper progress into the string, it was a positive change. If not, then we really mucked things up and should bubble the error up.

Workflow Buddy source file: utils.py
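A minimal sketch of that loop, assuming Python’s standard json module. This is not the code from utils.py: the function name load_json_escaping_quotes and the heuristic of escaping the last unescaped quote before the error position are illustrative assumptions. It leans on json.JSONDecodeError.pos, the character offset where the decoder gave up.

```python
import json


def _is_escaped(s: str, i: int) -> bool:
    """Return True if s[i] is preceded by an odd number of backslashes."""
    backslashes = 0
    j = i - 1
    while j >= 0 and s[j] == "\\":
        backslashes += 1
        j -= 1
    return backslashes % 2 == 1


def _last_unescaped_quote(s: str, end: int) -> int:
    """Index of the last unescaped '"' strictly before `end`, or -1."""
    for i in range(min(end, len(s)) - 1, -1, -1):
        if s[i] == '"' and not _is_escaped(s, i):
            return i
    return -1


def load_json_escaping_quotes(s: str):
    """Hypothetical helper: json.loads, escaping stray quotes on failure.

    On each JSONDecodeError, guess that the culprit is the last unescaped
    quote before the error position, escape it, and retry. If the next
    failure is no deeper into the string, the guess was wrong, so re-raise
    the first error.
    """
    first_error = None
    last_pos = -1
    while True:
        try:
            return json.loads(s)
        except json.JSONDecodeError as err:
            if first_error is None:
                first_error = err
            if err.pos <= last_pos:
                raise first_error  # no deeper progress: bubble the error up
            quote = _last_unescaped_quote(s, err.pos)
            if quote == -1:
                raise first_error  # nothing left to escape
            last_pos = err.pos
            s = s[:quote] + "\\" + s[quote:]  # turn " into \"
```

Each retry escapes one quote that was previously unescaped, so the loop always terminates. The pos check is what makes a bad guess bubble the original error up instead of going on to escape more quotes.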

Limitations

Is this a perfect solution? Nope, you’ll have to consider whether the limitations screw up your use case. The most important one I’ve run into: if you provide valid JSON inside your string, you’ll end up with different top-level keys in your object than you expect. For example:

What you’ll get is:

If you are curious about digging deeper, feel free to check out the test cases for the parser in the Workflow Buddy repo. The function under test is sut.sanitize_unescaped_quotes_and_load_json_str(test_str).
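A minimal illustration of why this limitation can’t be caught by a sanitizer that only kicks in on a decode failure, like the approach described above. The template is hypothetical: when the inserted text happens to close the string and open a new key, the result is perfectly valid JSON, so the decoder never complains and the sanitizer never runs.

```python
import json

# Hypothetical template: the user's value is pasted between the quotes.
TEMPLATE = '{"text": "%s"}'

# This value closes the "text" string early and opens a new key.
value = 'hello", "extra": "world'
doc = TEMPLATE % value

parsed = json.loads(doc)  # succeeds: doc is already valid JSON
print(parsed)  # {'text': 'hello', 'extra': 'world'}
```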

Escape double quotes in JSON with JavaScript (WIP)

⚠️ Not completed yet, but I’m dropping my thoughts from initial exploration here, since it seemed possible to get functionality very similar to the Python sanitizer.

When a decode error occurs, you get access to e.message, for example:

and you could then parse it for the line 1 col 39 bit. Unfortunately, that’s not exactly the same as pos in the Python example. You should be able to get creative and “hop” lines by looking for the next \n, then using the column info. Ideally, we would find a way to do it without needing any information from the decoder other than success/failure; then any language would be able to use it.
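The line-hopping idea is language-agnostic, so here it is sketched in Python to match the rest of the article. line_col_to_pos is a hypothetical helper, and it assumes the error message reports 1-based line and column numbers, the same convention as Python’s JSONDecodeError.lineno and colno. That makes it easy to check the result against Python’s own pos; whether a given JavaScript engine’s message uses the same convention is an assumption to verify.

```python
import json


def line_col_to_pos(doc: str, line: int, col: int) -> int:
    """Convert a 1-based (line, column) pair into a 0-based character offset.

    Hops past one newline per line, then adds the column within that line.
    """
    start = 0  # offset of the first character on the current line
    for _ in range(line - 1):
        start = doc.index("\n", start) + 1  # skip to the start of the next line
    return start + col - 1


# Check against Python's decoder, which reports all three values.
bad = '{\n  "key": "value with " quote"\n}'
try:
    json.loads(bad)
except json.JSONDecodeError as err:
    print(err.lineno, err.colno, err.pos)  # 2 24 25
    print(line_col_to_pos(bad, err.lineno, err.colno) == err.pos)  # True
```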

WIP JSON testing

Tell me why I’m wrong

As this is the internet, someone reading this will have an opinion about my approach. Let me know if you have suggestions or alternative approaches through Twitter DMs or my email proxy.


UPDATED: JAN 30, 2024 | PUBLISHED: DEC 9, 2022 | 2.15k words, 11 minute read — SOFTWARE DEVELOPMENT

#DEVELOPERS, #JSON