The problem: you need to decode a JSON string, but at some point in the process you don’t control, unescaped double quotes are inserted into your string values. How might you sanitize the string to get a valid JSON object for decoding?
⚠️ The first answer I saw on Stack Overflow was “Go make the upstream service give you valid JSON, [duh you stupid idiot]”. Helpful! Sometimes you’re stuck in crappy situations and need a way forward, even if it’s not ideal. Hopefully this article acts as the answer I wish I had found.
Why am I bothering to do this? JSON with unescaped double quotes in it isn’t valid JSON, so shouldn’t it be getting sanitized before reaching me? Unfortunately, no. As a part of building Workflow Buddy, an open-source tool to extend Slack’s no-code workspace automation, I need to handle untrusted variables being inserted into plain-text JSON strings before I have a chance to decode them.
An actual example of this in action looks like:
then becomes
This makes the standard JSON parser go 💥, because how is it supposed to know you didn’t want key: "value with " and just forgot to add another key after?
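To make the failure concrete, here is a minimal sketch of how such a string can come about. The template, the placeholder name, and the user value are all made up for illustration; they are not taken from Workflow Buddy.

```python
import json

# Hypothetical template and user value, for illustration only.
template = '{"message": "__USER_VALUE__"}'
user_value = 'She said "hello" to me'

# The plain-text substitution happens before any decoding.
raw = template.replace("__USER_VALUE__", user_value)

failure = None
try:
    json.loads(raw)
except json.JSONDecodeError as err:
    # err.pos is the character index where the decoder gave up.
    failure = err

print(raw)
print(failure.msg, "at pos", failure.pos)
```

The decoder treats the quote before hello as the end of the string, then complains about the character right after it.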
Why accept these values as-is? Doing so provides the most flexibility for end users. I don’t want my arbitrary decisions to block them from the critical last-mile of what they’re building. Whether you think that’s a stupid reason or not, this article is moving on without you.
As the intrepid developer I am, I checked Stack Overflow first and came up with squat that felt useful. As all heroes do, after crying in the bathroom, I closed all my SO tabs and started frantically coming up with ideas. Gotta dig through some garbage to find the gold, right?
This Python version follows the principle of “Ask forgiveness, not permission”. When we get a decoding failure, we make an educated guess about which quote we can tweak (e.g. convert to escaped, \"), then attempt to decode again. If the decoder made deeper progress into the string characters, it was a positive change. If not, then we really mucked things up and should bubble the error up.
Workflow Buddy source file: utils.py
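The retry idea described above can be sketched roughly as follows. This is my own minimal reconstruction, not the actual utils.py code: the function name loads_with_quote_repair and the max_attempts parameter are hypothetical, and the guess "the nearest quote before the failure point is the culprit" is an assumption that will not hold for every input.

```python
import json


def loads_with_quote_repair(text: str, max_attempts: int = 100):
    """Decode JSON, escaping one suspect double quote after each failure.

    Hypothetical helper sketching the retry idea; not the utils.py code.
    """
    last_pos = -1  # where the previous failure sits in the current text
    for _ in range(max_attempts):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            # No deeper progress than last time: the guess made things
            # worse (or changed nothing), so bubble the error up.
            if err.pos <= last_pos:
                raise
            # Educated guess: the nearest quote before the failure point
            # ended a string too early and should have been escaped.
            quote_idx = text.rfind('"', 0, err.pos)
            if quote_idx == -1:
                raise
            text = text[:quote_idx] + "\\" + text[quote_idx:]
            # The inserted backslash shifts the old failure point by one.
            last_pos = err.pos + 1
    raise ValueError("gave up after max_attempts repair attempts")
```

With this sketch, loads_with_quote_repair('{"key": "value with "quotes" inside"}') returns {'key': 'value with "quotes" inside'}, while input that is broken for other reasons, such as '{"a": 1 2}', still raises json.JSONDecodeError.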
Is this a perfect solution? Nope, you’ll have to consider whether the limitations screw up your use case. The most important one I’ve run into: if you provide valid JSON inside of your string value, you’ll end up with different top-level keys in your object than you expect. For example:
What you’ll get is:
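As a hypothetical illustration of this limitation (the template and the value below are invented for this sketch), a value that happens to contain a quote, a comma, and another key produces a string the standard decoder accepts without complaint, so no repair is ever attempted:

```python
import json

# Hypothetical template and value, for illustration only.
template = '{"text": "__VALUE__"}'
value = 'hello", "injected": "oops'

raw = template.replace("__VALUE__", value)

# This decodes cleanly, so there is no error to react to.
decoded = json.loads(raw)

print(raw)
print(sorted(decoded))
```

The decoded object has two top-level keys, text and injected, instead of a single text key holding the whole value.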
If you are curious about digging deeper, feel free to check out the test cases for the parser in the Workflow Buddy repo. The function is sut.sanitize_unescaped_quotes_and_load_json_str(test_str).
⚠️ Not completed yet, but dropping my thoughts from initial exploration here, since it seemed possible to get very similar functionality to the Python sanitizer.
When a decode error occurs, you get access to e.message, for example:
and you could then parse it for the line 1 col 39 bit. Unfortunately, that’s not exactly the same as pos in the Python example.
You should be able to get creative and “hop” lines by looking for the next \n, then using the column info.
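The line-hopping idea can be sketched in Python, which makes it easy to check because its decoder reports a line, a column, and an absolute position. Both helper names are hypothetical, the message pattern is an assumption based on the "line 1 col 39" fragment above, and the conversion assumes 1-based lines and columns counted in characters, which may not match every decoder.

```python
import json
import re


def extract_line_col(message: str):
    """Pull (line, col) out of text such as '... line 1 col 39 ...'.

    The message format is an assumption; adjust the pattern to your decoder.
    """
    match = re.search(r"line (\d+) col(?:umn)? (\d+)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def line_col_to_pos(text: str, line: int, col: int) -> int:
    """Convert a 1-based line and column into a 0-based character offset."""
    line_start = 0
    # "Hop" one line at a time by jumping just past the next newline.
    for _ in range(line - 1):
        line_start = text.index("\n", line_start) + 1
    return line_start + col - 1


# Sanity check against Python's decoder, which reports all three values.
doc = '{\n  "a": 1,\n  "b": "x "y" z"\n}'
try:
    json.loads(doc)
except json.JSONDecodeError as err:
    recovered = line_col_to_pos(doc, err.lineno, err.colno)
    reported = err.pos
    print(recovered == reported)
```

If the offsets agree for your decoder too, the recovered position can stand in for pos in the retry loop.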
Ideally we would find a way to do it without needing any information from the decoder other than success/failure; then any language would be able to use it.
As this is the internet, someone reading this will have an opinion about my approach. Let me know if you have suggestions or alternative approaches through Twitter DMs or my email proxy.