I quite like JSON Patch but I've always felt that it's so convoluted only because of its goal of being able to modify every possible JSON document under the sun. If you allow yourself to restrict your data set slightly, you can patch documents much more simply.
For example, Firebase doesn't let you store null values. Instead, for Firebase, setting something to null means the same as deleting it. With a single simple restriction like that, you can implement PATCH simply by accepting a (recursive) partial object of whatever that endpoint returns. Eg if /books/1 has
{ title: "Dune", score: 9 }
you can add a PATCH /books/1 that takes eg
{ score: null, author: "Frank Herbert" }
and the result will be
{ title: "Dune", author: "Frank Herbert" }
This is way simpler than JSON Patch - there's nothing new to learn, except "null means delete". IMO "nothing new to learn" is a fantastic feature for an API to have.
Of course, if you can't reserve a magic value to mean "delete" then you can't do this. Also, appending things to arrays etc can't be done elegantly (but partially mutating arrays in PATCH is, I'd wager, often bad API design anyway). But it solves a very large % of the use cases JSON Patch is designed for in a, in my humble opinion, much more elegant way.
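For anyone who wants to see how little code "null means delete" needs, here is a minimal Python sketch of that recursive merge. The function name `merge_patch` is my own, and the behaviour (objects merge key by key, anything else including arrays is replaced wholesale) is an assumption about what such an endpoint would do, not a description of Firebase's implementation.

```python
from copy import deepcopy

def merge_patch(target, patch):
    """Apply a merge-style patch: None (JSON null) deletes a key, dicts merge recursively."""
    if not isinstance(patch, dict):
        # A non-object patch value replaces the target wholesale (this includes arrays).
        return deepcopy(patch)
    # Start from a shallow copy so the caller's top-level object is not mutated.
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # "null means delete"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

book = {"title": "Dune", "score": 9}
patched = merge_patch(book, {"score": None, "author": "Frank Herbert"})
print(patched)  # {'title': 'Dune', 'author': 'Frank Herbert'}
```

The array case shows the limitation mentioned above: a patch can only swap a whole array, never touch one element of it.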
As I understand it, JSON works well as an interchange format, for ephemeral messages to communicate between two entities.
For JSON Patch to be useful and efficient, it requires both sides to use JSON for representation. As in, if you have some native structure that you maintain, JSON Patch either must be converted into operations on your native structure, or you have to serialize to JSON, patch, and deserialize back to the native structure. Which is not efficient, and so either you don't use JSON Patch or you have to switch to JSON as your internal representation, which is problematic in many situations. (The same arguments apply to the sending side.)
Not only that, but you become dependent on the patch to produce exactly the same result on both sides, or things can drift apart and later patches will start failing to apply, requiring resynchronization. I would probably want some sort of checksum to be communicated alongside the patch, but it would be tough to generate a checksum without materializing the full JSON structure. (If you try to update it incrementally, then you're back to the same problem of potential divergence.)
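To make the checksum idea concrete, here is a rough Python sketch. The canonical form (sorted keys, compact separators), the SHA-256 choice, and the `base_checksum` / `result_checksum` field names are all assumptions of mine, not part of JSON Patch. Note that it does materialize the full document, which is exactly the cost described above.

```python
import hashlib
import json

def canonical_checksum(doc):
    """Hash a canonical serialization so that key order does not affect the result."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

before = {"title": "Dune", "score": 9}
after = {"score": 10, "title": "Dune"}

# Hypothetical envelope: the patch plus checksums of the state before and after it.
message = {
    "base_checksum": canonical_checksum(before),
    "patch": [{"op": "replace", "path": "/score", "value": 10}],
    "result_checksum": canonical_checksum(after),
}

def is_in_sync(local_doc, expected_checksum):
    """The receiver compares its own state against the checksum the sender claims."""
    return canonical_checksum(local_doc) == expected_checksum
```

A receiver would check `base_checksum` before applying and `result_checksum` afterwards, and ask for a full resync on any mismatch.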
I mean, I can see the usefulness of this: it's like logical vs physical WAL. Or state- vs operation-based CRDTs. Or deltas vs snapshots. But the impact on the representation on both sides bothers me, as does the fact that you're kind of reimplementing some database functionality in the application layer.
If I were faced with this problem and didn't already represent everything as giant JSON documents (and it's unlikely that I would do that), I think I'd be tempted to use some binary format that I could apply the rsync algorithm to in order to guarantee a bit-for-bit identical copy on the other side. Basically, hand the problem off to a fully generic library that I already trust. (You don't have to pay for lots of round-trip latencies; rsync has batch updates if you already know the recipient has an exact matching copy of the previous state.) You still have to match representations on the sending and receiving side, but you can use any (pointer-free) representation you want.
> Structured data can also be stored as just structured data.
Fair point. I probably overstated the weaknesses of the JSON model; it's fine for many uses.
But I like Map, and Set, and occasionally even WeakMap. I especially like JS objects that maintain their ordering. I'm even picky enough to like BigInts to stay BigInts, RegExes to stay RegExes, and graphs to have direct links between nodes and be capable of representing cycles. So perhaps it's just the nature of problems I work on, but I find JSON to be a pretty impoverished representation within an application -- even with no OOP involved.
It's great for interchange, though.
>> I think I'd be tempted to use some binary format
> And now you require both sites to use a binary format for representation. And then you have the same list of challenges.
Requiring the same format on both sides is an important limitation, but it's less of a limitation than additionally requiring that format to be JSON. It's not the same list of challenges, it's a subset.
Honestly, I'm fond of textual representations, and avoid binary representations in general. I like debuggability. But I'm imagining an application where JSON Patch is useful, as in something where you are mutating small pieces of a large structure, and that's where I'd be more likely to reach for binary and a robust mirroring mechanism.
To the substantive comment: if your JSON is in a DB, then the DB can do whatever fancy thing it can come up with in order to replicate changes. Databases are good at that, and a place where the engineering effort is well-spent. Both sides of the JSON Patch communication can then read and write to the DB and everything will be fine -- but there'll be no need for JSON Patch, no need for any application-level diff generation and application at all.
As for working at Mozilla: oh, it's worse than that, I'm the one who did the most recent round of optimization to JSON.stringify(). So if you're using Firefox, you're even more vulnerable to my incompetence than you realized.
Furthermore, I'll note that for Mozilla, I do 90% of my work in C++, 10% in Python, and a rounding error in JS and Rust. So although I work on a JS engine, I'm dangerously clueless about real-world JS. I mostly use it for fun, and for side projects (and sometimes those coincide). If you expect JS engine developers to be experts in the proper way to use JS, then you're going to be sorely disappointed.
That said, I'd be interested to hear a counterargument. Your argument so far agreed with me.
I would love to see some (optional) checksumming or original value in the protocol and a little more flexibility in the root node for other metadata like format versioning etc. rather than just the array of patch ops in the root.
This is what makes Tcl great as a data interchange format. It comes with a safe mode for untrusted code and you can further restrict it to have no control flow commands to be non-Turing.
It’s delivered in JSON, but you need an interpreter. But the actions are just JS assignment statements and a little glue. Your interpreter could as easily handle that, and with far less bytes. Why call a member variable /name when it’s already .name?
> javascript's type system natively distinguishes undefined from null values.
JSON is not JavaScript (despite the "J"), and `undefined` is not a part of JSON specification.
However, I think every language out there that has a dictionary-like type can distinguish between presence of a key and absence of one, so your argument still applies. At the very least, for simple documents that don't require anything fancy.
I believe this simple merge-based approach is exactly what people are using when they don't need JSON Patch. If you operate on large arrays or need transactional updates, JSON Patch is probably a better choice, though.
> Only trouble is static type people love their type serializers, which are ever at a mismatch with the json they work with.
I don't think it's a type system problem, unless the language doesn't have some type that is present in JSON and has to improvise. Typically, it's rather a misunderstanding that a patch document (even if it's a merge patch like your example) has its own distinct type from the entity it's supposed to patch - at the very least, in terms of nullability. A lot of people (myself included) made that blunder only to realize how it's broken later. JSON Patch avoids that because it's very explicit that patch document has its own distinct structure and types, but simple merge patches may confuse some.
I'm working on something right now that has a need to add/remove a few items to a very large array. (Not merely updating properties of an existing Object.) I ran across JSON Patch as a solution to this but ended up implementing just the part from it that I actually needed. (The "add" and "remove" operators.)
The alternative is to modify the large array on the client side and send the whole modified array every time.
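A stripped-down version of just those two operators might look like the Python sketch below. It is not the implementation described above; the helper names are hypothetical, it works on a bare list instead of resolving a full JSON Pointer, and it does not reject leading zeros in indices. As far as I know, "-" in the spec addresses the position after the last element, so "add" with "-" appends.

```python
def add_item(items, index_token, value):
    """'add' for arrays: index_token is a decimal string, or '-' to append."""
    result = list(items)  # copy so the caller's list is untouched
    if index_token == "-":
        result.append(value)
        return result
    index = int(index_token)
    if not 0 <= index <= len(result):
        raise IndexError(f"index {index} out of range for add")
    result.insert(index, value)
    return result

def remove_item(items, index_token):
    """'remove' for arrays: the index must refer to an existing element."""
    index = int(index_token)
    if not 0 <= index < len(items):
        raise IndexError(f"index {index} out of range for remove")
    return items[:index] + items[index + 1:]

friends = ["Alice", "John"]
once = add_item(friends, "-", "Maria")
twice = add_item(once, "-", "Maria")  # a retried append adds a second copy
```

The `twice` line is worth noticing: appending is not idempotent, so a client that retries after a timeout needs some other guard.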
The article has a section at the bottom "Alternatives..." [1]. It links to "JSON Merge Patch" which is what you are describing: https://zuplo.com/blog/2024/10/11/what-is-json-merge-patch
That's the format that people tend to naturally use. The main problem is that arrays can only be replaced.
[1] https://zuplo.com/blog/2024/10/10/unlocking-the-power-of-jso...
Also the first thing I was thinking. The only reason I can see for using JSON Patch is for updating huge arrays. But I never really had such big arrays that I felt the necessity for something like this.
Nice! I gotta say I didn't expect a thing called "JSON Merge Patch" to be simpler and more concise than a thing called "JSON Patch" :-)
Same here, I actually made a tool for this called www.jsonmergepatch.com - give it a try
I wonder if adding a merge op would be a viable option e.g.:
It's kind of nice to retain the terse and intuitive format while also gaining features like "test" and explicit nulls. It's of course not spec compliant anymore but for standard JSON Patch APIs the client could implement a simple Merge Patch->Patch compiler.
I've only used JSON Patch once as a quick hack to fix a problem I never thought I would encounter.
I had built a quick and dirty web interface so that a handful of people we contracted overseas could annotate some text data at the word level.
Originally, the plan was that the data was being annotated in small chunks (a sentence or two of text) but apparently the person managing the annotation team started assigning whole documents and we got a complaint that suddenly the annotations weren't being saved.
It turned out that the annotators had been using a dial up connection the entire time (in 2018!) and so the upload was timing out for them.
We panicked a bit until I discovered JSON Patch and I rewrote the upload code to only use the patch.
I'm working on a project using CRDTs (Yjs) to generate efficient diffs for documents. I could probably use JSON Patch, but I worry about relying on something like fast-json-patch to automatically generate patches by diffing JSON documents.
If I have json like
[{"name": "Bob", "age": 22}, {"name": "Sally", "age": 40}]
and then I delete the "Sally" element and add "Jenny" with the same age, I end up with
[{"name": "Bob", "age": 22}, {"name": "Jenny", "age": 40}]
However, the patch would potentially look like I changed the name of "Sally" to "Jenny", when in reality I deleted and added a new element. If I'm submitting this to a server and need to reconcile it with a database, the server cares about the specifics of how the JSON was manipulated.
I'd end up needing some sort of container object (like Yjs provides) that can track the individual changes and generate a correct patch.
Just add some unique IDs to your records.
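Here is a small Python sketch of why the IDs help. The two helper functions and the `id` field are made up for illustration; this is not how fast-json-patch or Yjs work internally. Compared position by position, the Sally/Jenny edit is indistinguishable from a rename. Once records carry IDs, the same edit shows up as one removal and one addition.

```python
def positional_changes(old, new):
    """Compare records index by index, the way a purely structural diff might."""
    changes = []
    for index, (a, b) in enumerate(zip(old, new)):
        for key in a.keys() & b.keys():
            if a[key] != b[key]:
                changes.append((index, key, a[key], b[key]))
    return sorted(changes)

def changes_by_id(old, new):
    """Compare records by a stable 'id' field (an assumed convention)."""
    old_ids = {record["id"] for record in old}
    new_ids = {record["id"] for record in new}
    return {"removed": sorted(old_ids - new_ids), "added": sorted(new_ids - old_ids)}

# Without IDs: looks like Sally was renamed to Jenny.
old_plain = [{"name": "Bob", "age": 22}, {"name": "Sally", "age": 40}]
new_plain = [{"name": "Bob", "age": 22}, {"name": "Jenny", "age": 40}]
print(positional_changes(old_plain, new_plain))  # [(1, 'name', 'Sally', 'Jenny')]

# With IDs: record 2 was removed and record 3 was added.
old_ids = [{"id": 1, "name": "Bob", "age": 22}, {"id": 2, "name": "Sally", "age": 40}]
new_ids = [{"id": 1, "name": "Bob", "age": 22}, {"id": 3, "name": "Jenny", "age": 40}]
print(changes_by_id(old_ids, new_ids))  # {'removed': [2], 'added': [3]}
```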
`/` is a weird choice of delimiter for JSON.
Since JSON is a subset of JS, I would have expected `.` to be the delimiter. That jibes with how people think of JSON structures in code. (Python does require bracket syntax for traversing JSON, but even pandas uses dots when you generate a dataframe from JSON.)
When I see `/`, I think:
- "This spec must have been written by backend people," and
- "I wonder if there's some relative/absolute path ambiguity they're trying to solve by making all the paths URLs."
It sort of is like a URL. Here are some more examples with both relative and absolute queries: https://opis.io/json-schema/2.x/pointers.html
Maybe we're talking about different things, but resources in REST are identified by their URL and URLs use '/' to separate elements in the path.
Yeah, but nobody ever looked at
and thought "I need the list at /a/b/c"
If you've ever done XPath you do!
Yeah, wrote my own XPath-like extension methods to manipulate JSON just like that. Felt very natural and makes it quite easy to generate and process JSON for the cases serialization/deserialization isn't the best option.
Seems like a reasonable thing to me.
"I need all the models of Toyota cars."
Or
"Toyota came out with a new Camry, I need to update the Camry object within Toyota's models."
Yeah, but that's makes.toyota.models, not /makes/toyota/models.
The point is that this is a data structure and not a web server. It's using a convention from one domain in a different one. Relatively minor in the scope of possible quibbles, but it's just one more thing to remember/one more trivial mistake to make. I'm sure much collective time will be lost to many people using dots where the spec says to use slashes, and not realizing it because it looks right in every other context with dots. Dots also makes copy/pasting easier, because you wouldn't have to migrate the format after pasting.
Oh I see what you mean. I misunderstood, I also don't like slash as a separator.
Anecdotal, I did and I do. It's no different than a path on a filesystem.
But it's different than object notation in JS, and considering JSON stands for JavaScript Object Notation, I think dot notation would have been more appropriate for JSON Pointer (and by extension JSON Path). As a bit of a rebel myself, I use dot notation when describing a location in a JSON document, unless I'm forced to use the slash for something like JSON Pointer.
this was one of the biggest learning curves / adjustments tbh but once I got over that it's surprisingly powerful.
It tackles like 80% of cases
Extending the 80/20 analogy, how much additional effort does the last 20% take here? The format seems efficient enough, but I'm wondering about the complexity trade-offs one can expect.
JSON is derived from JavaScript, it is not a strict subset.
The most glaring issue is JSON number type versus JavaScript float. This causes issues in both directions whereby people consistently believe JSON can't represent numbers outside the float range and in addition JSON has no way to represent NaN.
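Both halves of that are easy to check from outside JavaScript. Here's a quick Python sketch using only the standard `json` module. It shows what Python's parser does, not what a JS engine does: the JSON text itself can carry an integer far beyond double precision, and NaN has no compliant encoding.

```python
import json

# A 30-digit integer is perfectly legal JSON text; Python keeps it exact.
big = json.loads("123456789012345678901234567890")

# Round-tripping through a double (what JavaScript's Number would hold) loses digits.
survives_as_double = int(float(big)) == big

# NaN has no JSON representation; strict serialization refuses it.
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True

# By default Python emits the bare token NaN, which is not valid JSON.
lenient_output = json.dumps(float("nan"))
```

So whether a big number survives depends on the parser at each end, not on the JSON grammar.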
Is there any legal JSON that's not legal JavaScript?
If not, it's fair to say it's a subset.
It is a subset as of ES2019, which allowed JavaScript string literals to contain U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR. Prior to ES2019, that was the only known example of legal JSON that was not legal JavaScript.
JavaScript used to forbid U+2028 line separator and U+2029 paragraph separator in string literals, but JavaScript now allows them for compatibility with JSON.
The remaining wrinkle is different handling of "__proto__"
Why is the path a string and not an array? That means you have to have some way to escape / in keys, and also you need to parse the path string. Parser in parser syndrome. Or otherwise it can't handle arbitrary JSON documents.
Probably because JSON Patch was "influenced by" XML Patch.
JSON pointer escapes slashes by encoding them as "~1", and tildes are escaped by encoding them as "~0". But I agree that using an array would have made much more sense. It would also have allowed the use of integers to disambiguate array indices from object keys that happen to be numbers, without having to parse the document to be patched.
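For reference, the escaping round trip is only a few lines. This is a Python sketch with helper names of my own choosing. It also shows the ambiguity mentioned above: after parsing, an array index and an object key that happens to be a number are both just the string "0".

```python
def escape_token(key):
    """Escape one reference token: '~' first, then '/', so the '~' added for '/' is not re-escaped."""
    return key.replace("~", "~0").replace("/", "~1")

def unescape_token(token):
    """Decode '~1' before '~0' so that '~01' becomes '~1' and not '/'."""
    return token.replace("~1", "/").replace("~0", "~")

def to_pointer(tokens):
    """Build a pointer string from a list of keys and indices."""
    return "".join("/" + escape_token(str(token)) for token in tokens)

def from_pointer(pointer):
    """Split a pointer string back into unescaped tokens (all strings)."""
    if pointer == "":
        return []  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty pointer must start with '/'")
    return [unescape_token(token) for token in pointer[1:].split("/")]

print(to_pointer(["a/b", "m~n", 0]))   # /a~1b/m~0n/0
print(from_pointer("/a~1b/m~0n/0"))    # ['a/b', 'm~n', '0']
```

With an array-valued path, none of this would be needed, and `0` could stay an integer.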
> Strengths: ... Idempotency: JSON Patch operations can be safely retried without causing unintended side effects.
So, wait, you can't add an item to an array with this (except at a predefined position)? I.e. "add" with path "/.../myarray/~" (if I've understood their notation right) isn't allowed?
I'm not sure if that's good or bad, but it's certainly surprising and could do with saying a bit more explicitly.
> This pointer identifies the name field within the user object. You can refer to an array's entries by their index (ex. /user/friends/0 is Alice). The last element in an array can be referenced using - (ex. /user/friends/- is John). If a property name contains ~, then inside the pointer it must be escaped using ~0, or if it contains / then it must be escaped using ~1.
The biggest thing on my wishlist for a system like this is a standardized syntax for choosing an item in a list by an identifying (set?) of key-value pairs. E.g. for
I'd like to be able to specify that I want to update Ceiba speciosa regardless of its index. This gets especially important if we're adding items or trying to analyze diffs of previous versions of a json item.
Query by content reminds me of XPath, so I looked it up to see if there was a version for JSON...
Turns out there is https://www.ietf.org/archive/id/draft-goessner-dispatch-json...
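Until something like that is standardized, one workaround is to resolve the index on the client and guard it with a "test" op. Below is a rough Python sketch. The `index_of_match` helper, the tree records, and the field names are all invented for illustration, since the original example data isn't shown here.

```python
def index_of_match(items, criteria):
    """Return the index of the single item whose fields match every key-value pair."""
    matches = [
        index
        for index, item in enumerate(items)
        if isinstance(item, dict)
        and all(key in item and item[key] == value for key, value in criteria.items())
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0]

# Hypothetical document: a top-level array of tree records.
trees = [
    {"genus": "Quercus", "species": "alba", "height_m": 20},
    {"genus": "Ceiba", "species": "speciosa", "height_m": 12},
]

index = index_of_match(trees, {"genus": "Ceiba", "species": "speciosa"})

# The "test" op makes the patch fail if the array shifted after the index was resolved.
ops = [
    {"op": "test", "path": f"/{index}/species", "value": "speciosa"},
    {"op": "replace", "path": f"/{index}/height_m", "value": 15},
]
```

It's still index-based on the wire, so it doesn't help with analyzing diffs, only with applying them safely.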
Yeah, one option is to use a different content-type for your json-patch values and basically extend JSON Patch[1] to use JSON Path[2] instead of JSON Pointer[3].
[1] https://www.rfc-editor.org/info/rfc6902
[2] https://www.rfc-editor.org/info/rfc9535
[3] https://www.rfc-editor.org/info/rfc6901
I'm probably missing a use case here, but with the JSON Pointer spec they use feeling so "URL-y", couldn't you skip the whole meta-json syntax? So rather than doing
why not
I feel like you could pretty sensibly implement replace/delete/append with that.
Edit: "test" and "copy" from the json patch spec are unique! So there are those, as well as doing multiple edits all at once in a sort of "transaction".
And then you'd be limited to only one change at a time and lose the benefit of making a lot of changes with one request.
I do get that, just saw the "test" op, to either pass or fail the whole change as a sort of transaction. That is really neat.
But I just find that the 1 by 1 approach is easier to reason about if you're opening this up to the internet. I'd personally feel more comfortable with the security model of 1 URL + Session => 1 JSON key.
$WORK project heavily utilizes the test op to enable atomic updates to objects across multiple competing clients. It winds up working really well for that purpose.
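For readers who haven't used it that way, the pattern is roughly optimistic locking. Here's a minimal Python sketch. It's my own toy applier, not the $WORK code: it handles only "test" and "replace" on top-level keys, with no pointer escaping, and the `version` field is an assumed convention. It applies everything to a copy, so either all ops succeed or the original is left untouched.

```python
from copy import deepcopy

class PatchFailed(Exception):
    """Raised when any op in the patch cannot be applied."""

def apply_atomically(doc, ops):
    """Apply 'test' and 'replace' ops on top-level keys; all succeed or none do."""
    working = deepcopy(doc)
    for op in ops:
        key = op["path"].lstrip("/")
        if key not in working:
            raise PatchFailed(f"no such key: {key}")
        if op["op"] == "test":
            if working[key] != op["value"]:
                raise PatchFailed(f"test failed at {op['path']}")
        elif op["op"] == "replace":
            working[key] = op["value"]
        else:
            raise PatchFailed(f"unsupported op: {op['op']}")
    return working

stored = {"version": 3, "score": 9}
patch = [
    {"op": "test", "path": "/version", "value": 3},   # fail if someone else got there first
    {"op": "replace", "path": "/score", "value": 10},
    {"op": "replace", "path": "/version", "value": 4},
]
updated = apply_atomically(stored, patch)
```

A second client replaying the same patch against `updated` fails on the first op, because the version is now 4.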
I'm actually using it at the moment in my work and I'm often doing 3-4 updates per patch.
You want them all to fail or all to succeed.
One-by-one is a bit of a weird suggestion tbh. You shouldn't be reasoning that way about code.
If you are going to get a 4xx response to one of the 4 property updates you want them all to fail at once.
Just like anything else we use like SQL.
You might like JSON Merge Patch then - much simpler syntax that avoids the URL stuff
I have been using JSONDiffpatch, made by Benjamín Eidelman, in production for some years now. It is perfect, works in a browser and on a node/cloudflare worker/etc.
How does JSON Patch compare to JSONDiffpatch? It is not mentioned in the alternatives list.
https://github.com/benjamine/jsondiffpatch
Thanks for sharing, I'll try and find time to compare and write about it
Funny enough, it makes me think about iTop's way of patching XML files: https://www.itophub.io/wiki/page?id=latest:customization:xml...
The RFC 6902 - JavaScript Object Notation (JSON) Patch standard is also used by the AWS Cloud Control API:
https://docs.aws.amazon.com/cloudcontrolapi/latest/APIRefere...
Really sucks we don’t have the JQ/JS notation style for paths in this standard. I think it would have made it much more accessible.
This basically is a procedural DSL for patching, built on JSON. Which gives me an idea.
What if the client supplied actual code to do the update? I'm thinking something sort of like ebpf in the kernel - very heavily restricted on what can be done, but still very flexible.
So: https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule ?
For anyone who wants to try this in Python, `yyjson`[1] supports both JSON Patch (RFC 6902) and JSON Merge-Patch (RFC 7386)[2]
[1]: https://github.com/tktech/py_yyjson
[2]: https://tkte.ch/py_yyjson/api.html#yyjson.Document.patch
A challenge I experienced with JSON Patch is convincing external parties it's worth learning. I used this in a customer-facing API at $PREVJOB and had to do a lot of evangelism to get our users to understand and adopt it. Most of our customers weren't tech shops, however, so the contracted engineering staff may have had something to do with it.
Judging from the tone, is this article written by AI?
What about auto-formatting the json and sorting all keys, to create some kind of canonical form? Then we can use standard textual patch.
Standard text patches (diffs) are great because they work with all text files but for a specific representation like JSON you can do a lot better. In terms of code volume it's a lot lighter to find a node based on a json path than applying a diff.
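You can try the canonical-form idea with nothing but the Python standard library. In the sketch below, the canonical form (sorted keys, two-space indent) is my own assumption. It also shows one wart of the text approach: adding a single key at the end changes the previous line too, because of the trailing comma.

```python
import difflib
import json

def canonical_lines(doc):
    """Serialize with sorted keys and fixed indentation, one line per entry."""
    return json.dumps(doc, sort_keys=True, indent=2).splitlines()

old = {"title": "Dune", "score": 9}
new = {"title": "Dune", "score": 9, "year": 1965}

diff = list(
    difflib.unified_diff(
        canonical_lines(old), canonical_lines(new), "old", "new", lineterm=""
    )
)
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]

print(removed)  # ['-  "title": "Dune"']
print(added)    # ['+  "title": "Dune",', '+  "year": 1965']
```

The equivalent JSON Patch would be a single "add" op on /year, with no mention of the title at all.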
I used this standard a long time ago to make a simple server -> client reactive state for a card game by combining it with Vue's reactive objects to get something like a poor man's CRDTs. This is a rough diagram of what it looked like: [1]. Although there was a reducer step on the server to hide private state for the other player, it's still probably wildly impractical for actual games.
[1] https://user-images.githubusercontent.com/50021387/184360079...
Meh. I get it.
It’s just so bassackwards to use an imperative, indirect form to describe state, even if it’s just state changes.
Maybe simply specify the new state?
This declarative approach is always limited compared to an actual programming language (e.g. JS), and so over time things get bolted on until it’s no longer truly declarative, it’s just a poor man’s imperative language. A good example is HCL which is a mess.
What about just stringifying a JS function?
This is essentially the RPC vs REST debate. Do you want your API to be a schema of data types (REST), or a list of function signatures (RPC)?
So how do you validate the data? You can apply all the changes to the existing record and validate the result, but then you need to put everything in memory. Verifying the operations, however, sounds dangerous... Any pointers?
Also, if someone is using this in production: any gotchas?
If you are using Java, you may want to check out the library I created for American Express and open sourced, unify-jdocs - it provides for working with JSON documents outside of POJOLand. For validations, it also has the concept of "typed" document using which you can create a document structure against which all read / writes will be validated. Far simpler and in my opinion as powerful as JSONSchema. https://github.com/americanexpress/unify-jdocs.
The approach I've generally seen is that you take the set of validations you already apply to the JSON and apply it to the result of the patch operation.
You probably want to have some constraints on the kinds of patch objects you apply to avoid simple attacks (e.g. a really large patch, or overly complex patches). But you can probably come up with a set of rules to apply generally to the patch without trying to validate that the patch value for the address meets some business logic. Just do that at the end.
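A sketch of that flow in Python, under loud assumptions: the limits (`MAX_OPS`, `MAX_PATCH_BYTES`), the toy applier that only understands top-level paths, and the `validate_book` rules are all invented for the example. The point is only the ordering: cheap structural checks on the patch first, business validation on the patched result last.

```python
import copy
import json

MAX_OPS = 100                 # hypothetical limit
MAX_PATCH_BYTES = 64 * 1024   # hypothetical limit; a real server would check the raw body size
ALLOWED_OPS = {"add", "replace", "remove"}

def check_patch_shape(patch):
    """Structural checks that need no knowledge of the business rules."""
    if not isinstance(patch, list) or len(patch) > MAX_OPS:
        raise ValueError("patch must be a list of at most MAX_OPS operations")
    if len(json.dumps(patch)) > MAX_PATCH_BYTES:
        raise ValueError("patch is too large")
    for op in patch:
        if not isinstance(op, dict) or op.get("op") not in ALLOWED_OPS:
            raise ValueError("unsupported operation")
        path = op.get("path")
        if not isinstance(path, str) or not path.startswith("/") or "/" in path[1:]:
            raise ValueError("only top-level paths like '/score' are handled here")
        if op["op"] != "remove" and "value" not in op:
            raise ValueError("missing value")

def apply_top_level(doc, patch):
    """Toy applier for single-segment paths; works on a copy, ignores '~' escapes."""
    result = copy.deepcopy(doc)
    for op in patch:
        key = op["path"][1:]
        if op["op"] == "add":
            result[key] = op["value"]
        elif key not in result:
            raise ValueError("target does not exist: " + op["path"])
        elif op["op"] == "replace":
            result[key] = op["value"]
        else:
            del result[key]
    return result

def validate_book(book):
    """Stand-in for whatever validation the endpoint already runs on full documents."""
    if not isinstance(book.get("title"), str):
        raise ValueError("title must be a string")
    score = book.get("score")
    if score is not None and not (isinstance(score, int) and 0 <= score <= 10):
        raise ValueError("score must be an integer from 0 to 10")

def patch_book(stored, patch):
    check_patch_shape(patch)
    candidate = apply_top_level(stored, patch)
    validate_book(candidate)  # validate the result, not the individual operations
    return candidate
```

Because the patch is applied to a copy, a failed validation leaves the stored record untouched.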
Ooh goodie, I wonder what the next popular data format is going to be. I want to be the first to re-invent XSDs and XSLTs for that one as well!
Check out JSON Merge Patch
The pros/cons section is giving ChatGPT to me
You can also use general purpose compressors like Zstandard to create a generic patch:
JSON-patch is not a generic patch. It's not the same thing.
Reminds me of devicetree overlays :-)
As I understand it, JSON works well as an interchange format, for ephemeral messages to communicate between two entities.
For JSON Patch to be useful and efficient, it requires both sides to use JSON for representation. As in, if you have some native structure that you maintain, JSON Patch either must be converted into operations on your native structure, or you have to serialize to JSON, patch, and deserialize back to the native structure. Which is not efficient, and so either you don't use JSON Patch or you have to switch to JSON as your internal representation, which is problematic in many situations. (The same arguments apply to the sending side.)
Not only that, but you become dependent on the patch to produce exactly the same result on both sides, or things can drift apart and later patches will start failing to apply, requiring resynchronization. I would probably want some sort of checksum to be communicated alongside the patch, but it would be tough to generate a checksum without materializing the full JSON structure. (If you try to update it incrementally, then you're back to the same problem of potential divergence.)
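For what it's worth, the naive version of such a checksum is short. This Python sketch hashes a canonical serialization (sorted keys, no whitespace); `document_checksum` is a made-up name, the choice of SHA-256 is arbitrary, and this is not a formal canonicalization scheme. It also does exactly what the comment above worries about: it materializes the full JSON text on every call.

```python
import hashlib
import json

def document_checksum(doc):
    """Hash a canonical serialization so both sides can compare their post-patch state."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The sender computes the checksum of the state it expects after the patch.
expected = document_checksum({"title": "Dune", "author": "Frank Herbert"})

# The receiver applies the patch to its own copy, then compares.
receiver_state = {"author": "Frank Herbert", "title": "Dune"}
in_sync = document_checksum(receiver_state) == expected
print(in_sync)
```

A mismatch would be the signal to fall back to a full resynchronization instead of applying further patches.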
I mean, I can see the usefulness of this: it's like logical vs physical WAL. Or state- vs operation-based CRDTs. Or deltas vs snapshots. But the impact on the representation on both sides bothers me, as does the fact that you're kind of reimplementing some database functionality in the application layer.
If I were faced with this problem and didn't already represent everything as giant JSON documents (and it's unlikely that I would do that), I think I'd be tempted to use some binary format that I could apply the rsync algorithm to in order to guarantee a bit-for-bit identical copy on the other side. Basically, hand the problem off to a fully generic library that I already trust. (You don't have to pay for lots of round-trip latencies; rsync has batch updates if you already know the recipient has an exact matching copy of the previous state.) You still have to match representations on the sending and receiving side, but you can use any (pointer-free) representation you want.
Isn't this easy though? Just don't over-use OOP. Structured data can also be stored as just structured data.
> I think I'd be tempted to use some binary format
And now you require both sites to use a binary format for representation. And then you have the same list of challenges.
> Structured data can also be stored as just structured data.
Fair point. I probably overstated the weaknesses of the JSON model; it's fine for many uses.
But I like Map, and Set, and occasionally even WeakMap. I especially like JS objects that maintain their ordering. I'm even picky enough to like BigInts to stay BigInts, RegExes to stay RegExes, and graphs to have direct links between nodes and be capable of representing cycles. So perhaps it's just the nature of problems I work on, but I find JSON to be a pretty impoverished representation within an application -- even with no OOP involved.
It's great for interchange, though.
>> I think I'd be tempted to use some binary format
> And now you require both sites to use a binary format for representation. And then you have the same list of challenges.
Requiring the same format on both sides is an important limitation, but it's less of a limitation than additionally requiring that format to be JSON. It's not the same list of challenges, it's a subset.
Honestly, I'm fond of textual representations, and avoid binary representations in general. I like debuggability. But I'm imagining an application where JSON Patch is useful, as in something where you are mutating small pieces of a large structure, and that's where I'd be more likely to reach for binary and a robust mirroring mechanism.
Weird, you work at Mozilla and don't know that JSON in DBs is a thing (and has been for 15+ years).
Anyway, a few resources to help you learn:
https://firebase.google.com/docs/firestore
https://www.mongodb.com/
https://www.postgresql.org/docs/current/datatype-json.html
To the substantive comment: if your JSON is in a DB, then the DB can do whatever fancy thing it can come up with in order to replicate changes. Databases are good at that, and a place where the engineering effort is well-spent. Both sides of the JSON Patch communication can then read and write to the DB and everything will be fine -- but there'll be no need for JSON Patch, no need for any application-level diff generation and application at all.
As for working at Mozilla: oh, it's worse than that, I'm the one who did the most recent round of optimization to JSON.stringify(). So if you're using Firefox, you're even more vulnerable to my incompetence than you realized.
Furthermore, I'll note that for Mozilla, I do 90% of my work in C++, 10% in Python, and a rounding error in JS and Rust. So although I work on a JS engine, I'm dangerously clueless about real-world JS. I mostly use it for fun, and for side projects (and sometimes those coincide). If you expect JS engine developers to be experts in the proper way to use JS, then you're going to be sorely disappointed.
That said, I'd be interested to hear a counterargument. Your argument so far agreed with me.
The point is, 99% of cases where JSON is used, it is already:
* Agreed by both parties to be the protocol they'll use
* Used for "representation" (assuming we mean the same by that, if not, please clarify)
>So if you're using Firefox
I jumped ship ten years ago; but I've heard you guys are doing quite well?
>I'm dangerously clueless about real-world JS.
Agree.
Disclosure: I'm @moralestapia but my laptop ran out of battery and this is my backup account, lol.
I would love to see some (optional) checksumming or original value in the protocol, and a little more flexibility in the root node for other metadata like format versioning, rather than just the array of patch ops in the root.
```
{
  "checksum": {
    "algorithm": "sha1",
    "normalization": "minify",
    "root-checksum": "d930e659007308ac8090182fe664c7f64e898ed9"
  },
  "patch": [
    {
      "op": "replace",
      "path": "/id",
      "node-checksum": "b11ee5e59dc833a22b5f0802deb99c29fb50fdd0",
      "value": { "foo": "bar", "nullptr": 0 }
    },
    {
      "op": "replace",
      "path": "/cat",
      "original-value": "foo",
      "value": "bar"
    }
  ]
}
```
This is what "op": "test" is for. You can use it at the beginning of a patch to verify that the server's object hasn't drifted from your own.
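Concretely, the "original-value" idea for `/cat` can be written as a standard patch with a "test" in front of the "replace". Below is a small Python sketch of an applier that understands only those two operations; `apply_patch` and `PatchTestFailed` are names invented here, and a real library would cover the full RFC 6902 operation set and stricter equality rules.

```python
import copy

class PatchTestFailed(Exception):
    """Raised when an "op": "test" does not match; the whole patch is rejected."""

def _tokens(path):
    # Split a JSON Pointer into unescaped tokens ("~1" is "/", "~0" is "~").
    return [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]

def apply_patch(doc, patch):
    """Apply "test" and "replace" operations all-or-nothing (root path "" not handled)."""
    work = copy.deepcopy(doc)  # the caller's document is only replaced on full success
    for op in patch:
        *parents, last = _tokens(op["path"])
        parent = work
        for t in parents:
            parent = parent[int(t)] if isinstance(parent, list) else parent[t]
        key = int(last) if isinstance(parent, list) else last
        current = parent[key]  # KeyError/IndexError if the target does not exist
        if op["op"] == "test":
            # Simplification: Python equality is looser than the RFC's (True == 1).
            if current != op["value"]:
                raise PatchTestFailed(op["path"])
        elif op["op"] == "replace":
            parent[key] = op["value"]
        else:
            raise ValueError("unsupported op: " + op["op"])
    return work

patch = [
    {"op": "test", "path": "/cat", "value": "foo"},     # fail if the server has drifted
    {"op": "replace", "path": "/cat", "value": "bar"},
]
print(apply_patch({"cat": "foo"}, patch))
```

If the server's `/cat` is anything other than `"foo"`, the test fails and nothing is changed.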
What’s nice about JSON is that it’s actually valid JavaScript, with some formal specification to avoid any nasty circles or injections.
Why can’t your protocol just be valid JavaScript too? `this.name = "string";` instead of mixing so many metaphors?
Because that would require consumers to have a Javascript interpreter to use it.
Because that would require consumers to have an interpreter for the most widely deployed language, ever, and by far.
FTFY
security nightmare; sometimes you don't want consumers to execute code arbitrarily
This is what makes Tcl great as a data interchange format. It comes with a safe mode for untrusted code and you can further restrict it to have no control flow commands to be non-Turing.
Not true. Google, Meta, ... do it at a massive scale, no issues.
It's not really hard to protect yourself against that.
Any (competent) security guy can give you like 4 ways to implement it properly.
Do you mean the ads they serve that contain malware?
Nah, in that case Python would be a better option as it's already installed everywhere.
That is so derangedly untrue.
Starlark is a nice embeddable scripting language, though. Java, Go, and Rust implementations: https://github.com/bazelbuild/starlark/blob/master/users.md#...
But what's your point? Would you truly want consumers of JSON Patch data to embed a JS interpreter?
My point is that the JS interpreter is likely already there.
> Why can’t your protocol just be valid JavaScript too?
It is.
It’s delivered in JSON, but you need an interpreter. But the actions are just JS assignment statements and a little glue. Your interpreter could as easily handle that, and with far fewer bytes. Why call a member variable /name when it’s already .name?
Never liked it. Ignores the wonderful fact that javascript's type system natively distinguishes undefined from null values.
{ "name": "bob", "phone": null }
This would set the name to bob, null out the phone, but leave all other fields untouched. No need for a DSL-over-json.
Only trouble is static type people love their type serializers, which are ever at a mismatch with the json they work with.
> javascript's type system natively distinguishes undefined from null values.
JSON is not JavaScript (despite the "J"), and `undefined` is not a part of JSON specification.
However, I think every language out there that has a dictionary-like type can distinguish between presence of a key and absence of one, so your argument still applies. At the very least, for simple documents that don't require anything fancy.
I believe this simple merge-based approach is exactly what people are using when they don't need JSON Patch. If you operate on large arrays or need transactional updates, JSON Patch is probably a better choice, though.
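For reference, the merge-based approach fits in a dozen lines of Python. This is a sketch, with `merge_patch` and its `null_deletes` flag being names made up here: with `null_deletes=True` it follows the JSON Merge Patch rule that null removes a key, and with `null_deletes=False` it follows the parent comment's reading, where a present key (even a null one) is set and an absent key is left alone.

```python
def merge_patch(target, patch, null_deletes=True):
    """Recursively merge a partial object into target and return the result."""
    if not isinstance(patch, dict):
        return patch  # scalars, arrays and null replace the target wholesale
    # Shallow copy: patched branches are rebuilt, untouched values are shared.
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None and null_deletes:
            result.pop(key, None)  # "null means delete"
        else:
            result[key] = merge_patch(result.get(key), value, null_deletes)
    return result

contact = {"name": "al", "phone": "555-0100", "email": "al@example.com"}
change = {"name": "bob", "phone": None}

print(merge_patch(contact, change))                      # phone removed
print(merge_patch(contact, change, null_deletes=False))  # phone set to null
```

Either way, arrays can only be replaced as a whole, which is where this approach stops being a good fit.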
> Only trouble is static type people love their type serializers, which are ever at a mismatch with the json they work with.
I don't think it's a type system problem, unless the language doesn't have some type that is present in JSON and has to improvise. Typically, it's rather a misunderstanding that a patch document (even if it's a merge patch like your example) has its own distinct type from the entity it's supposed to patch - at the very least, in terms of nullability. A lot of people (myself included) made that blunder only to realize how it's broken later. JSON Patch avoids that because it's very explicit that a patch document has its own distinct structure and types, but simple merge patches may confuse some.
I'm working on something right now that has a need to add/remove a few items to a very large array. (Not merely updating properties of an existing Object.) I ran across JSON Patch as a solution to this but ended up implementing just the part from it that I actually needed. (The "add" and "remove" operators.)
The alternative is to modify the large array on the client side and send the whole modified array every time.
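A subset like that might look roughly as follows in Python. This is a guess at the shape, not the commenter's actual code: `apply_add_remove` is a hypothetical name, only "add" and "remove" are supported, and `/-` appends to an array as in RFC 6902.

```python
import copy

def _tokens(path):
    # Split a JSON Pointer into unescaped tokens ("~1" is "/", "~0" is "~").
    return [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]

def apply_add_remove(doc, patch):
    """Apply only "add" and "remove" operations; returns a patched copy."""
    work = copy.deepcopy(doc)  # for a truly huge array you might patch in place instead
    for op in patch:
        if op["op"] not in ("add", "remove"):
            raise ValueError("unsupported op: " + op["op"])
        *parents, last = _tokens(op["path"])
        parent = work
        for t in parents:
            parent = parent[int(t)] if isinstance(parent, list) else parent[t]
        if isinstance(parent, list):
            if op["op"] == "add":
                # "-" means "after the last element"; otherwise insert before the index.
                index = len(parent) if last == "-" else int(last)
                if not 0 <= index <= len(parent):
                    raise IndexError(op["path"])
                parent.insert(index, op["value"])
            else:
                index = int(last)
                if not 0 <= index < len(parent):
                    raise IndexError(op["path"])
                del parent[index]
        elif op["op"] == "add":
            parent[last] = op["value"]
        else:
            del parent[last]
    return work

doc = {"items": ["a", "b", "d"]}
patch = [
    {"op": "add", "path": "/items/2", "value": "c"},
    {"op": "add", "path": "/items/-", "value": "e"},
    {"op": "remove", "path": "/items/0"},
]
print(apply_add_remove(doc, patch))
```

Operations apply in order, so each index refers to the array as it looks after the previous operations.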
You're looking for JSON merge patch which they briefly mentioned https://www.rfc-editor.org/rfc/rfc7386