This research paper says that the basis of the collection is the 2019 version of the law from fifty US states. The descriptions of why and how are useful here, but the 2019 date implies that a more recent collection is not available to the public under a broad license. The texts they build from are specifically owned by the public, not by a research group or spinoff company. Historically, compendiums, analysis, and indexing have been real work that costs real money, usually delivered in a protected print format.
Is it too much to ask that a link to code or data be included somewhere in this pre-print research paper?
There have been updates since 2019; most states publish an updated code annually.
I wonder if there are licensing issues with that, since it's a scrape of Justia. US law sits in a slightly complicated legal situation: the law itself is public domain, but various organizational bits can be copyrighted (see Georgia v. Public.Resource.Org, Inc.). And a number of states' official copies are hosted on Lexis, which has very aggressive scraping protection.
A different scrape of Justia is included in at least one of the big ML legal corpuses, so it's already in use commercially and they're not publicly suing people left and right over it.
I've been thinking of spinning up an openstates-esque project to scrape all 50 into a common format from the original source documents, but it's a big undertaking when most people who don't have strict ethical/legal concerns just scrape Justia or Casetext.
Some of this is rooted in the fact that state legislatures pass acts, but we want to deal with codes.
An act might change the law in several different “places.” Maybe there’s a criminal statute, a modification to a sentencing law, and a new addition to a statute of limitations. If you look at the act, you see all these changes in one block of text, in a somewhat different format.
But when you look at the code, you realize the statute of limitations is hundreds of pages away from the criminal statute, and the two are in different titles or articles or what-have-you. What happened?
Well, someone painstakingly started with the first act passed by the legislature and applied every addition and change set forth in each later act, keeping a numbering system intact, until they got to the law as it stands today. Their work is copyrighted, even if the acts themselves aren’t.
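To make that codification step concrete, here is a minimal Python sketch that replays acts in order of passage to build up a code. Everything in it is an assumption for illustration: the `Change` and `Act` structures, the three actions ("add", "amend", "repeal"), and the section labels are hypothetical, and real codification involves far more editorial judgment than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One instruction inside an act (hypothetical structure)."""
    section: str   # made-up code location, e.g. "limitations.305"
    action: str    # "add", "amend", or "repeal"
    text: str = ""


@dataclass(frozen=True)
class Act:
    name: str
    changes: tuple  # tuple of Change, in the order they appear in the act


def codify(acts):
    """Apply acts in order of passage; return the code as {section: text}."""
    code = {}
    for act in acts:
        for change in act.changes:
            exists = change.section in code
            if change.action == "add":
                if exists:
                    raise ValueError(f"{act.name}: {change.section} already exists")
                code[change.section] = change.text
            elif change.action == "amend":
                if not exists:
                    raise ValueError(f"{act.name}: {change.section} does not exist")
                code[change.section] = change.text
            elif change.action == "repeal":
                if not exists:
                    raise ValueError(f"{act.name}: {change.section} does not exist")
                del code[change.section]
            else:
                raise ValueError(f"{act.name}: unknown action {change.action!r}")
    # Order by section label so related provisions sit together,
    # regardless of which act introduced them.
    return dict(sorted(code.items()))
```

A single act can touch sections that end up far apart in the result, which is the effect described above: the act is one block of text, while the code is organized by subject.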
I would not take every word of this comment as gospel in every state, but nevertheless I hope it promotes either understanding or further discussion.
(To be clear, I believe the parent already understands all of this, and I am not trying to convince that author.)
In Minnesota, the Office of the Revisor of Statutes does this: https://www.revisor.mn.gov/statutes/ I have to imagine most states have a similar function. The point is that it is a public agency, and its work product should be accessible to the public.