In practice you should always normalize your Unicode data, then all you need to do is memcmp + boundary check.
Interestingly enough this library doesn't provide grapheme cluster tokenization and/or boundary checking which is one of the most useful primitive for this.
That’s not practical in many situations, as the normalization alone may very well be more expensive than the search.
If you’re in control of all data representations in your entire stack, then yes of course, but that’s hardly ever the case and different tradeoffs are made at different times (eg storage in UTF-8 because of efficiency, but in-memory representation in UTF-32 because of speed).
From a German user perspective, ICU and your fancy library are incorrect, actually. Mass is not a different casing of Maß, they are different characters. Google likely changed this because it didn't do what users wanted.
In practice you should always normalize your Unicode data, then all you need to do is memcmp + boundary check.
Interestingly enough this library doesn't provide grapheme cluster tokenization and/or boundary checking which is one of the most useful primitive for this.
That’s not practical in many situations, as the normalization alone may very well be more expensive than the search.
If you’re in control of all data representations in your entire stack, then yes of course, but that’s hardly ever the case and different tradeoffs are made at different times (eg storage in UTF-8 because of efficiency, but in-memory representation in UTF-32 because of speed).
That doesn't make sense; the search is doing on-the-fly normalization as part of its algorithm, so it cannot be faster than normalization alone.
In practice the data is not always yours to normalize. You're not going to case-fold your library, but you still want to be able to search it.
From a German user perspective, ICU and your fancy library are incorrect, actually. Mass is not a different casing of Maß, they are different characters. Google likely changed this because it didn't do what users wanted.
The confusion likely stems from the relatively new introduction of the capitalized ẞ https://de.wikipedia.org/wiki/Gro%C3%9Fes_%C3%9F
Maß capitalized (used to be) MASS.
Funnily enough, Mass means one liter beer (think Oktoberfest).