instance method
unpack_graphemes
Ruby on Rails 5.2.8.1
Since v4.0.13. Last seen in v6.0.6.
Signature
unpack_graphemes(string)
Unpacks the string at grapheme boundaries. Returns an array with one entry per grapheme cluster, where each entry is an array of that cluster's integer codepoints.
Unicode.unpack_graphemes('क्षि') # => [[2325, 2381], [2359], [2367]]
Unicode.unpack_graphemes('Café') # => [[67], [97], [102], [233]]
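For comparison, the same shape of result can be produced in plain Ruby without ActiveSupport. This is a minimal sketch, assuming Ruby's regex `\X` (extended grapheme cluster) as the boundary rule; `unpack_graphemes_sketch` is a hypothetical name, not part of Rails. Results may differ from the method documented here for some scripts, because the boundary rules Ruby applies depend on the Ruby version.

```ruby
# Hypothetical helper (not part of ActiveSupport): splits a string into
# extended grapheme clusters with \X, then maps each cluster to codepoints.
def unpack_graphemes_sketch(string)
  string.scan(/\X/).map(&:codepoints)
end

# "e" followed by COMBINING ACUTE ACCENT stays in one cluster.
p unpack_graphemes_sketch("e\u0301")   # => [[101, 769]]

# Precomposed "é" (U+00E9) is a single codepoint in its own cluster.
p unpack_graphemes_sketch("Caf\u00E9") # => [[67], [97], [102], [233]]
```

Escapes such as `\u0301` are used instead of literal characters so that the example does not depend on how an editor normalizes accented text.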
Parameters
- string (required)
Source
# File activesupport/lib/active_support/multibyte/unicode.rb, line 51
def unpack_graphemes(string)
  codepoints = string.codepoints.to_a
  unpacked = []
  pos = 0
  marker = 0
  eoc = codepoints.length
  while (pos < eoc)
    pos += 1
    previous = codepoints[pos - 1]
    current = codepoints[pos]

    # See http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules
    should_break =
      if pos == eoc
        true
      # GB3. CR X LF
      elsif previous == database.boundary[:cr] && current == database.boundary[:lf]
        false
      # GB4. (Control|CR|LF) ÷
      elsif previous && in_char_class?(previous, [:control, :cr, :lf])
        true
      # GB5. ÷ (Control|CR|LF)
      elsif in_char_class?(current, [:control, :cr, :lf])
        true
      # GB6. L X (L|V|LV|LVT)
      elsif database.boundary[:l] === previous && in_char_class?(current, [:l, :v, :lv, :lvt])
        false
      # GB7. (LV|V) X (V|T)
      elsif in_char_class?(previous, [:lv, :v]) && in_char_class?(current, [:v, :t])
        false
      # GB8. (LVT|T) X (T)
      elsif in_char_class?(previous, [:lvt, :t]) && database.boundary[:t] === current
        false
      # GB9. X (Extend | ZWJ)
      elsif in_char_class?(current, [:extend, :zwj])
        false
      # GB9a. X SpacingMark
      elsif database.boundary[:spacingmark] === current
        false
      # GB9b. Prepend X
      elsif database.boundary[:prepend] === previous
        false
      # GB10. (E_Base | EBG) Extend* X E_Modifier
      elsif (marker...pos).any? { |i| in_char_class?(codepoints[i], [:e_base, :e_base_gaz]) && codepoints[i + 1...pos].all? { |c| database.boundary[:extend] === c } } && database.boundary[:e_modifier] === current
        false
      # GB11. ZWJ X (Glue_After_Zwj | EBG)
      elsif database.boundary[:zwj] === previous && in_char_class?(current, [:glue_after_zwj, :e_base_gaz])
        false
      # GB12. ^ (RI RI)* RI X RI
      # GB13. [^RI] (RI RI)* RI X RI
      elsif codepoints[marker..pos].all? { |c| database.boundary[:regional_indicator] === c } && codepoints[marker..pos].count { |c| database.boundary[:regional_indicator] === c }.even?
        false
      # GB999. Any ÷ Any
      else
        true
      end

    if should_break
      unpacked << codepoints[marker..pos - 1]
      marker = pos
    end
  end
  unpacked
end
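The core of the source above is a scan that keeps a `marker` at the start of the current cluster and closes the cluster whenever a rule reports a break between `previous` and `current`. The following is a simplified sketch of that scan, not the Rails implementation: `split_at_boundaries`, `COMBINING`, and `SIMPLE_RULE` are hypothetical names, and the rule covers only GB3 (CR X LF) plus a rough stand-in for GB9 that treats the Combining Diacritical Marks block (U+0300..U+036F) as the only extending characters.

```ruby
# Hypothetical helper: groups codepoints, closing a group wherever the
# given block reports a break between two neighbouring codepoints.
def split_at_boundaries(codepoints)
  groups = []
  marker = 0 # index where the current group starts
  (1..codepoints.length).each do |pos|
    at_end = pos == codepoints.length
    if at_end || yield(codepoints[pos - 1], codepoints[pos])
      groups << codepoints[marker...pos]
      marker = pos
    end
  end
  groups
end

CR = 0x0D
LF = 0x0A
# Assumption: only this block counts as "extending" in this sketch.
COMBINING = (0x0300..0x036F)

# Returns true when a break belongs between previous and current.
SIMPLE_RULE = lambda do |previous, current|
  if previous == CR && current == LF
    false # GB3. CR X LF
  elsif COMBINING.cover?(current)
    false # simplified GB9. X Extend
  else
    true  # GB999. Any ÷ Any
  end
end

codepoints = "e\u0301\r\na".codepoints
p split_at_boundaries(codepoints, &SIMPLE_RULE)
# => [[101, 769], [13, 10], [97]]
```

Passing the rule as a block keeps the scan separate from the boundary table, which the real method instead looks up through `database.boundary` and `in_char_class?`.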
Defined in activesupport/lib/active_support/multibyte/unicode.rb line 51
Defined in ActiveSupport::Multibyte::Unicode