instance method unpack_graphemes

Ruby on Rails 5.2.8.1

Since v4.0.13. Last seen in v6.0.6.

Available in: v4.0.13 v4.1.16 v4.2.9 v5.2.8.1 v6.0.6

Signature

unpack_graphemes(string)

Unpacks the string at grapheme cluster boundaries. Returns an array of arrays of codepoints, one inner array per grapheme cluster.

Unicode.unpack_graphemes('क्षि') # => [[2325, 2381], [2359], [2367]]
Unicode.unpack_graphemes('Café') # => [[67], [97], [102], [233]]
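
For comparison, plain Ruby can produce output of the same shape without Rails. The sketch below assumes Ruby 2.5 or later and uses the core `\X` regex matcher and `String#grapheme_clusters` as stand-ins. It is not the Rails implementation, and it may segment some scripts differently from the table-driven code documented on this page. The variable names are illustrative only.

```ruby
# Assumption: Ruby 2.5 or later, plain Ruby with no Rails loaded.
precomposed = "Caf\u00E9"   # "é" stored as a single codepoint (U+00E9)
decomposed  = "Cafe\u0301"  # "e" followed by COMBINING ACUTE ACCENT (U+0301)

# \X matches one extended grapheme cluster; codepoints lists its integers.
via_scan     = decomposed.scan(/\X/).map(&:codepoints)
via_clusters = decomposed.grapheme_clusters.map(&:codepoints)

p precomposed.scan(/\X/).map(&:codepoints) # => [[67], [97], [102], [233]]
p via_scan                                 # => [[67], [97], [102], [101, 769]]
p via_scan == via_clusters                 # => true
```

The decomposed form shows why grouping matters: the base letter and its combining accent stay together in one inner array instead of being split into two.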

Parameters

string (required)

Source
# File activesupport/lib/active_support/multibyte/unicode.rb, line 51
      def unpack_graphemes(string)
        codepoints = string.codepoints.to_a
        unpacked = []
        pos = 0
        marker = 0
        eoc = codepoints.length
        while (pos < eoc)
          pos += 1
          previous = codepoints[pos - 1]
          current = codepoints[pos]

          # See http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules
          should_break =
            if pos == eoc
              true
            # GB3. CR X LF
            elsif previous == database.boundary[:cr] && current == database.boundary[:lf]
              false
            # GB4. (Control|CR|LF) ÷
            elsif previous && in_char_class?(previous, [:control, :cr, :lf])
              true
            # GB5. ÷ (Control|CR|LF)
            elsif in_char_class?(current, [:control, :cr, :lf])
              true
            # GB6. L X (L|V|LV|LVT)
            elsif database.boundary[:l] === previous && in_char_class?(current, [:l, :v, :lv, :lvt])
              false
            # GB7. (LV|V) X (V|T)
            elsif in_char_class?(previous, [:lv, :v]) && in_char_class?(current, [:v, :t])
              false
            # GB8. (LVT|T) X (T)
            elsif in_char_class?(previous, [:lvt, :t]) && database.boundary[:t] === current
              false
            # GB9. X (Extend | ZWJ)
            elsif in_char_class?(current, [:extend, :zwj])
              false
            # GB9a. X SpacingMark
            elsif database.boundary[:spacingmark] === current
              false
            # GB9b. Prepend X
            elsif database.boundary[:prepend] === previous
              false
            # GB10. (E_Base | EBG) Extend* X E_Modifier
            elsif (marker...pos).any? { |i| in_char_class?(codepoints[i], [:e_base, :e_base_gaz]) && codepoints[i + 1...pos].all? { |c| database.boundary[:extend] === c } } && database.boundary[:e_modifier] === current
              false
            # GB11. ZWJ X (Glue_After_Zwj | EBG)
            elsif database.boundary[:zwj] === previous && in_char_class?(current, [:glue_after_zwj, :e_base_gaz])
              false
            # GB12. ^ (RI RI)* RI X RI
            # GB13. [^RI] (RI RI)* RI X RI
            elsif codepoints[marker..pos].all? { |c| database.boundary[:regional_indicator] === c } && codepoints[marker..pos].count { |c| database.boundary[:regional_indicator] === c }.even?
              false
            # GB999. Any ÷ Any
            else
              true
            end

          if should_break
            unpacked << codepoints[marker..pos - 1]
            marker = pos
          end
        end
        unpacked
      end
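
The listing above walks the codepoints with two indices: `pos` is the candidate boundary and `marker` is the start of the cluster being built. Whenever a rule says to break, the slice from `marker` up to `pos` is emitted. A reduced sketch of that loop follows. It assumes only two rules, GB3 (no break between CR and LF) and a GB9-like rule approximated with `\p{M}` (no break before a combining mark), and it does not use the Rails `database` tables. `simple_unpack` is a hypothetical name, not part of ActiveSupport.

```ruby
# Hypothetical helper, not part of Rails: the marker/pos loop with two rules only.
def simple_unpack(string)
  codepoints = string.codepoints
  unpacked = []
  marker = 0 # index where the current cluster starts

  (1..codepoints.length).each do |pos|
    previous = codepoints[pos - 1]
    current  = codepoints[pos] # nil once pos reaches the end

    should_break =
      if pos == codepoints.length
        true                                   # always break at end of text
      elsif previous == 0x0D && current == 0x0A
        false                                  # GB3: CR x LF
      elsif [current].pack("U").match?(/\p{M}/)
        false                                  # GB9-like: x combining mark (approximation)
      else
        true                                   # GB999: break everywhere else
      end

    if should_break
      unpacked << codepoints[marker...pos]
      marker = pos
    end
  end

  unpacked
end

p simple_unpack("Cafe\u0301") # => [[67], [97], [102], [101, 769]]
p simple_unpack("a\r\nb")     # => [[97], [13, 10], [98]]
```

The full method differs mainly in how many rules sit in the `if`/`elsif` chain and in looking up character classes from the Unicode database rather than a regex.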

Defined in activesupport/lib/active_support/multibyte/unicode.rb line 51

Defined in ActiveSupport::Multibyte::Unicode
