Hosted on dailyplanet.io via the Hypermedia Protocol

Text Fragment Rendering

    Text fragment rendering displays a specific portion of a block's text, selected by a Unicode code point range, with all inline embeds resolved to their document names.

    Overview

      When embedding a document fragment with a range (e.g., hm://account/doc#blockId[10:50]), the system needs to:

        Extract code points 10 through 49 from the block's text (the range is half-open, like slice(start, end))

        Resolve any inline embeds within that range

        Display the resulting text with embed names
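The first step depends on splitting a fragment reference into its parts. Below is a minimal parser sketch for references shaped like hm://account/doc#blockId[10:50]. The name parseFragmentRef, the FragmentRef shape, and the rule of rejecting ranges where end < start are all assumptions, not the project's actual parser.

```typescript
// Hypothetical parser for references like hm://account/doc#blockId[10:50].
// Not the project's actual parsing code.
interface FragmentRef {
  docUrl: string // everything before '#', e.g. hm://account/doc
  blockRef: string // block ID between '#' and '['
  start: number // inclusive code point offset
  end: number // exclusive code point offset
}

function parseFragmentRef(ref: string): FragmentRef | null {
  const match = /^([^#]+)#([^[\]#]+)\[(\d+):(\d+)\]$/.exec(ref)
  if (!match) return null
  // Defaults only satisfy strict index checks; a match fills every group
  const [, docUrl = '', blockRef = '', startStr = '', endStr = ''] = match
  const start = Number(startStr)
  const end = Number(endStr)
  // Treat reversed ranges as malformed
  if (end < start) return null
  return {docUrl, blockRef, start, end}
}
```

With this convention, start is inclusive and end is exclusive, which matches the slice(start, end) call used later during extraction.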

    The Challenge

      Fragment ranges are specified as Unicode code point positions in the original text. These positions:

        Count actual text characters

        Exclude invisible inline embed markers (U+FEFF)

        Use Unicode code points (not UTF-16 code units)

      But we want to display text where:

        Inline embeds are replaced with document names

        Character positions map correctly to the original range

    Solution: FragmentText Component

      Component Location

        frontend/packages/ui/src/document-content.tsx

      How It Works

        function FragmentText({
          documentId,
          blockRef,
          start,
          end,
        }: {
          documentId: UnpackedHypermediaId
          blockRef: string
          start: number
          end: number
        })
        

        Process:

          Fetch Full Text: Call getDocumentText for the entire block (blockRange: null) with resolveInlineEmbeds: true

            getDocumentText(
              {...documentId, blockRef, blockRange: null},
              {lineBreaks: false, resolveInlineEmbeds: true}
            )
            

          Extract Fragment: Split the text into Unicode code points with Array.from(), then keep positions start up to (but not including) end

            const codePoints = Array.from(fullText)
            const fragment = codePoints.slice(start, end).join('')
            

          Display: Render the extracted text

            <Text className="whitespace-pre-wrap">{fragment}</Text>
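The extraction step can also be written as a standalone function. This is a sketch and not the component's code: the name extractFragment and the choice to return null for invalid ranges are assumptions. Validation is useful here because Array.prototype.slice treats a negative index as an offset from the end. Without a check, a malformed range would quietly select the wrong text instead of failing.

```typescript
// Sketch: slice text by Unicode code point offsets [start, end).
// Returns null unless start and end are non-negative integers with start <= end.
function extractFragment(
  fullText: string,
  start: number,
  end: number,
): string | null {
  if (!Number.isInteger(start) || !Number.isInteger(end)) return null
  // Array.prototype.slice would treat negative offsets as counting from the end
  if (start < 0 || end < start) return null
  // Array.from iterates by code point, so surrogate pairs stay intact
  const codePoints = Array.from(fullText)
  // slice clamps end to the array length, so an overlong range is truncated
  return codePoints.slice(start, end).join('')
}
```

A range that runs past the end of the text is truncated, not rejected. This follows the behavior of slice itself.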
            

    Integration with ContentEmbed

      The ContentEmbed component detects text fragments and renders them with FragmentText instead of the normal block renderer:

      // A text fragment is a blockRef plus a {start, end} range.
      // Destructuring into consts lets TypeScript narrow both values
      // with a single check.
      const {blockRef, blockRange} = props
      
      if (blockRef && blockRange && 'start' in blockRange && 'end' in blockRange) {
        // Render as plain text with inline embeds resolved
        content = (
          <FragmentText
            documentId={narrowHmId(props)}
            blockRef={blockRef}
            start={blockRange.start}
            end={blockRange.end}
          />
        )
      } else {
        // Normal block rendering
        // ...
      }
      

    Example Scenarios

      Example 1: Simple Text Fragment

        Original block text:

        "Hello world, this is a test paragraph with some content."
        

        Fragment: #blockId[0:11]

        Result: "Hello world"

      Example 2: Fragment with Inline Embed

        Original block text (with invisible markers):

        "Check out \uFEFF post about AI!"
        // Position: 0-9, [10 = embed], 11-25
        

        With inline embed resolved:

        "Check out @Alice's Guide post about AI!"
        

        Fragment: #blockId[0:20]

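        Result: "Check out @Alice's G" (the first 20 code points of the resolved text). Because the slice is applied to the resolved text, offsets after an inline embed no longer line up with the original text. Here the range stops inside the embed name, whereas the same range in the original text extends into "post about".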

      Example 3: Multiple Inline Embeds

        Original text:

        "Read \uFEFF and \uFEFF for more info"
        // [Read ][embed1][ and ][embed2][ for more info]
        

        With embeds resolved:

        "Read @Getting Started and @Advanced Topics for more info"
        

        Fragment: #blockId[0:25]

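        Result: "Read @Getting Started and" (the first 25 code points). Only the first embed name falls inside the range, because "@Advanced Topics" starts at code point 26.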

    Character Position Mapping

      Key Concepts

        Original Positions: Defined in the blockRange, count actual text excluding embed markers

        Resolved Text: After documentToText processes it, embeds become their document names

        Unicode Code Points: Use Array.from() so that a character stored as a UTF-16 surrogate pair (such as most emoji) counts as one position
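FragmentText slices text that has already been resolved. As a result, any offset after an embed is measured against the inserted name, not against the marker. If the stored range is meant to refer to the original text, one alternative is to walk the original code points and substitute names only while copying. In the hypothetical sketch below, resolveOriginalRange and its embedNames parameter are invented for illustration. The sketch assumes each marker counts as one position, as in Example 2's position comment. The list above, by contrast, describes positions that exclude markers. If markers are not counted, the loop would need adjusting.

```typescript
// Hypothetical helper, not the project's API. Assumes:
//  - each inline embed appears in `original` as one U+FEFF code point
//  - embedNames[i] is the display name of the i-th marker in the block
const EMBED_MARKER = '\uFEFF'

function resolveOriginalRange(
  original: string,
  embedNames: string[],
  start: number,
  end: number,
): string {
  const codePoints = Array.from(original)
  let embedIndex = 0 // index of the next marker we will meet
  let out = ''
  for (let i = 0; i < codePoints.length && i < end; i++) {
    const cp = codePoints[i]
    const isMarker = cp === EMBED_MARKER
    if (i >= start) {
      // Inside the range: copy text as-is, swap markers for their names
      out += isMarker ? (embedNames[embedIndex] ?? '') : cp
    }
    // Count markers before the range too, so names stay aligned
    if (isMarker) embedIndex++
  }
  return out
}
```

With this approach, the range [0:20] on Example 2's original text yields "Check out @Alice's Guide post abo". The range keeps its original meaning, and the embed name is inserted in full.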

      Why Array.from()?

        JavaScript strings are sequences of UTF-16 code units. Characters outside the Basic Multilingual Plane, including most emoji, take two code units (a surrogate pair), so .length counts them twice:

        // Wrong: UTF-16 code units
        "Hello 👋".length // 8 (the emoji uses 2 code units)
        
        // Correct: Unicode code points
        Array.from("Hello 👋").length // 7 (the emoji is 1 code point)
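Slicing makes the problem more visible than counting does. Cutting a string by code units can split an emoji in half and leave a lone surrogate. Code points are still not the same as user-perceived characters, though. An emoji with a skin-tone modifier, such as 👋🏽, is two code points, so Array.from alone cannot keep such a pair together. Doing that would require grapheme segmentation, for example with Intl.Segmenter.

```typescript
// "👋" (U+1F44B) is stored as the surrogate pair \uD83D\uDC4B
const text = 'Hi 👋!'

// String.prototype.slice counts UTF-16 code units: cutting after 4 units
// keeps only the high surrogate of the emoji
const byCodeUnits = text.slice(0, 4)

// Array.from splits by code point, so the emoji stays whole
const byCodePoints = Array.from(text).slice(0, 4).join('')

console.log(byCodeUnits.length, byCodeUnits.charCodeAt(3).toString(16)) // 4 d83d
console.log(byCodePoints) // Hi 👋
```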
        

    API Endpoint Support

      Fragment rendering works in both desktop and web, which fetch the text differently:

      Desktop

        Direct grpcClient access

        Documents fetched directly, without an HTTP API request

        Immediate text resolution

      Web

        Server-side API: /hm/api/document-text

        Accepts blockRef in URL parameters

        Returns resolved text via JSON

    Component States

      Loading

        if (loading) {
          return (
            <div className="flex items-center justify-center p-2">
              <Spinner />
            </div>
          )
        }
        

      Error

        if (error) {
          return <ErrorBlock message={`Failed to load fragment: ${error}`} />
        }
        

      Success

        return (
          <Text className="whitespace-pre-wrap">
            {text}
          </Text>
        )
        

    Usage in Embeds

      When a user creates an embed with a text range:

        Editor: User selects text range in a block

        Link Creation: System creates link like hm://account/doc#blockId[10:50]

        Rendering:

          ContentEmbed detects the range

          FragmentText fetches and extracts text

          Displays resolved fragment
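There is a mismatch in the editor step. Browser selections report offsets in UTF-16 code units (DOM Range offsets inside text nodes work this way), but the link stores code point positions. The offsets therefore need converting before the link is built. The sketch below uses hypothetical names, toCodePointOffset and buildFragmentRef. It also assumes the selection offsets index the block's plain text.

```typescript
// Hypothetical helpers, not the editor's actual API.
// Assumes selection offsets are UTF-16 code unit offsets into the block's plain text.

// Number of code points that precede a UTF-16 offset
function toCodePointOffset(text: string, utf16Offset: number): number {
  return Array.from(text.slice(0, utf16Offset)).length
}

// Build a link such as hm://account/doc#blockId[10:50]
function buildFragmentRef(
  docUrl: string,
  blockId: string,
  blockText: string,
  selectionStart: number,
  selectionEnd: number,
): string {
  // Selections can be made backwards, so order the two ends first
  const start = toCodePointOffset(blockText, Math.min(selectionStart, selectionEnd))
  const end = toCodePointOffset(blockText, Math.max(selectionStart, selectionEnd))
  return `${docUrl}#${blockId}[${start}:${end}]`
}
```

Consider an offset that lands between the two halves of a surrogate pair. The lone high surrogate would be counted as a code point. In practice, real selections normally snap to whole characters.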

    Performance Considerations

      Caching

        Consider caching getDocumentText results

        Fragment extraction is fast (O(n) where n = text length)

        Network requests may be slow on web
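One way to implement the caching suggestion is a memoizing wrapper around the text fetcher. The DocId shape, the FetchBlockText signature, and cacheBlockText are hypothetical simplifications, not the real UnpackedHypermediaId or getDocumentText types. The cache key reuses the ID fields from the useEffect dependencies (uid, path, version) and adds blockRef.

```typescript
// Hypothetical minimal document ID; the real UnpackedHypermediaId has more fields
interface DocId {
  uid: string
  path: string[] | null
  version: string | null
}

// Hypothetical fetcher: resolves to the full, embed-resolved text of one block
type FetchBlockText = (id: DocId, blockRef: string) => Promise<string>

function cacheBlockText(fetchText: FetchBlockText): FetchBlockText {
  const cache = new Map<string, Promise<string>>()
  return (id, blockRef) => {
    // start/end are not in the key: every fragment of a block shares one request
    const key = JSON.stringify([id.uid, id.path ?? null, id.version ?? null, blockRef])
    let pending = cache.get(key)
    if (!pending) {
      pending = fetchText(id, blockRef).catch((err) => {
        // Forget failed requests so a later render can retry
        cache.delete(key)
        throw err
      })
      cache.set(key, pending)
    }
    return pending
  }
}
```

The range is left out of the key on purpose. FragmentText fetches the whole block (blockRange: null) and slices it locally, so a new range on the same block does not need a new request. The map in this sketch has no size limit, so a long-running app would need an eviction policy.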

      Optimization

        Only fetch when fragment changes

        useEffect dependencies include every ID component (uid, path, version) along with blockRef, start, and end

        A loading state (spinner) is shown while the text is fetched

      Dependencies

        useEffect(() => {
          // ...
        }, [
          getDocumentText,
          documentId.uid,
          documentId.path?.join('/'),
          documentId.version,
          blockRef,
          start,
          end
        ])
        

    Testing

      Test scenarios to verify:

        Basic fragment extraction: Simple text without embeds

        Single inline embed: Fragment includes embed name

        Multiple inline embeds: All embeds resolved correctly

        Unicode handling: Emojis and special characters

        Boundary cases: start=0, and end equal to the code point count (Array.from(text).length, not text.length)

        Error handling: Missing blocks, network errors

    Related Files

      frontend/packages/ui/src/document-content.tsx - FragmentText component

      frontend/packages/shared/src/document-to-text.ts - Text resolution

      frontend/packages/shared/src/document-content-types.ts - Type definitions

      frontend/apps/web/app/routes/hm.api.document-text.tsx - Web API