Strings and Characters Swift



Strings and Characters


A string is a series of characters, such as "hello, world" or "albatross". Swift strings are represented by the String type. The contents of a String can be accessed in various ways, including as a collection of Character values.

Swift’s String and Character types provide a fast, Unicode-compliant way to work with text in your code. The syntax for string creation and manipulation is lightweight and readable, with a string literal syntax that is similar to C. String concatenation is as simple as combining two strings with the + operator, and string mutability is managed by choosing between a constant or a variable, just like any other value in Swift. You can also use strings to insert constants, variables, literals, and expressions into longer strings, in a process known as string interpolation. This makes it easy to create custom string values for display, storage, and printing.

Despite this simplicity of syntax, Swift’s String type is a fast, modern string implementation. Every string is composed of encoding-independent Unicode characters, and provides support for accessing those characters in various Unicode representations.

NOTE

Swift’s String type is bridged with Foundation’s NSString class. Foundation also extends String to expose methods defined by NSString. This means that if you import Foundation, you can access those NSString methods on String without casting.

For more information about using String with Foundation and Cocoa, see Bridging Between String and NSString.
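As a minimal sketch of the bridging described above, the snippet below calls components(separatedBy:), a method that becomes available on String once Foundation is imported. The constant names are chosen only for illustration.

```swift
import Foundation

let sentence = "strings are bridged"

// components(separatedBy:) comes from Foundation; no cast to NSString is needed.
let words = sentence.components(separatedBy: " ")
print(words)
// Prints ["strings", "are", "bridged"]
```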

String Literals

You can include predefined String values within your code as string literals. A string literal is a sequence of characters surrounded by double quotation marks (").

Use a string literal as an initial value for a constant or variable:

let someString = "Some string literal value"
Note that Swift infers a type of String for the someString constant because it’s initialized with a string literal value.

Multiline String Literals

If you need a string that spans several lines, use a multiline string literal—a sequence of characters surrounded by three double quotation marks:

let quotation = """
The White Rabbit put on his spectacles.  "Where shall I begin,
please your Majesty?" he asked.

"Begin at the beginning," the King said gravely, "and go on
till you come to the end; then stop."
"""
A multiline string literal includes all of the lines between its opening and closing quotation marks. The string begins on the first line after the opening quotation marks (""") and ends on the line before the closing quotation marks, which means that neither of the strings below start or end with a line break:

let singleLineString = "These are the same."
let multilineString = """
These are the same.
"""
When your source code includes a line break inside of a multiline string literal, that line break also appears in the string’s value. If you want to use line breaks to make your source code easier to read, but you don’t want the line breaks to be part of the string’s value, write a backslash (\) at the end of those lines:

let softWrappedQuotation = """
The White Rabbit put on his spectacles.  "Where shall I begin, \
please your Majesty?" he asked.

"Begin at the beginning," the King said gravely, "and go on \
till you come to the end; then stop."
"""
To make a multiline string literal that begins or ends with a line feed, write a blank line as the first or last line. For example:

let lineBreaks = """

This string starts with a line break.
It also ends with a line break.

"""
A multiline string can be indented to match the surrounding code. The whitespace before the closing quotation marks (""") tells Swift what whitespace to ignore before all of the other lines. However, if you write whitespace at the beginning of a line in addition to what’s before the closing quotation marks, that whitespace is included.

[Figure: multilineStringWhitespace_2x.png, indentation in a multiline string literal]

In the example above, even though the entire multiline string literal is indented, the first and last lines in the string don’t begin with any whitespace. The middle line has more indentation than the closing quotation marks, so it starts with that extra four-space indentation.
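The figure is not reproduced here, so the following is only a sketch of code that matches the description, not necessarily the figure’s exact contents. The constant name linesWithIndentation and the wording of each line are assumptions chosen for illustration.

```swift
// The closing quotation marks are indented by four spaces,
// so four spaces are stripped from the start of every line.
let linesWithIndentation = """
    This line doesn't begin with whitespace.
        This line begins with four spaces.
    This line doesn't begin with whitespace.
    """

// Split on line feeds and bracket each line to make leading whitespace visible.
let lines = linesWithIndentation.split(separator: "\n")
for line in lines {
    print("[\(line)]")
}
// [This line doesn't begin with whitespace.]
// [    This line begins with four spaces.]
// [This line doesn't begin with whitespace.]
```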

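Special Characters in String Literals

String literals can include the following special characters:

- The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark), and \' (single quotation mark)
- An arbitrary Unicode scalar value, written as \u{n}, where n is a 1-8 digit hexadecimal number (Unicode is discussed in Unicode below)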

The code below shows four examples of these special characters. The wiseWords constant contains two escaped double quotation marks. The dollarSign, blackHeart, and sparklingHeart constants demonstrate the Unicode scalar format:

let wiseWords = "\"Imagination is more important than knowledge\" - Einstein"
// "Imagination is more important than knowledge" - Einstein
let dollarSign = "\u{24}"        // $,  Unicode scalar U+0024
let blackHeart = "\u{2665}"      // ♥,  Unicode scalar U+2665
let sparklingHeart = "\u{1F496}" // 💖, Unicode scalar U+1F496
Because multiline string literals use three double quotation marks instead of just one, you can include a double quotation mark (") inside of a multiline string literal without escaping it. To include the text """ in a multiline string, escape at least one of the quotation marks. For example:

let threeDoubleQuotationMarks = """
Escaping the first quotation mark \"""
Escaping all three quotation marks \"\"\"
"""

Initializing an Empty String

To create an empty String value as the starting point for building a longer string, either assign an empty string literal to a variable, or initialize a new String instance with initializer syntax:

var emptyString = ""               // empty string literal
var anotherEmptyString = String()  // initializer syntax
// these two strings are both empty, and are equivalent to each other
Find out whether a String value is empty by checking its Boolean isEmpty property:

if emptyString.isEmpty {
    print("Nothing to see here")
}
// Prints "Nothing to see here"

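String Mutability

You indicate whether a particular String can be modified (or mutated) by assigning it to a variable (in which case it can be modified), or to a constant (in which case it can’t be modified):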

var variableString = "Horse"
variableString += " and carriage"
// variableString is now "Horse and carriage"

let constantString = "Highlander"
constantString += " and another Highlander"
// this reports a compile-time error - a constant string cannot be modified

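NOTE

This approach is different from string mutation in Objective-C and Cocoa, where you choose between two classes (NSString and NSMutableString) to indicate whether a string can be mutated.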

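Strings Are Value Types

Swift’s String type is a value type. If you create a new String value, that String value is copied when it’s passed to a function or method, or when it’s assigned to a constant or variable. In each case, a new copy of the existing String value is created, and the new copy is passed or assigned, not the original. Value types are described in Structures and Enumerations Are Value Types.

Because strings are copied by default, it’s clear that you own any String value you receive, regardless of where it came from. The string you are passed won’t change unless you modify it yourself.

Behind the scenes, Swift’s compiler optimizes string usage so that actual copying takes place only when it’s really necessary. This means you get good performance even though strings behave as value types.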
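The copy behavior can be sketched as follows. The constant and variable names are illustrative; the point is only that mutating the copy leaves the original value untouched.

```swift
let original = "Horse"
var copy = original          // copy is an independent String value
copy += " and carriage"      // mutating copy does not affect original

print(original)
// Prints "Horse"
print(copy)
// Prints "Horse and carriage"
```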

Working with Characters

You can access the individual Character values for a String by iterating over the string with a for-in loop:

for character in "Dog!🐶" {
    print(character)
}
// D
// o
// g
// !
// 🐶
The for-in loop is described in For-In Loops.

Alternatively, you can create a stand-alone Character constant or variable from a single-character string literal by providing a Character type annotation:

let exclamationMark: Character = "!"
String values can be constructed by passing an array of Character values as an argument to its initializer:

let catCharacters: [Character] = ["C", "a", "t", "!", "🐱"]
let catString = String(catCharacters)
print(catString)
// Prints "Cat!🐱"


Concatenating Strings and Characters

String values can be added together (or concatenated) with the addition operator (+) to create a new String value:

let string1 = "hello"
let string2 = " there"
var welcome = string1 + string2
// welcome now equals "hello there"
You can also append a String value to an existing String variable with the addition assignment operator (+=):

var instruction = "look over"
instruction += string2
// instruction now equals "look over there"
You can append a Character value to a String variable with the String type’s append() method:

let exclamationMark: Character = "!"
welcome.append(exclamationMark)
// welcome now equals "hello there!"
NOTE

You can’t append a String or Character to an existing Character variable, because a Character value must contain a single character only.

If you’re using multiline string literals to build up the lines of a longer string, you want every line in the string to end with a line break, including the last line. For example:

let badStart = """
one
two
"""
let end = """
three
"""
print(badStart + end)
// Prints two lines:
// one
// twothree

let goodStart = """
one
two

"""
print(goodStart + end)
// Prints three lines:
// one
// two
// three
In the code above, concatenating badStart with end produces a two-line string, which isn’t the desired result. Because the last line of badStart doesn’t end with a line break, that line gets combined with the first line of end. In contrast, both lines of goodStart end with a line break, so when it’s combined with end the result has three lines, as expected.

String Interpolation

String interpolation is a way to construct a new String value from a mix of constants, variables, literals, and expressions by including their values inside a string literal. You can use string interpolation in both single-line and multiline string literals. Each item that you insert into the string literal is wrapped in a pair of parentheses, prefixed by a backslash (\):

let multiplier = 3
let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
// message is "3 times 2.5 is 7.5"
In the example above, the value of multiplier is inserted into a string literal as \(multiplier). This placeholder is replaced with the actual value of multiplier when the string interpolation is evaluated to create an actual string.

The value of multiplier is also part of a larger expression later in the string. This expression calculates the value of Double(multiplier) * 2.5 and inserts the result (7.5) into the string. In this case, the expression is written as \(Double(multiplier) * 2.5) when it’s included inside the string literal.
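Interpolation works the same way inside a multiline string literal. The sketch below reuses the multiplier idea; the constant name table and the layout of the lines are assumptions made for illustration.

```swift
let multiplier = 3

// Each \( ... ) placeholder is evaluated and replaced when the string is created.
let table = """
    \(multiplier) x 1 = \(multiplier * 1)
    \(multiplier) x 2 = \(multiplier * 2)
    """
print(table)
// 3 x 1 = 3
// 3 x 2 = 6
```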

NOTE
The expressions you write inside parentheses within an interpolated string can’t contain an unescaped backslash (\), a carriage return, or a line feed. However, they can contain other string literals.

Unicode

Unicode is an international standard for encoding, representing, and processing text in different writing systems. It enables you to represent almost any character from any language in a standardized form, and to read and write those characters to and from an external source such as a text file or web page. Swift’s String and Character types are fully Unicode-compliant, as described in this section.

Unicode Scalar Values
Behind the scenes, Swift’s native String type is built from Unicode scalar values. A Unicode scalar value is a unique 21-bit number for a character or modifier, such as U+0061 for LATIN SMALL LETTER A ("a"), or U+1F425 for FRONT-FACING BABY CHICK ("🐥").

Note that not all 21-bit Unicode scalar values are assigned to a character—some scalars are reserved for future assignment or for use in UTF-16 encoding. Scalar values that have been assigned to a character typically also have a name, such as LATIN SMALL LETTER A and FRONT-FACING BABY CHICK in the examples above.

Extended Grapheme Clusters
Every instance of Swift’s Character type represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character.

Here’s an example. The letter é can be represented as the single Unicode scalar é (LATIN SMALL LETTER E WITH ACUTE, or U+00E9). However, the same letter can also be represented as a pair of scalars—a standard letter e (LATIN SMALL LETTER E, or U+0065), followed by the COMBINING ACUTE ACCENT scalar (U+0301). The COMBINING ACUTE ACCENT scalar is graphically applied to the scalar that precedes it, turning an e into an é when it’s rendered by a Unicode-aware text-rendering system.

In both cases, the letter é is represented as a single Swift Character value that represents an extended grapheme cluster. In the first case, the cluster contains a single scalar; in the second case, it’s a cluster of two scalars:

let eAcute: Character = "\u{E9}"                         // é
let combinedEAcute: Character = "\u{65}\u{301}"          // e followed by ́
// eAcute is é, combinedEAcute is é
Extended grapheme clusters are a flexible way to represent many complex script characters as a single Character value. For example, Hangul syllables from the Korean alphabet can be represented as either a precomposed or decomposed sequence. Both of these representations qualify as a single Character value in Swift:

let precomposed: Character = "\u{D55C}"                  // 한
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}"   // ᄒ, ᅡ, ᆫ
// precomposed is 한, decomposed is 한
Extended grapheme clusters enable scalars for enclosing marks (such as COMBINING ENCLOSING CIRCLE, or U+20DD) to enclose other Unicode scalars as part of a single Character value:

let enclosedEAcute: Character = "\u{E9}\u{20DD}"
// enclosedEAcute is é⃝
Unicode scalars for regional indicator symbols can be combined in pairs to make a single Character value, such as this combination of REGIONAL INDICATOR SYMBOL LETTER U (U+1F1FA) and REGIONAL INDICATOR SYMBOL LETTER S (U+1F1F8):

let regionalIndicatorForUS: Character = "\u{1F1FA}\u{1F1F8}"
// regionalIndicatorForUS is 🇺🇸

Counting Characters

To retrieve a count of the Character values in a string, use the count property of the string:

let unusualMenagerie = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
print("unusualMenagerie has \(unusualMenagerie.count) characters")
// Prints "unusualMenagerie has 40 characters"

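Note that Swift’s use of extended grapheme clusters for Character values means that string concatenation and modification may not always affect a string’s character count.

For example, if you initialize a new string with the four-character word cafe, and then append a COMBINING ACUTE ACCENT (U+0301) to the end of the string, the resulting string still has a character count of 4, with a fourth character of é, not e: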

var word = "cafe"
print("the number of characters in \(word) is \(word.count)")
// Prints "the number of characters in cafe is 4"

word += "\u{301}"    // COMBINING ACUTE ACCENT, U+0301

print("the number of characters in \(word) is \(word.count)")
// Prints "the number of characters in café is 4"

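NOTE

Extended grapheme clusters can be composed of multiple Unicode scalars. This means that different characters, and different representations of the same character, can require different amounts of memory to store. Because of this, characters in Swift don’t each take up the same amount of memory within a string’s representation. As a result, the number of characters in a string can’t be calculated without iterating through the string to determine its extended grapheme cluster boundaries. If you are working with particularly long string values, be aware that the count property must iterate over the entire string.

The value returned by the count property isn’t always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units in the string’s UTF-16 representation, not the number of Unicode extended grapheme clusters in the string.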
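The difference between the two measures can be sketched with a flag emoji, which is one Character built from two scalars. The constant names are illustrative, and NSString(string:) is used here only to obtain an NSString to compare against.

```swift
import Foundation

// REGIONAL INDICATOR SYMBOL LETTER U followed by LETTER S.
let flag = "\u{1F1FA}\u{1F1F8}"

let characterCount = flag.count                 // extended grapheme clusters
let scalarCount = flag.unicodeScalars.count     // Unicode scalars
let utf16Count = flag.utf16.count               // 16-bit code units
let nsLength = NSString(string: flag).length    // NSString length

print(characterCount, scalarCount, utf16Count, nsLength)
// Prints "1 2 4 4"
```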

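Accessing and Modifying a String

You access and modify a string through its methods and properties, or by using subscript syntax.

String Indices

Each String value has an associated index type, String.Index, which corresponds to the position of each Character in the string.

As mentioned above, different characters can require different amounts of memory to store, so to determine which Character is at a particular position, you must iterate from the start or end of the string. For this reason, Swift strings can’t be indexed by integer values.

Use the startIndex property to access the position of the first Character of a String. The endIndex property is the position after the last character in a String. As a result, the endIndex property isn’t a valid argument to a string’s subscript. If a String is empty, startIndex and endIndex are equal.

You access the indices before and after a given index using the index(before:) and index(after:) methods of String. To access an index farther away from the given index, use the index(_:offsetBy:) method instead of calling one of these methods multiple times.

You can use subscript syntax to access the Character at a particular String index.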

let greeting = "Guten Tag!"
greeting[greeting.startIndex]
// G
greeting[greeting.index(before: greeting.endIndex)]
// !
greeting[greeting.index(after: greeting.startIndex)]
// u
let index = greeting.index(greeting.startIndex, offsetBy: 7)
greeting[index]
// a

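Attempting to access an index outside of a string’s range, or a Character at an index outside of a string’s range, triggers a runtime error.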

greeting[greeting.endIndex] // Error
greeting.index(after:greeting.endIndex) // Error
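One way to avoid these runtime errors is the index(_:offsetBy:limitedBy:) method, which returns nil instead of trapping when the offset would pass the limit. The helper below is a hypothetical sketch, not part of the standard library; its name and signature are assumptions.

```swift
let greeting = "Guten Tag!"

// Hypothetical helper: returns the Character at an integer offset from the
// start of the string, or nil when the offset is out of range.
func character(in text: String, atOffset offset: Int) -> Character? {
    guard offset >= 0,
          let index = text.index(text.startIndex, offsetBy: offset, limitedBy: text.endIndex),
          index < text.endIndex   // endIndex itself is not a valid subscript argument
    else {
        return nil
    }
    return text[index]
}

let seventh = character(in: greeting, atOffset: 7)     // "a"
let pastEnd = character(in: greeting, atOffset: 10)    // nil (offset 10 is endIndex)
let farAway = character(in: greeting, atOffset: 50)    // nil
```

Each call still walks the string from its start, so this is a convenience for occasional lookups rather than a replacement for working with String.Index values directly.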

Use the indices property to access all of the indices of individual characters in a string.

for index in greeting.indices {
  print("\(greeting[index]) ", terminator: "")
}
// Prints "G u t e n   T a g ! "

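NOTE

You can use the startIndex and endIndex properties and the index(after:) and index(_:offsetBy:) methods on any type that conforms to the Collection protocol. This includes String, as shown here, as well as collection types such as Array, Dictionary, and Set. The index(before:) method is available on collections that also conform to BidirectionalCollection, such as String and Array.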

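Inserting and Removing

To insert a single character into a string at a specified index, use the insert(_:at:) method, and to insert the contents of another string at a specified index, use the insert(contentsOf:at:) method.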

var welcome = "hello"
welcome.insert("!", at:welcome.endIndex)
// welcome now equals "hello!"

welcome.insert(contentsOf:" there", at:welcome.index(before:welcome.endIndex))
// welcome now equals "hello there!"

To remove a single character from a string at a specified index, use the remove(at:) method, and to remove a substring at a specified range, use the removeSubrange(_:) method.

welcome.remove(at:welcome.index(before:welcome.endIndex))
// welcome now equals "hello there"

let range = welcome.index(welcome.endIndex, offsetBy:-6)..<welcome.endIndex
welcome.removeSubrange(range)
// welcome now equals "hello"

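NOTE

You can use the insert(_:at:), insert(contentsOf:at:), remove(at:), and removeSubrange(_:) methods on any type that conforms to the RangeReplaceableCollection protocol. This includes String, as shown here, as well as Array. Dictionary and Set don’t conform to RangeReplaceableCollection, so these methods aren’t available on them.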
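As a brief sketch, the same four methods applied to an Array, another RangeReplaceableCollection type, look like this. The variable name and values are illustrative.

```swift
var numbers = [1, 2, 5]

numbers.insert(9, at: numbers.endIndex)
// numbers is now [1, 2, 5, 9]

numbers.insert(contentsOf: [3, 4], at: 2)
// numbers is now [1, 2, 3, 4, 5, 9]

numbers.remove(at: numbers.index(before: numbers.endIndex))
// numbers is now [1, 2, 3, 4, 5]

numbers.removeSubrange(0..<2)
// numbers is now [3, 4, 5]
```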


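Substrings

When you get a substring from a string (for example, using a subscript or a method like prefix(_:)) the result is an instance of Substring, not another string. Substrings in Swift have most of the same methods as strings, which means you can work with substrings the same way you work with strings. However, unlike strings, you use substrings for only a short amount of time while performing actions on a string. When you’re ready to store the result for a longer time, you convert the substring to an instance of String.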

let greeting = "Hello, world!"
let index = greeting.firstIndex(of:",") ?? greeting.endIndex
let beginning = greeting[..<index]
// beginning is "Hello"

// Convert the result to a String for long-term storage.
let newString = String(beginning)

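Like strings, each substring has a region of memory where the characters that make up the substring are stored. The difference between strings and substrings is that, as a performance optimization, a substring can reuse part of the memory that’s used to store the original string, or part of the memory that’s used to store another substring. (Strings have a similar optimization, but if two strings share memory, they are equal.) This optimization means you don’t pay the cost of copying memory until you modify either the string or the substring. As mentioned above, substrings aren’t suitable for long-term storage: because they reuse the storage of the original string, the entire original string must be kept in memory for as long as any of its substrings are in use.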

In the example above, greeting is a string, which means it has a region of memory where the characters that make up the string are stored. Because beginning is a substring of greeting, it reuses the memory that greeting uses. In contrast, newString is a string—when it’s created from the substring, it has its own storage. The figure below shows these relationships:

[Figure: stringSubstring_2x.png, storage shared between greeting and beginning, and separate storage for newString]

NOTE

Both String and Substring conform to the StringProtocol protocol, which means it’s often convenient for string-manipulation functions to accept a StringProtocol value. You can call such functions with either a String or Substring value.
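A function that accepts any StringProtocol value might look like the sketch below. The function vowelCount(in:) is hypothetical and exists only to show one generic signature working for both String and Substring arguments.

```swift
// Hypothetical helper: counts lowercase ASCII vowels in any StringProtocol value.
func vowelCount<S: StringProtocol>(in text: S) -> Int {
    let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
    var count = 0
    for character in text where vowels.contains(character) {
        count += 1
    }
    return count
}

let greeting = "Hello, world!"
let beginning = greeting.prefix(5)          // a Substring, "Hello"

let inString = vowelCount(in: greeting)     // 3
let inSubstring = vowelCount(in: beginning) // 2
```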

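Comparing Strings

Swift provides three ways to compare textual values: string and character equality, prefix equality, and suffix equality.

String and Character Equality

String and character equality is checked with the “equal to” operator (==) and the “not equal to” operator (!=):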

let quotation = "We're a lot alike, you and I."
let sameQuotation = "We're a lot alike, you and I."
if quotation == sameQuotation {
    print("These two strings are considered equal")
}
// Prints "These two strings are considered equal"

Two String values (or two Character values) are considered equal if their extended grapheme clusters are canonically equivalent. Extended grapheme clusters are canonically equivalent if they have the same linguistic meaning and appearance, even if they’re composed from different Unicode scalars behind the scenes.

For example, LATIN SMALL LETTER E WITH ACUTE (U+00E9) is canonically equivalent to LATIN SMALL LETTER E (U+0065) followed by COMBINING ACUTE ACCENT (U+0301). Both of these extended grapheme clusters are valid ways to represent the character é, and so they’re considered to be canonically equivalent:

// "Voulez-vous un café?" using LATIN SMALL LETTER E WITH ACUTE
let eAcuteQuestion = "Voulez-vous un caf\u{E9}?"

// "Voulez-vous un café?" using LATIN SMALL LETTER E and COMBINING ACUTE ACCENT
let combinedEAcuteQuestion = "Voulez-vous un caf\u{65}\u{301}?"

if eAcuteQuestion == combinedEAcuteQuestion {
    print("These two strings are considered equal")
}
// Prints "These two strings are considered equal"

Conversely, LATIN CAPITAL LETTER A (U+0041, or "A"), as used in English, is not equivalent to CYRILLIC CAPITAL LETTER A (U+0410, or "А"), as used in Russian. The characters are visually similar, but don’t have the same linguistic meaning:

let latinCapitalLetterA: Character = "\u{41}"

let cyrillicCapitalLetterA: Character = "\u{0410}"

if latinCapitalLetterA != cyrillicCapitalLetterA {
    print("These two characters are not equivalent.")
}
// Prints "These two characters are not equivalent."
NOTE

String and character comparisons in Swift are not locale-sensitive.

Prefix and Suffix Equality

To check whether a string has a particular string prefix or suffix, call the string’s hasPrefix(_:) and hasSuffix(_:) methods, both of which take a single argument of type String and return a Boolean value.

The examples below consider an array of strings representing the scene locations from the first two acts of Shakespeare’s Romeo and Juliet:

let romeoAndJuliet = [
    "Act 1 Scene 1: Verona, A public place",
    "Act 1 Scene 2: Capulet's mansion",
    "Act 1 Scene 3: A room in Capulet's mansion",
    "Act 1 Scene 4: A street outside Capulet's mansion",
    "Act 1 Scene 5: The Great Hall in Capulet's mansion",
    "Act 2 Scene 1: Outside Capulet's mansion",
    "Act 2 Scene 2: Capulet's orchard",
    "Act 2 Scene 3: Outside Friar Lawrence's cell",
    "Act 2 Scene 4: A street in Verona",
    "Act 2 Scene 5: Capulet's mansion",
    "Act 2 Scene 6: Friar Lawrence's cell"
]

You can use the hasPrefix(_:) method with the romeoAndJuliet array to count the number of scenes in Act 1 of the play:

var act1SceneCount = 0
for scene in romeoAndJuliet {
    if scene.hasPrefix("Act 1 ") {
        act1SceneCount += 1
    }
}
print("There are \(act1SceneCount) scenes in Act 1")
// Prints "There are 5 scenes in Act 1"

Similarly, use the hasSuffix(_:) method to count the number of scenes that take place in or around Capulet’s mansion and Friar Lawrence’s cell:

var mansionCount = 0
var cellCount = 0
for scene in romeoAndJuliet {
    if scene.hasSuffix("Capulet's mansion") {
        mansionCount += 1
    } else if scene.hasSuffix("Friar Lawrence's cell") {
        cellCount += 1
    }
}
print("\(mansionCount) mansion scenes; \(cellCount) cell scenes")
// Prints "6 mansion scenes; 2 cell scenes"

NOTE

The hasPrefix(_:) and hasSuffix(_:) methods perform a character-by-character canonical equivalence comparison between the extended grapheme clusters in each string, as described in String and Character Equality.
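The effect of canonical equivalence on prefix checks can be sketched by comparing a precomposed é against a decomposed one. The constant names are illustrative; the byte-level comparison is included only as a contrast.

```swift
let precomposed = "caf\u{E9} au lait"     // é as the single scalar U+00E9
let decomposedPrefix = "cafe\u{301}"      // e followed by COMBINING ACUTE ACCENT

// Character-by-character comparison treats the two forms of é as equal.
let matchesByCharacter = precomposed.hasPrefix(decomposedPrefix)
// matchesByCharacter is true

// Comparing raw UTF-8 code units does not, because the bytes differ.
let matchesByBytes = precomposed.utf8.starts(with: decomposedPrefix.utf8)
// matchesByBytes is false
```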

Unicode Representations of Strings

When a Unicode string is written to a text file or some other storage, the Unicode scalars in that string are encoded in one of several Unicode-defined encoding forms. Each form encodes the string in small chunks known as code units. These include the UTF-8 encoding form (which encodes a string as 8-bit code units), the UTF-16 encoding form (which encodes a string as 16-bit code units), and the UTF-32 encoding form (which encodes a string as 32-bit code units).

Swift provides several different ways to access Unicode representations of strings. You can iterate over the string with a for-in statement, to access its individual Character values as Unicode extended grapheme clusters. This process is described in Working with Characters.

Alternatively, access a String value in one of three other Unicode-compliant representations:

- A collection of UTF-8 code units (accessed with the string’s utf8 property)
- A collection of UTF-16 code units (accessed with the string’s utf16 property)
- A collection of 21-bit Unicode scalar values, equivalent to the string’s UTF-32 encoding form (accessed with the string’s unicodeScalars property)

Each example below shows a different representation of the following string, which is made up of the characters D, o, g, ‼ (DOUBLE EXCLAMATION MARK, or Unicode scalar U+203C), and the 🐶 character (DOG FACE, or Unicode scalar U+1F436):

let dogString = "Dog‼🐶"

UTF-8 Representation

You can access a UTF-8 representation of a String by iterating over its utf8 property. This property is of type String.UTF8View, which is a collection of unsigned 8-bit (UInt8) values, one for each byte in the string’s UTF-8 representation:

[Figure: UTF8_2x.png, UTF-8 code units of dogString]

for codeUnit in dogString.utf8 {
    print("\(codeUnit) ", terminator: "")
}
print("")
// Prints "68 111 103 226 128 188 240 159 144 182 "

In the example above, the first three decimal codeUnit values (68, 111, 103) represent the characters D, o, and g, whose UTF-8 representation is the same as their ASCII representation. The next three decimal codeUnit values (226, 128, 188) are a three-byte UTF-8 representation of the DOUBLE EXCLAMATION MARK character. The last four codeUnit values (240, 159, 144, 182) are a four-byte UTF-8 representation of the DOG FACE character.
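Going in the other direction, the same ten bytes can be decoded back into a string with String(decoding:as:). This is a small sketch; the constant names are assumptions.

```swift
// The UTF-8 code units printed above, as an array of bytes.
let utf8Bytes: [UInt8] = [68, 111, 103, 226, 128, 188, 240, 159, 144, 182]

// Decode the bytes as UTF-8 to rebuild the original string.
let decoded = String(decoding: utf8Bytes, as: UTF8.self)

print(decoded == "Dog\u{203C}\u{1F436}")
// Prints "true"
```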

UTF-16 Representation

You can access a UTF-16 representation of a String by iterating over its utf16 property. This property is of type String.UTF16View, which is a collection of unsigned 16-bit (UInt16) values, one for each 16-bit code unit in the string’s UTF-16 representation:

[Figure: UTF16_2x.png, UTF-16 code units of dogString]

for codeUnit in dogString.utf16 {
    print("\(codeUnit) ", terminator: "")
}
print("")
// Prints "68 111 103 8252 55357 56374 "

Again, the first three codeUnit values (68, 111, 103) represent the characters D, o, and g, whose UTF-16 code units have the same values as in the string’s UTF-8 representation (because these Unicode scalars represent ASCII characters).

The fourth codeUnit value (8252) is a decimal equivalent of the hexadecimal value 203C, which represents the Unicode scalar U+203C for the DOUBLE EXCLAMATION MARK character. This character can be represented as a single code unit in UTF-16.

The fifth and sixth codeUnit values (55357 and 56374) are a UTF-16 surrogate pair representation of the DOG FACE character. These values are a high-surrogate value of U+D83D (decimal value 55357) and a low-surrogate value of U+DC36 (decimal value 56374).
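The surrogate pair values can be checked by hand using the standard UTF-16 rule for scalars above U+FFFF: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The sketch below works through that arithmetic for DOG FACE; the constant names are illustrative.

```swift
let dogFace: UInt32 = 0x1F436

// Subtract 0x10000 to get a 20-bit offset.
let offset = dogFace - 0x10000                         // 0xF436

// The top 10 bits are added to 0xD800, the bottom 10 bits to 0xDC00.
let highSurrogate = UInt16(0xD800 + (offset >> 10))    // 0xD83D, decimal 55357
let lowSurrogate = UInt16(0xDC00 + (offset & 0x3FF))   // 0xDC36, decimal 56374

// Compare with the code units Swift reports for the same character.
let codeUnits = Array("\u{1F436}".utf16)
print(codeUnits == [highSurrogate, lowSurrogate])
// Prints "true"
```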

Unicode Scalar Representation

You can access a Unicode scalar representation of a String value by iterating over its unicodeScalars property. This property is of type UnicodeScalarView, which is a collection of values of type UnicodeScalar.

Each UnicodeScalar has a value property that returns the scalar’s 21-bit value, represented within a UInt32 value:

[Figure: UnicodeScalar_2x.png, Unicode scalar values of dogString]

for scalar in dogString.unicodeScalars {
    print("\(scalar.value) ", terminator: "")
}
print("")
// Prints "68 111 103 8252 128054 "

The value properties for the first three UnicodeScalar values (68, 111, 103) once again represent the characters D, o, and g.

The value property of the fourth UnicodeScalar (8252) is again a decimal equivalent of the hexadecimal value 203C, which represents the Unicode scalar U+203C for the DOUBLE EXCLAMATION MARK character.

The value property of the fifth and final UnicodeScalar, 128054, is a decimal equivalent of the hexadecimal value 1F436, which represents the Unicode scalar U+1F436 for the DOG FACE character.

As an alternative to querying their value properties, each UnicodeScalar value can also be used to construct a new String value, such as with string interpolation:

for scalar in dogString.unicodeScalars {
    print("\(scalar) ")
}
// D
// o
// g
// ‼
// 🐶


