Strings
Base.length
Method
length(s::AbstractString)
The number of characters in string s.
julia> length("jμΛIα")
5
Base.sizeof
Method
sizeof(s::AbstractString)
The number of bytes in string s.
julia> sizeof("❤")
3
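Here is a small sketch that makes the difference between length and sizeof concrete. It uses a string that mixes two-byte Greek letters with ASCII, and the variable names are only illustrative.

```julia
# "αβγdef" has 6 characters, but α, β and γ each take two bytes in UTF-8.
s = "αβγdef"
nchars = length(s)   # number of characters
nbytes = sizeof(s)   # number of bytes in the UTF-8 encoding
println(nchars, " characters, ", nbytes, " bytes")
```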
Base.:*
Method
*(x, y...)
Multiplication operator. x*y*z*... calls this function with all arguments, i.e. *(x, y, z, ...). For strings, * performs concatenation: "Hello, " * "world" returns "Hello, world".
Base.:^
Method
^(s::AbstractString, n::Integer)
Repeat the string s n times. The repeat function is an alias for this operator.
julia> "Test "^3
"Test Test Test "
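The following brief sketch shows the two string operators side by side: * concatenates, ^ repeats, and repeat is the function form of ^. The variable names are illustrative.

```julia
greeting = "Hello, " * "world"   # * concatenates strings
rule = "-"^10                    # ^ repeats a string n times
echoed = repeat("ab", 3)         # function form, same as "ab"^3
```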
Base.string
Function
string(xs...)
Create a string from any values using the print function.
julia> string("a", 1, true)
"a1true"
Base.repr
Function
repr(x)
Create a string from any value using the showall function.
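This small comparison of string and repr assumes a plain String argument. string uses the print form, while repr uses the show-style form, which for strings includes the surrounding quotes.

```julia
s = "abc"
printed = string(s)   # print form: the characters only
shown = repr(s)       # show-style form: quoted, as it would be typed
```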
Core.String
Method
String(s::AbstractString)
Convert a string to a contiguous byte array representation encoded as UTF-8 bytes. This representation is often appropriate for passing strings to C.
Base.transcode
Function
transcode(T, src)
Convert string data between Unicode encodings. src is either a String or a Vector{UIntXX} of UTF-XX code units, where XX is 8, 16, or 32. T indicates the encoding of the return value: String to return a (UTF-8 encoded) String, or UIntXX to return a Vector{UIntXX} of UTF-XX data. (The alias Cwchar_t can also be used as the integer type, for converting wchar_t* strings used by external C libraries.)
The transcode function succeeds as long as the input data can be reasonably represented in the target encoding; it always succeeds for conversions between UTF-XX encodings, even for invalid Unicode data.
Only conversion to/from UTF-8 is currently supported.
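Here is a minimal round trip through UTF-16, assuming the input is valid Unicode. transcode(UInt16, s) produces the UTF-16 code units, and transcode(String, ...) restores the UTF-8 string.

```julia
s = "αβ"
u16 = transcode(UInt16, s)      # Vector{UInt16} of UTF-16 code units
back = transcode(String, u16)   # back to a UTF-8 encoded String
```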
Base.unsafe_string
Function
unsafe_string(p::Ptr{UInt8}, [length::Integer])
Copy a string from the address of a C-style (NUL-terminated) string encoded as UTF-8. (The pointer can be safely freed afterwards.) If length is specified (the length of the data in bytes), the string does not have to be NUL-terminated.
This function is labelled "unsafe" because it will crash if p is not a valid memory address to data of the requested length.
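The sketch below exercises both forms of unsafe_string. It reads from a Vector{UInt8} buffer with a NUL byte in the middle. Using a buffer is an assumption made for the example: the pointer stays valid only while the buffer variable is alive.

```julia
# Bytes of "hello", a NUL byte, then "world".
buf = Vector{UInt8}("hello\0world")
# Without a length, copying stops at the first NUL byte.
upto_nul = unsafe_string(pointer(buf))
# With an explicit byte length, no NUL terminator is needed.
first3 = unsafe_string(pointer(buf), 3)
```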
Base.codeunit
Method
codeunit(s::AbstractString, i::Integer)
Get the ith code unit of an encoded string. For example, returns the ith byte of the representation of a UTF-8 string.
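This sketch illustrates codeunit on a UTF-8 string by reading the individual bytes of a string whose first character needs two bytes.

```julia
s = "αb"
# 'α' (U+03B1) is encoded in UTF-8 as the two bytes 0xce 0xb1.
b1 = codeunit(s, 1)
b2 = codeunit(s, 2)
b3 = codeunit(s, 3)   # 'b' is the single byte 0x62
```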
Base.ascii
Function
ascii(s::AbstractString)
Convert a string to String type and check that it contains only ASCII data, otherwise throwing an ArgumentError indicating the position of the first non-ASCII byte.
julia> ascii("abcdeγfgh")
ERROR: ArgumentError: invalid ASCII at index 6 in "abcdeγfgh"
Stacktrace:
 [1] ascii(::String) at ./strings/util.jl:479

julia> ascii("abcdefgh")
"abcdefgh"
Base.@r_str
Macro
@r_str -> Regex
Construct a regex, such as r"^[a-z]*$". The regex also accepts one or more flags, listed after the ending quote, to change its behaviour:
i: enables case-insensitive matching
m: treats the ^ and $ tokens as matching the start and end of individual lines, as opposed to the whole string
s: allows the . metacharacter to match newlines
x: enables "comment mode": whitespace is ignored except when escaped with \, and # is treated as starting a comment
For example, this regex has the i, s, and m flags enabled:
julia> match(r"a+.*b+.*?d$"ism, "Goodbye,\nOh, angry,\nBad world\n")
RegexMatch("angry,\nBad world")
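The i and x flags can be sketched the same way. With x, the spaces and the trailing comment in the second pattern are ignored, so it behaves like r"\d+-\d+". The sample strings are made up for illustration.

```julia
# i: case-insensitive matching
m1 = match(r"julia"i, "I like JULIA")
# x: pattern whitespace is ignored and # starts a comment
m2 = match(r"\d+ - \d+   # a numeric range"x, "pages 12-34")
```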
Base.Docs.@html_str
Macro
@html_str -> Docs.HTML
Create an HTML object from a literal string.
Base.Docs.@text_str
Macro
@text_str -> Docs.Text
Create a Text object from a literal string.
Base.UTF8proc.normalize_string
Function
normalize_string(s::AbstractString, normalform::Symbol)
Normalize the string s according to one of the four "normal forms" of the Unicode standard: normalform can be :NFC, :NFD, :NFKC, or :NFKD. Normal forms C (canonical composition) and D (canonical decomposition) convert different visually identical representations of the same abstract string into a single canonical form, with form C being more compact. Normal forms KC and KD additionally canonicalize "compatibility equivalents": they convert characters that are abstractly similar but visually distinct into a single canonical choice (e.g. they expand ligatures into the individual characters), with form KC being more compact.
Alternatively, finer control and additional transformations may be obtained by calling normalize_string(s; keywords...), where any number of the following boolean keyword options (which all default to false except for compose) are specified:
compose=false: do not perform canonical composition
decompose=true: do canonical decomposition instead of canonical composition (compose=true is ignored if present)
compat=true: compatibility equivalents are canonicalized
casefold=true: perform Unicode case folding, e.g. for case-insensitive string comparison
newline2lf=true, newline2ls=true, or newline2ps=true: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectively
stripmark=true: strip diacritical marks (e.g. accents)
stripignore=true: strip Unicode's "default ignorable" characters (e.g. the soft hyphen or the left-to-right marker)
stripcc=true: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specified
rejectna=true: throw an error if unassigned code points are found
stable=true: enforce Unicode Versioning Stability
For example, NFKC corresponds to the options compose=true, compat=true, stable=true.
Base.UTF8proc.graphemes
Function
graphemes(s::AbstractString) -> GraphemeIterator
Returns an iterator over substrings of s that correspond to the extended graphemes in the string, as defined by Unicode UAX #29. (Roughly, these are what users would perceive as single characters, even though they may contain more than one codepoint; for example a letter combined with an accent mark is a single grapheme.)
Base.isvalid
Method
isvalid(value) -> Bool
Returns true if the given value is valid for its type, which currently can be either Char or String.
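Here is a short sketch of the single-argument isvalid on strings. The invalid string is built directly from raw bytes, which works because the String constructor does not validate its input; the byte 0xff never occurs in well-formed UTF-8.

```julia
good = "αβγ"
bad = String(UInt8[0x61, 0xff])   # 'a' followed by an impossible UTF-8 byte
ok_good = isvalid(good)
ok_bad = isvalid(bad)
```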
Base.isvalid
Method
isvalid(T, value) -> Bool
Returns true if the given value is valid for that type. Types currently can be either Char or String. Values for Char can be of type Char or UInt32. Values for String can be of that type, or Vector{UInt8}.
Base.isvalid
Method
isvalid(str::AbstractString, i::Integer)
Tells whether index i is valid for the given string.
julia> str = "αβγdef";

julia> isvalid(str, 1)
true

julia> str[1]
'α': Unicode U+03b1 (category Ll: Letter, lowercase)

julia> isvalid(str, 2)
false

julia> str[2]
ERROR: UnicodeError: invalid character index
[...]
Base.UTF8proc.is_assigned_char
Function
is_assigned_char(c) -> Bool
Returns true if the given char or integer is an assigned Unicode code point.
Base.ismatch
Function
ismatch(r::Regex, s::AbstractString) -> Bool
Test whether a string contains a match of the given regular expression.
Base.match
Function
match(r::Regex, s::AbstractString[, idx::Integer[, addopts]])
Search for the first match of the regular expression r in s and return a RegexMatch object containing the match, or nothing if the match failed. The matching substring can be retrieved by accessing m.match and the captured sequences can be retrieved by accessing m.captures. The optional idx argument specifies an index at which to start the search.
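This sketch shows how to read a RegexMatch, using a made-up e-mail pattern. m.match holds the whole match, m.captures holds the capture groups, and the optional third argument moves the starting index.

```julia
m = match(r"(\w+)@(\w+)\.com", "contact: alice@example.com")
whole = m.match           # the full matching substring
user = m.captures[1]      # first capture group
domain = m.captures[2]    # second capture group
later = match(r"\d+", "a1b22", 3)          # search starts at index 3
nomatch = match(r"\d+", "no digits here")  # nothing when there is no match
```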
Base.eachmatch
Function
eachmatch(r::Regex, s::AbstractString[, overlap::Bool=false])
Search for all matches of the regular expression r in s and return an iterator over the matches. If overlap is true, the matching sequences are allowed to overlap indices in the original string, otherwise they must be from distinct character ranges.
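Here is a minimal way to collect every match as a substring, assuming non-overlapping matches are wanted: iterate over eachmatch and take each match's m.match field.

```julia
s = "x=1, y=22, z=333"
# Each element of the iterator is a RegexMatch; keep only the matched text.
nums = [m.match for m in eachmatch(r"\d+", s)]
```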
Base.matchall
Function
matchall(r::Regex, s::AbstractString[, overlap::Bool=false]) -> Vector{AbstractString}
Return a vector of the matching substrings from eachmatch.
Base.lpad
Function
lpad(s, n::Integer, p::AbstractString=" ")
Make a string at least n columns wide when printed by padding s on the left with copies of p.
julia> lpad("March",10)
"     March"
Base.rpad
Function
rpad(s, n::Integer, p::AbstractString=" ")
Make a string at least n columns wide when printed by padding s on the right with copies of p.
julia> rpad("March",20)
"March               "
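Here are a few sketches of padding with a custom p string. The widths and fill characters are illustrative. Neither function truncates, so a string that is already at least n columns wide comes back unchanged.

```julia
id = lpad("7", 3, "0")        # zero-padded on the left to width 3
label = rpad("Mar", 6, ".")   # dot leaders on the right to width 6
long = lpad("March", 3)       # already wider than 3: returned as is
```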
Base.search
Function
search(string::AbstractString, chars::Chars, [start::Integer])
Search for the first occurrence of the given characters within the given string. The second argument may be a single character, a vector or a set of characters, a string, or a regular expression (though regular expressions are only allowed on contiguous strings, such as ASCII or UTF-8 strings). The third argument optionally specifies a starting index. The return value is a range of indexes where the matching sequence is found, such that s[search(s,x)] == x:
search(string, "substring") = start:end such that string[start:end] == "substring", or 0:-1 if unmatched.
search(string, 'c') = index such that string[index] == 'c', or 0 if unmatched.
julia> search("Hello to the world", "z")
0:-1

julia> search("JuliaLang","Julia")
1:5
Base.rsearch
Function
rsearch(s::AbstractString, chars::Chars, [start::Integer])
Similar to search, but returning the last occurrence of the given characters within the given string, searching in reverse from start.
julia> rsearch("aaabbb","b")
6:6
Base.searchindex
Function
searchindex(s::AbstractString, substring, [start::Integer])
Similar to search, but return only the start index at which the substring is found, or 0 if it is not.
julia> searchindex("Hello to the world", "z")
0

julia> searchindex("JuliaLang","Julia")
1

julia> searchindex("JuliaLang","Lang")
6
Base.rsearchindex
Function
rsearchindex(s::AbstractString, substring, [start::Integer])
Similar to rsearch, but return only the start index at which the substring is found, or 0 if it is not.
julia> rsearchindex("aaabbb","b")
6

julia> rsearchindex("aaabbb","a")
3
Base.contains
Method
contains(haystack::AbstractString, needle::AbstractString)
Determine whether the second argument is a substring of the first.
julia> contains("JuliaLang is pretty cool!", "Julia")
true
Base.reverse
Method
reverse(s::AbstractString) -> AbstractString
Reverses a string.
julia> reverse("JuliaLang")
"gnaLailuJ"
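As a small illustration of reverse, here is a palindrome test. Note that ispalindrome is a hypothetical helper written for this example, not part of Base.

```julia
# Hypothetical helper: a string is a palindrome if it equals its reverse.
ispalindrome(s::AbstractString) = s == reverse(s)
```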
Base.replace
Function
replace(string::AbstractString, pat, r[, n::Integer=0])
Search for the given pattern pat, and replace each occurrence with r. If n is provided, replace at most n occurrences. As with search, the second argument may be a single character, a vector or a set of characters, a string, or a regular expression. If r is a function, each occurrence is replaced with r(s) where s is the matched substring. If pat is a regular expression and r is a SubstitutionString, then capture group references in r are replaced with the corresponding matched text.
Base.split
Function
split(s::AbstractString, [chars]; limit::Integer=0, keep::Bool=true)
Return an array of substrings by splitting the given string on occurrences of the given character delimiters, which may be specified in any of the formats allowed by search's second argument (i.e. a single character, collection of characters, string, or regular expression). If chars is omitted, it defaults to the set of all space characters, and keep is taken to be false. Both keyword arguments are optional: limit is the maximum number of substrings in the result (0 means no limit), and keep determines whether empty fields are kept in the result.
julia> a = "Ma.rch"
"Ma.rch"

julia> split(a,".")
2-element Array{SubString{String},1}:
 "Ma"
 "rch"
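This sketch contrasts an explicit delimiter, where empty fields are kept, with the default whitespace splitting, where they are dropped. It also shows the limit keyword. The input strings are illustrative.

```julia
csv = "a,b,,c"
fields = split(csv, ",")             # empty field between the commas is kept
firsttwo = split(csv, ","; limit=2)  # stop after producing 2 pieces
words = split("  hello   world ")    # default: split on runs of whitespace
```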
Base.rsplit
Function
rsplit(s::AbstractString, [chars]; limit::Integer=0, keep::Bool=true)
Similar to split, but starting from the end of the string.
julia> a = "M.a.r.c.h"
"M.a.r.c.h"

julia> rsplit(a,".")
5-element Array{SubString{String},1}:
 "M"
 "a"
 "r"
 "c"
 "h"

julia> rsplit(a,".";limit=1)
1-element Array{SubString{String},1}:
 "M.a.r.c.h"

julia> rsplit(a,".";limit=2)
2-element Array{SubString{String},1}:
 "M.a.r.c"
 "h"
Base.strip
Function
strip(s::AbstractString, [chars::Chars])
Return s with any leading and trailing whitespace removed. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.
julia> strip("{3, 5}\n", ['{', '}', '\n'])
"3, 5"
Base.lstrip
Function
lstrip(s::AbstractString[, chars::Chars])
Return s with any leading whitespace and delimiters removed. The default delimiters to remove are ' ', \t, \n, \v, \f, and \r. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.
julia> a = lpad("March", 20)
"               March"

julia> lstrip(a)
"March"
Base.rstrip
Function
rstrip(s::AbstractString[, chars::Chars])
Return s with any trailing whitespace and delimiters removed. The default delimiters to remove are ' ', \t, \n, \v, \f, and \r. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.
julia> a = rpad("March", 20)
"March               "

julia> rstrip(a)
"March"
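This sketch shows how strip, lstrip, and rstrip differ on the same input, and how a custom character set replaces the default whitespace. The input strings are illustrative.

```julia
s = "\t  padded text \n"
both = strip(s)     # leading and trailing whitespace removed
left = lstrip(s)    # only leading whitespace removed
right = rstrip(s)   # only trailing whitespace removed
inner = strip("xxhixx", ['x'])   # remove 'x' instead of whitespace
```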
Base.startswith
Function
startswith(s::AbstractString, prefix::AbstractString)
Returns true if s starts with prefix. If prefix is a vector or set of characters, tests whether the first character of s belongs to that set.
See also endswith.
julia> startswith("JuliaLang", "Julia")
true
Base.endswith
Function
endswith(s::AbstractString, suffix::AbstractString)
Returns true if s ends with suffix. If suffix is a vector or set of characters, tests whether the last character of s belongs to that set.
See also startswith.
julia> endswith("Sunday", "day")
true
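Here is a small sketch that uses endswith as a filter predicate. The file names are made up.

```julia
files = ["main.jl", "README.md", "test.jl"]
# Keep only names with the ".jl" suffix.
jlfiles = filter(f -> endswith(f, ".jl"), files)
hasprefix = startswith("JuliaLang", "Julia")
```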
Base.uppercase
Function
uppercase(s::AbstractString)
Returns s with all characters converted to uppercase.
julia> uppercase("Julia")
"JULIA"
Base.lowercase
Function
lowercase(s::AbstractString)
Returns s with all characters converted to lowercase.
julia> lowercase("STRINGS AND THINGS")
"strings and things"
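A simple case-insensitive comparison can be sketched by lowercasing both sides. Here, same_ignoring_case is a hypothetical helper. For full Unicode case folding, the casefold=true option of normalize_string described above is the more thorough tool.

```julia
# Hypothetical helper: compare two strings ignoring (simple) case differences.
same_ignoring_case(a::AbstractString, b::AbstractString) = lowercase(a) == lowercase(b)
```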
Base.titlecase
Function
titlecase(s::AbstractString)
Capitalizes the first character of each word in s.
julia> titlecase("the julia programming language")
"The Julia Programming Language"
Base.ucfirst
Function
ucfirst(s::AbstractString)
Returns s with the first character converted to uppercase.
julia> ucfirst("python")
"Python"
Base.lcfirst
Function
lcfirst(s::AbstractString)
Returns s with the first character converted to lowercase.
julia> lcfirst("Julia")
"julia"
Base.join
Function
join(io::IO, strings, delim, [last])
Join an array of strings into a single string, inserting the given delimiter between adjacent strings. If last is given, it will be used instead of delim between the last two strings. For example,
julia> join(["apples", "bananas", "pineapples"], ", ", " and ")
"apples, bananas and pineapples"
strings can be any iterable over elements x which are convertible to strings via print(io::IOBuffer, x). strings will be printed to io; when io is omitted, as in the example above, the joined string is returned instead.
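This sketch shows the io form of join, writing into an IOBuffer. Because elements are printed, non-string values such as integers work too.

```julia
io = IOBuffer()
join(io, [1, 2, 3], "-")       # prints "1-2-3" into the buffer
s = String(take!(io))          # extract the buffer contents as a String
listing = join(["x", "y", "z"], ", ", " and ")   # io omitted: returns a String
```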
Base.chop
Function
chop(s::AbstractString)
Remove the last character from s.
julia> a = "March"
"March"

julia> chop(a)
"Marc"
Base.chomp
Function
chomp(s::AbstractString)
Remove a single trailing newline from a string.
julia> chomp("Hello\n")
"Hello"
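The following sketch contrasts chomp and chop. chomp removes at most one trailing newline and treats "\r\n" as one newline. chop removes the last character, whatever it is.

```julia
lf = chomp("line\n")            # trailing "\n" removed
crlf = chomp("line\r\n")        # "\r\n" removed as a single newline
twolf = chomp("line\n\n")       # only one of the two newlines removed
nonl = chomp("line")            # nothing to remove
chopped_nl = chop("line\n")     # chop drops the last character, here "\n"
chopped_letter = chop("line")   # ... or a letter, if that comes last
```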
Base.ind2chr
Function
ind2chr(s::AbstractString, i::Integer)
Convert a byte index i to a character index with respect to string s.
See also chr2ind.
julia> str = "αβγdef";

julia> ind2chr(str, 3)
2

julia> chr2ind(str, 2)
3
Base.chr2ind
Function
chr2ind(s::AbstractString, i::Integer)
Convert a character index i to a byte index.
See also ind2chr.
julia> str = "αβγdef";

julia> chr2ind(str, 2)
3

julia> ind2chr(str, 3)
2
Base.nextind
Function
nextind(str::AbstractString, i::Integer)
Get the next valid string index after i. Returns a value greater than endof(str) at or after the end of the string.
julia> str = "αβγdef";

julia> nextind(str, 1)
3

julia> endof(str)
9

julia> nextind(str, 9)
10
Base.prevind
Function
prevind(str::AbstractString, i::Integer)
Get the previous valid string index before i. Returns a value less than 1 at the beginning of the string.
julia> prevind("αβγdef", 3)
1

julia> prevind("αβγdef", 1)
0
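This sketch walks every valid character index with nextind. Here, valid_indices is a hypothetical helper. It assumes a String, whose last byte index equals sizeof(s).

```julia
# Hypothetical helper: collect the index of every character in a String.
function valid_indices(s::String)
    idxs = Int[]
    i = 1
    while i <= sizeof(s)     # for a String, the last byte index is sizeof(s)
        push!(idxs, i)
        i = nextind(s, i)    # jump over the remaining bytes of this character
    end
    return idxs
end

str = "αβγdef"
idx = valid_indices(str)
```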
Base.Random.randstring
Function
randstring([rng,] len=8)
Create a random ASCII string of length len, consisting of upper- and lower-case letters and the digits 0-9. The optional rng argument specifies a random number generator; see Random Numbers.
Base.UTF8proc.charwidth
Function
charwidth(c)
Gives the number of columns needed to print a character.
Base.strwidth
Function
strwidth(s::AbstractString)
Gives the number of columns needed to print a string.
julia> strwidth("March")
5
Base.UTF8proc.isalnum
Function
isalnum(c::Char) -> Bool
Tests whether a character is alphanumeric. A character is classified as alphanumeric if it belongs to the Unicode general category Letter or Number, i.e. a character whose category code begins with 'L' or 'N'.
Base.UTF8proc.isalpha
Function
isalpha(c::Char) -> Bool
Tests whether a character is alphabetic. A character is classified as alphabetic if it belongs to the Unicode general category Letter, i.e. a character whose category code begins with 'L'.
Base.isascii
Function
isascii(c::Union{Char,AbstractString}) -> Bool
Tests whether a character belongs to the ASCII character set, or whether this is true for all elements of a string.
Base.UTF8proc.iscntrl
Function
iscntrl(c::Char) -> Bool
Tests whether a character is a control character. Control characters are the non-printing characters of the Latin-1 subset of Unicode.
Base.UTF8proc.isdigit
Function
isdigit(c::Char) -> Bool
Tests whether a character is a numeric digit (0-9).
Base.UTF8proc.isgraph
Function
isgraph(c::Char) -> Bool
Tests whether a character is printable, and not a space. Any character that would cause a printer to use ink should be classified with isgraph(c)==true.
Base.UTF8proc.islower
Function
islower(c::Char) -> Bool
Tests whether a character is a lowercase letter. A character is classified as lowercase if it belongs to Unicode category Ll, Letter: Lowercase.
Base.UTF8proc.isnumber
Function
isnumber(c::Char) -> Bool
Tests whether a character is numeric. A character is classified as numeric if it belongs to the Unicode general category Number, i.e. a character whose category code begins with 'N'.
Base.UTF8proc.isprint
Function
isprint(c::Char) -> Bool
Tests whether a character is printable, including spaces, but not a control character.
Base.UTF8proc.ispunct
Function
ispunct(c::Char) -> Bool
Tests whether a character belongs to the Unicode general category Punctuation, i.e. a character whose category code begins with 'P'.
Base.UTF8proc.isspace
Function
isspace(c::Char) -> Bool
Tests whether a character is any whitespace character. Includes ASCII characters '\t', '\n', '\v', '\f', '\r', and ' ', Latin-1 character U+0085, and characters in Unicode category Zs.
Base.UTF8proc.isupper
Function
isupper(c::Char) -> Bool
Tests whether a character is an uppercase letter. A character is classified as uppercase if it belongs to Unicode category Lu, Letter: Uppercase, or Lt, Letter: Titlecase.
Base.isxdigit
Function
isxdigit(c::Char) -> Bool
Tests whether a character is a valid hexadecimal digit. Note that this does not include x (as in the standard 0x prefix).
julia> isxdigit('a')
true

julia> isxdigit('x')
false
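The character predicates above combine naturally with count and all, as this sketch shows. The sample strings are illustrative.

```julia
s = "Route 66\tnorth!"
digit_count = count(isdigit, s)   # the two '6' characters
space_count = count(isspace, s)   # ' ' and '\t'
punct_count = count(ispunct, s)   # '!'
is_hex = all(isxdigit, "deadBEEF")
```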
Core.Symbol
Type
Symbol(x...) -> Symbol
Create a Symbol by concatenating the string representations of the arguments together.
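Here is a small sketch of Symbol concatenating the string representations of its arguments. The names are illustrative.

```julia
sym = Symbol("var_", 1)       # "var_" and "1" are concatenated
mixed = Symbol("x", 2, 'y')   # strings, numbers and chars all work
```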
Base.escape_string
Function
escape_string([io,] str::AbstractString[, esc::AbstractString]) -> AbstractString
General escaping of traditional C and Unicode escape sequences. Any characters in esc are also escaped (with a backslash). See also unescape_string.
Base.unescape_string
Function
unescape_string([io,] s::AbstractString) -> AbstractString
General unescaping of traditional C and Unicode escape sequences. Reverse of escape_string.
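This round-trip sketch covers escape_string and unescape_string, along with the optional esc argument. The sample strings are illustrative.

```julia
raw = "a\tb\nc"
escaped = escape_string(raw)          # tab and newline spelled as \t and \n
roundtrip = unescape_string(escaped)  # back to the original characters
extra = escape_string("a-b", "-")     # characters listed in esc get a backslash
```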
© 2009–2016 Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and other contributors
Licensed under the MIT License.
https://docs.julialang.org/en/release-0.6/stdlib/strings/