All about strings in Python3

This article was published over 2 years ago. Some information may be outdated.

Strings are one of the most fundamental data types in any programming language, and almost every application works with them.

Python has a rich set of string methods. This post covers the most important ones, but first we need to understand how strings work in Python at a foundational level.

Strings in Python

Python 3 stores strings as a sequence of Unicode code points. This means you can represent any Unicode character -- Arabic, Hebrew, Danish, and yes, even emojis.

Strings are immutable. Once you create a string, you cannot change it.
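As a minimal sketch of what immutability means in practice (the variable names are illustrative, not from the original post): item assignment fails, and methods such as upper() return a new string rather than changing the existing one.

```python
greeting = "hello"

# Item assignment is not allowed on a str and raises TypeError.
try:
    greeting[0] = "H"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Methods return a new string; the original is left untouched.
shouted = greeting.upper()

print(assignment_failed)  # True
print(greeting)           # hello
print(shouted)            # HELLO
```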

You can create a string by enclosing text in double or single quotes, or by calling the str() constructor. All three approaches produce a str object, because everything in Python is an object:

# Double quotes
hello = "Hello World I'm using Python!"

# Single quotes
hello = 'Hello World I\'m using Python!'

# str object
hello = str("Hello World I'm using Python!")

print(type(hello)) # This always prints <class 'str'>

Sometimes you need a raw string -- one that keeps backslash sequences like \n, \r, and \t as literal characters instead of interpreting them as escapes. Use the r prefix:

print(r"Hello World\nI'm using Python!\t\t\t Amazing")
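A quick way to see the difference is to compare lengths. In this small sketch (variable names are illustrative), the normal string contains a single newline character, while the raw string keeps the backslash and the letter n as two separate characters.

```python
escaped = "a\nb"  # \n is interpreted as one newline character
raw = r"a\nb"     # backslash and n are kept as two characters

print(len(escaped))    # 3
print(len(raw))        # 4
print(raw == "a\\nb")  # True: same as escaping the backslash by hand
```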

Triple quotes """ handle multi-line strings:

python = """
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales. Van Rossum led the language community until stepping down as leader in July 2018
"""

print(python)

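Python has no dedicated multi-line comment syntax. A triple-quoted string that is not assigned to anything is evaluated and discarded, so it is often used as a block comment. As the first statement of a module, class, or function, it becomes the docstring.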

Concatenation uses the plus (+) sign:

hello = "Hello" + " " + "World"

Python treats strings as sequences, meaning each character has its own index, just like the items of a list or tuple. Here is what that looks like in practice:

hello = "Hello World! I'm using Python!"
h = hello[0]
print(h) # Output: H

for char in hello:
    print(char, " ") #Output: H\ne\nl etc...

The slice operator lets you extract portions of a string:

hello = "Hello World"
print( hello[:5] ) # Output: Hello
print( hello[6:] ) # Output: World
print( hello[-5:] ) # Output: World
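A slice has the general form [start:stop:step], where stop is exclusive and negative indices count from the end. The sketch below (variable names are illustrative) shows a few combinations, including a slice that comes back empty because its start lies after its stop.

```python
text = "Hello World"

last_five = text[-5:]       # from the fifth-to-last character to the end
every_other = text[::2]     # every second character
reversed_text = text[::-1]  # a negative step walks backwards
empty = text[-1:-5]         # start is after stop with a positive step

print(last_five)      # World
print(every_other)    # HloWrd
print(reversed_text)  # dlroW olleH
print(repr(empty))    # ''
```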

The multiplication operator repeats a string:

print("Hi" * 2) # Output: HiHi

Since Python 3 works with Unicode code points, you can print emojis just like any other string:

# The first statement prints a heart; the last two both print a grinning face

print("I love Python ♥️")
print("I love Python \U0001F600") # Emoji using the Unicode code point
print("I love Python \N{grinning face}") # Emoji using the Unicode character name
  • Visit the full emoji list page from the Unicode website for all emoji codes.
  • For emoji representation, use the Unicode character name or the code point escape. Avoid embedding direct emoji glyphs in your code.
  • You might also find the emoji module useful.
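If you need to go from a character to its name or back, the standard library's unicodedata module can help. This is a small sketch, not part of the original post; the variable name face is illustrative.

```python
import unicodedata

face = "\N{GRINNING FACE}"

print(face == "\U0001F600")    # True: both escapes give the same code point
print(hex(ord(face)))          # 0x1f600
print(unicodedata.name(face))  # GRINNING FACE
print(unicodedata.lookup("GRINNING FACE") == face)  # True
```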

Now for non-Latin characters. Since Python 3 uses Unicode by default, you can print any Unicode string directly:

print("أنا سَعيدٌ جِدّاً بلقاءك!") # Arabic
print("Jeg er så glad for at møde dig!") # Danish
print("Jeg er s\u00e5 glad for at m\xf8de dig!") # Danish with Unicode chars.

Hard-coding user-facing text for one specific language directly in your code is generally considered bad practice in a multilingual application. Use something like GNU gettext instead. See Python Multilingual Internationalization Services.

How about bytes?

The bytes type is similar to str except it stores a sequence of bytes instead of Unicode code points. It is used for binary data and fixed single-byte character encodings.

You can create bytes with the b prefix or the bytes() constructor:

hello = b"Hello World" # b prefix, ASCII characters only
print(hello) #Output: b'Hello World'

hello = bytes(source="Hello World", encoding="utf8")
print(hello) #Output: b'Hello World'

The source parameter accepts different data types:

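  • String: converts the string to bytes using the given encoding (as shown above).
  • Integer: creates a bytes object of that length filled with zero bytes.
  • Object: any object that implements the buffer protocol, such as a bytearray or memoryview, is copied into a new bytes object.
  • Iterable: creates a bytes object from the iterable; each element must be an integer between 0 and 255.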

Here is how these types work with bytes:

n = bytes(5)
print(n) # Output: b'\x00\x00\x00\x00\x00'
print(list(n)) # Output: [0, 0, 0, 0, 0]

items = [1, 2, 4, 8, 16, 32]
arr = bytes(items)
print(arr) # Output: b'\x01\x02\x04\x08\x10 '
print(list(arr)) # Output: [1, 2, 4, 8, 16, 32]
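Two details are easy to miss here, sketched below with illustrative names: indexing a bytes object returns an int while slicing returns bytes, and values outside 0 to 255 are rejected with a ValueError.

```python
data = bytes([72, 105])

first_item = data[0]     # indexing returns an int
first_slice = data[0:1]  # slicing returns a bytes object

# Values outside the range 0..255 raise ValueError.
try:
    bytes([256])
    in_range = True
except ValueError:
    in_range = False

print(data)         # b'Hi'
print(first_item)   # 72
print(first_slice)  # b'H'
print(in_range)     # False
```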

Use encode() and decode() to convert between strings and bytes:

my_string = "Jeg er så glad for at møde dig!"
my_string_encode = my_string.encode()
my_string_decode = my_string_encode.decode('utf8')

print(my_string_encode) # Output: b'Jeg er s\xc3\xa5 glad for at m\xc3\xb8de dig!'
print(my_string_decode) # Output: Jeg er så glad for at møde dig!
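The number of bytes depends on the encoding, and not every encoding can represent every character. The following sketch (variable names are illustrative) compares UTF-8 with Latin-1 and shows the errors argument of encode(), which selects a fallback instead of raising.

```python
text = "s\u00e5"  # "så", written with an escape to keep the example unambiguous

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(len(text))          # 2 code points
print(len(utf8_bytes))    # 3 bytes: "å" takes two bytes in UTF-8
print(len(latin1_bytes))  # 2 bytes: "å" takes one byte in Latin-1

# ASCII cannot represent "å", so a strict encode raises UnicodeEncodeError.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# errors="replace" substitutes "?" for characters that cannot be encoded.
replaced = text.encode("ascii", errors="replace")

print(ascii_ok)  # False
print(replaced)  # b's?'
```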

String operators

We have already seen concatenation (+), repetition (*), indexing ([n]), and slicing ([from:to]). Here is the full list:

Operator     Description
+            String concatenation
*            String repetition
[n]          Indexing: returns the character at position n
[from:to]    Slicing: returns a portion of the string
in           Returns True if the substring exists in the string
not in       Returns True if the substring does not exist in the string
%            String formatting

string = "Hello World I love Python"
print("Python" in string) # True

string = "Hello World I love Python"
print("Python" not in string) # False

The % formatting operator lets you insert placeholders into a string:

string = "My name is %s and I love to use %s"
print(string % ("Ahmad", "Python")) # Output: My name is Ahmad and I love to use Python

Here, %s is a string placeholder. Python provides other placeholders: %c for character, %d for decimal integer, %f for floating point, and more.
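A short sketch combining several placeholders in one string (the values are made up for illustration). A literal percent sign is written as %%.

```python
line = "%s is %d years old and %.1f m tall, grade %c" % ("Ahmad", 32, 1.756, "A")
percent = "%d%%" % 50

print(line)     # Ahmad is 32 years old and 1.8 m tall, grade A
print(percent)  # 50%
```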

The format() method is more readable and should be preferred over % formatting. We cover it in the string methods section below.

String methods

Python has many string methods. Some are self-explanatory, others deserve a closer look.

  • Capitalize the first letter:
print( "hello".capitalize() ) # Output: Hello
  • Title-case all words:
print( "hello world".title() ) # Output: Hello World
  • Convert case or swap case:
print( "hello world".upper() ) # Output: HELLO WORLD
print( "HELLO WORLD".lower() ) # Output: hello world
print( "Hello WORLD".swapcase() ) # Output: hELLO world
  • Get the length of a string:
print (len("Hello World")) # Output: 11

len is a built-in function, not a string method. It works on lists, dictionaries, tuples, and other types as well.

  • Center a string with padding:
# Output: ----Hello World-----
print ( "Hello World".center(20, "-") )

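The padding is nine dashes in total: the width of 20 minus the 11 characters of "Hello World". Nine cannot be split evenly, so four dashes go on the left and five on the right.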

  • Count occurrences of a substring:
string = """
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales. Van Rossum led the language community until stepping down as leader in July 2018
"""

print( string.count("i") ) # Output: 23

# Count only between index 10 (inclusive) and index 20 (exclusive)
print( string.count("i", 10, 20)) #Output: 1
  • Check if a string starts or ends with a substring:
string = "Python is one of the most popular programming languages"
print( string.startswith("Python") ) # Output: True
print( string.endswith("languages") ) # Output: True

#Start from 0 position to 6th position.
print( string.startswith("Python", 0, 6) ) # Output: True

is methods

Every method starting with is returns a boolean and checks whether the string has a particular property:

Method            Description
isalnum()         Checks if all characters are alphanumeric.
isalpha()         Checks if all characters are alphabetic.
isascii()         Checks if all characters in the string are ASCII.
isdecimal()       Checks if all characters are decimal digits.
isdigit()         Checks if all characters are digits, including forms such as superscript digits.
isidentifier()    Checks if the string is a valid Python identifier.
islower()         Checks if all cased characters are lowercase.
isnumeric()       Checks if all characters are numeric, including fractions and numerals from other scripts.
isprintable()     Checks if all characters are printable.
isspace()         Checks if the string contains only whitespace.
istitle()         Checks if the string is a title-cased string.
isupper()         Checks if all cased characters are uppercase.

Some are obvious. Others need examples:

#isalnum
print( "Copenhagen 2000".isalnum() ) # Output: False because of the space
print( "2000i".isalnum() ) # True

#isalpha()
print( "Iraq".isalpha() ) # Output: True
print( "ar_IQ".isalpha() ) # Output: False because of the underscore character

#isascii
print( "Hello World \xb6".isascii() ) # Output: False because \xb6 (the pilcrow sign) is outside the ASCII range
print( "Hello World".isascii() ) # Output: True

#islower
print( "hello world".islower() ) # Output: True

#isupper
print( "HELLO WORLD".isupper() ) # Output: True

#isspace
print( " ".isspace() ) # Output: True
print( "".isspace() ) # Output: False
print( "\t \n".isspace() ) # Output: True

#istitle
print( "hello world".istitle() ) # Output: False
print( "Hello World".istitle() ) # Output: True

The methods isdigit, isnumeric, and isdecimal all deal with numbers but differ in their Unicode classification. The distinction matters.

isdecimal checks for decimal digits, meaning characters in the Unicode category Nd. This covers 0-9 as well as the decimal digits of other scripts:

print( "123".isdecimal() ) # Output: True
print( "-123".isdecimal() ) # Output: False
print( "¼".isdecimal() ) # Output: False
print( "١٢٣".isdecimal() ) # Output: True, ١٢٣ is 123 written with Arabic-Indic digits

isnumeric checks for characters that have a Unicode numeric value, which includes digits, fractions, and numerals from other scripts:

#isnumeric
print( "123".isnumeric() ) # Output: True (Arabic numerals)
print( "١٢٣".isnumeric() ) # Output: True (Arabic-Indic digits)
print( "۴۵۶".isnumeric() ) # Output: True (Farsi numerals)
print( "¼½".isnumeric() ) # Output: True
print( "四五六".isnumeric() ) # Output: True (Chinese numerals)

If you are building a multilingual application that handles different numeric systems, isnumeric is what you want.

More about numeric value in Unicode.

isdigit accepts everything isdecimal does, plus digit characters that are typographic variants such as circled or superscript digits:

print( "①".isdigit() ) # Output: True
print( "⒈".isdigit() ) # Output: True
print( "¹".isdigit() ) # Output: True

More about numerals in Unicode.
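To compare the three methods side by side, the sketch below runs them over a few single characters. The sample characters are my own choice for illustration; each tuple holds the results of isdecimal, isdigit, and isnumeric in that order.

```python
samples = [
    "5",       # ASCII digit
    "\u00b2",  # SUPERSCRIPT TWO
    "\u00bd",  # VULGAR FRACTION ONE HALF
    "\u4e94",  # CJK ideograph for five
]

results = {
    ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for ch in samples
}

for ch, flags in results.items():
    print(ch, flags)

# 5 (True, True, True)
# ² (False, True, True)
# ½ (False, False, True)
# 五 (False, False, True)
```

Each method accepts a superset of the previous one: every decimal character is a digit, and every digit is numeric.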

isidentifier checks whether a string is a valid Python identifier. This is useful when you need to validate a variable name programmatically:

print( "hello".isidentifier() ) # Output: True
print( "123hello".isidentifier() ) # Output: False
print( "\t".isidentifier() ) # Output: False
print( "hello123".isidentifier() ) # Output: True

isprintable checks if every character in the string is printable. Printable characters include:

  • Letters from A-Z (Uppercase)
  • Letters from a-z (Lowercase)
  • Digits from 0-9
  • Punctuation characters ( !"#$%&'()*+, -./:;?@[]^_`{ | }~ )
  • Space.
print( "Hello World".isprintable() ) # Output: True
print( "أنا أتحدّث العربيّة".isprintable() ) # Output: True
print( "123".isprintable() ) #Output: True
print("Hello\nWorld".isprintable() ) # Output: False
print("Hello\r\tWorld".isprintable() ) # Output: False

Join a sequence of elements by a separator

join concatenates all items of an iterable of strings, such as a list, tuple, or set, using a separator:

# The list and tuple produce A,B,C; a set is unordered, so its items may come out in any order
print( ",".join(["A", "B", "C"]) ) # Using a list
print( ",".join(("A", "B", "C")) ) # Using a tuple
print( ",".join({"A", "B", "C"}) ) # Using a set
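One common stumbling block: join only accepts strings. The sketch below (variable names are illustrative) shows the TypeError raised for integers and the usual workaround of converting each item with str() first.

```python
numbers = [1, 2, 3]

# Joining non-string items directly raises TypeError.
try:
    ",".join(numbers)
    joined_directly = True
except TypeError:
    joined_directly = False

# Convert each item to a string before joining.
csv_line = ",".join(str(n) for n in numbers)

print(joined_directly)  # False
print(csv_line)         # 1,2,3
```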

Create a list from a given string using a separator

split breaks a string into a list using a separator:

# Output: ['Baghdad', ' Basra', ' Anbar', ' Erbil']
print( "Baghdad, Basra, Anbar, Erbil".split(",") )

By default, split splits on every occurrence. Pass a second argument to limit the number of splits:

# ['Baghdad', ' Basra', ' Anbar, Erbil']
print( "Baghdad, Basra, Anbar, Erbil".split(",", 2) )

Use rsplit() to split from the right side.
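The difference between split and rsplit only shows up when you limit the number of splits. A minimal sketch with a made-up file name:

```python
filename = "archive.tar.gz"

from_left = filename.split(".", 1)    # split once, starting from the left
from_right = filename.rsplit(".", 1)  # split once, starting from the right

# With no separator, split() breaks on runs of whitespace and drops empty strings.
words = "  a  b ".split()

print(from_left)   # ['archive', 'tar.gz']
print(from_right)  # ['archive.tar', 'gz']
print(words)       # ['a', 'b']
```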

splitlines splits a string at line breaks and returns the lines as a list:

# Output: ['Hello World', 'I love Python!']
print("Hello World\nI love Python!".splitlines())
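Unlike split("\n"), splitlines recognizes several kinds of line endings, including \r\n. It can also keep the endings if you pass keepends=True. A small sketch with illustrative names:

```python
text = "first\r\nsecond\nthird"

lines = text.splitlines()
with_endings = text.splitlines(keepends=True)
naive = text.split("\n")  # leaves a stray \r behind

print(lines)         # ['first', 'second', 'third']
print(with_endings)  # ['first\r\n', 'second\n', 'third']
print(naive)         # ['first\r', 'second', 'third']
```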

Formatting

String formatting makes your code more readable. Compare this:

name = input("Enter your name? ")
age = input("How old are you? ")
print("Hello " + name + ", you are " + age + " years old")

With the format() method:

print("Hello {name}, you are {age} years old.".format(name=name, age=age))

You can omit the named placeholders:

print("Hello {}, you are {} years old.".format(name,age))

You can specify a conversion type with a colon:

# Format number 123456 to hex.
print("{name:x}".format(name=123456)) # Output: 1e240
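The part after the colon can also carry width, alignment, and precision. The sketch below shows a few common combinations; the values and variable names are illustrative.

```python
price = 1234.5678

fixed = "{:.2f}".format(price)      # two decimal places
grouped = "{:,.2f}".format(price)   # thousands separator
padded = "{:>10}".format("right")   # right-align in a field of width 10
zero_filled = "{:05d}".format(42)   # pad with zeros to width 5
binary = "{:b}".format(10)          # binary representation

print(fixed)         # 1234.57
print(grouped)       # 1,234.57
print(repr(padded))  # '     right'
print(zero_filled)   # 00042
print(binary)        # 1010
```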

Available conversion types. This table describes the conversions of the % operator. format() shares most of them, but it does not accept i or u, it treats % as a percentage type, and it writes the repr() and str() conversions as {name!r} and {name!s}:

Conversion   Meaning
d            Signed integer decimal.
i            Signed integer decimal.
o            Signed octal.
u            Obsolete, identical to d.
x            Signed hexadecimal (lowercase).
X            Signed hexadecimal (uppercase).
e            Floating point exponential format (lowercase).
E            Floating point exponential format (uppercase).
f            Floating point decimal format.
F            Floating point decimal format.
g            Uses "e" if the exponent is less than -4 or not less than the precision, "f" otherwise.
G            Uses "E" if the exponent is less than -4 or not less than the precision, "F" otherwise.
c            Single character (accepts integer or single character string).
r            String (converts any Python object using repr()).
s            String (converts any Python object using str()).
%            No argument is converted, results in a "%" character in the result.

The r and s conversions deserve special attention.

Python has dunder methods (double underscore methods) like __str__ and __repr__ that extend classes with custom behavior.

By default, printing a class returns its identifier:

class Person(object): pass
print( Person ) # Output: <class '__main__.Person'>

Using __str__ or __repr__, you control what gets returned:

class Person(object):
    def __init__(self, name, age):
        self.name, self.age = name, age

    def __str__(self):
        return "Your name is {name}, and you are {age} years old".format(name=self.name, age=self.age)

print( Person("Ahmad", "32") ) # Output: Your name is Ahmad, and you are 32 years old

With the !r and !s conversions, you can format any object that implements __repr__ or __str__:

class Ahmad():
    def __str__(self):
        return "Ahmad"

print( "My name is {name!s}".format(name=Ahmad()) ) # Output: My name is Ahmad
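  • If you pass the class itself (the name without parentheses) instead of an instance, Python formats the class object, for example <class '__main__.Ahmad'>, and never calls your __str__.
  • __repr__ should return an unambiguous, developer-facing representation. For built-in strings it includes the surrounding quotes, and for some objects it carries more detail than __str__, which targets readable output.
  • print(), str(), and format() use __str__ and fall back to __repr__ when __str__ is not defined. Prefer __str__ whenever you need a readable string representation of an object.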
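To see both dunder methods side by side, here is a minimal sketch. The Point class is hypothetical and exists only for this illustration.

```python
class Point:
    """Hypothetical example class, not from the original article."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        # Readable form, used by print(), str(), and "{}" or "{!s}"
        return "({}, {})".format(self.x, self.y)

    def __repr__(self):
        # Unambiguous form, used by repr() and "{!r}"
        return "Point(x={!r}, y={!r})".format(self.x, self.y)


p = Point(1, 2)

as_str = "{!s}".format(p)
as_repr = "{!r}".format(p)
default = "{}".format(p)
quoted = "{!r} vs {!s}".format("hi", "hi")

print(as_str)   # (1, 2)
print(as_repr)  # Point(x=1, y=2)
print(default)  # (1, 2)
print(quoted)   # 'hi' vs hi
```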

Tab expansion

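The expandtabs method replaces each tab character (\t) with spaces up to the next tab stop, using the tab size you pass (8 by default):

print( "Hello\tWorld".expandtabs(16) ) # Output: "Hello" followed by 11 spaces, then "World"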
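Because tabs expand to the next tab stop rather than to a fixed number of spaces, expandtabs can line up simple columns. A small sketch with made-up rows:

```python
expanded = "Hello\tWorld".expandtabs(16)

print(len(expanded))        # 21: padded to column 16, plus the 5 letters of World
print(expanded.index("W"))  # 16

rows = ["id\tname", "1\tAhmad", "20\tSara"]
aligned = [row.expandtabs(6) for row in rows]

for row in aligned:
    print(row)
# id    name
# 1     Ahmad
# 20    Sara
```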

Partitioning strings

partition splits a string into three parts: the portion before the separator, the separator itself, and the portion after it.

# Separate Hello World by a space
greeting = "Hello World".partition(" ")

# This outputs a tuple containing three parts
print(greeting) # Output: ('Hello', ' ', 'World')

# Since it's a tuple we can use the unpacking feature
hello, _, world = greeting

print(hello) # Output: Hello
print(world) # Output: World
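Two related behaviours are worth knowing, sketched here with illustrative values: when the separator is missing, partition still returns three items, and rpartition splits at the last occurrence instead of the first.

```python
missing = "HelloWorld".partition(" ")

key, _, value = "name=Ahmad=Ali".partition("=")   # split at the first "="
head, _, tail = "name=Ahmad=Ali".rpartition("=")  # split at the last "="

print(missing)     # ('HelloWorld', '', '')
print(key, value)  # name Ahmad=Ali
print(head, tail)  # name=Ahmad Ali
```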

Finding substring

find and index locate the first occurrence of a substring and return its position as an integer:

python = """
Python is powerful... and fast;
plays well with others;
runs everywhere;
is friendly & easy to learn;
is Open.
"""

print( python.find("Python") ) # Output: 1 (index 0 is the newline right after the opening quotes)
print( python.find("like") ) # Output: -1 (not found)
print( python.find("is", 31) ) # Output: 74 (search starts at index 31)
print( python.find("is", 79, len(python)) ) # Output: 103 (search between a start and an end index)

index behaves identically to find except it raises a ValueError when the substring is not found.

Use rfind() or rindex() to search from the right side of the string.
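The sketch below puts find, rfind, and index next to each other on a short made-up string. When you only need to know whether a substring is present, the in operator is usually clearer than comparing find() with -1.

```python
text = "one two one"

first = text.find("one")    # first occurrence
last = text.rfind("one")    # last occurrence
missing = text.find("six")  # -1 signals "not found"

# index() raises ValueError instead of returning -1.
try:
    text.index("six")
    raised = False
except ValueError:
    raised = True

found = "two" in text

print(first)    # 0
print(last)     # 8
print(missing)  # -1
print(raised)   # True
print(found)    # True
```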

Replacing

replace() performs string replacement:

print( "Hello World".replace("Hello", "Hi") ) # Output: Hi World

The third argument limits the number of replacements:

print( "Hello World, Hello, Hello".replace("Hello", "Hi", 1) ) # Output: Hi World, Hello, Hello

Summary

  • Python 3 strings are Unicode sequences -- they support any script and are always immutable.
  • The bytes type stores raw binary data as a sequence of bytes, not Unicode code points. Use encode() and decode() to convert between the two.
  • String operators include concatenation (+), repetition (*), indexing ([n]), slicing ([from:to]), membership (in, not in), and formatting (%).
  • The is* methods are boolean checks for string properties; isdecimal, isdigit, and isnumeric differ by their Unicode classification scope.
  • format() is the preferred formatting approach over % placeholders, and it integrates with __str__ and __repr__ dunder methods.
  • find vs index -- both locate substrings, but index raises an exception on failure while find returns -1.
  • Full documentation for all string methods is available on the Python documentation page.