All about strings in Python3

Strings are among the most fundamental data types you will work with in any programming language. Almost every application relies on them.

Python has a rich set of string methods. This post covers the most important ones, but first we need to understand how strings work in Python at a foundational level.

Strings in Python

Python 3 stores strings as a sequence of Unicode code points. This means you can represent any Unicode character -- Arabic, Hebrew, Danish, and yes, even emojis.

Strings are immutable. Once you create a string, you cannot change it.
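A quick way to see immutability in action is to try assigning to an index: Python raises a TypeError. String methods such as upper() return a new string and leave the original untouched. A minimal sketch:

```python
greeting = "hello"

# Assigning to an index is not allowed on a str
try:
    greeting[0] = "H"
except TypeError as error:
    print("TypeError:", error)

# Methods return a new string; the original is unchanged
shouted = greeting.upper()
print(greeting)  # hello
print(shouted)   # HELLO

# To "change" a string, build a new one and rebind the name
greeting = "H" + greeting[1:]
print(greeting)  # Hello
```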

You can create a string by enclosing text in double or single quotes, or by calling the str() constructor. Every approach produces an object of type str, because everything in Python is an object:

# Double quotes
hello = "Hello World I'm using Python!"

# Single quotes
hello = 'Hello World I\'m using Python!'

# str object
hello = str("Hello World I'm using Python!")

print(type(hello)) # Output: <class 'str'>

Sometimes you need a raw string -- one that treats escape characters like \n, \r, and \t literally. Use the r prefix:

print(r"Hello World\nI'm using Python!\t\t\t Amazing")
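Comparing lengths makes the difference concrete: in a normal string, \n is a single newline character, while in a raw string it stays as two characters, a backslash and an n. Raw strings are common in regular expressions, where backslashes are frequent. A small sketch using the standard re module:

```python
import re

normal = "a\nb"
raw = r"a\nb"

print(len(normal))  # 3: 'a', newline, 'b'
print(len(raw))     # 4: 'a', backslash, 'n', 'b'

# A raw string is equivalent to escaping each backslash by hand
print(raw == "a\\nb")  # True

# Typical use: regex patterns keep \d as written
print(re.findall(r"\d+", "Room 101, floor 3"))  # ['101', '3']
```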

Triple quotes """ handle multi-line strings:

python = """
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales. Van Rossum led the language community until stepping down as leader in July 2018
"""

print(python)

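Triple-quoted strings are also used for docstrings. Python has no dedicated multi-line comment syntax, so an unassigned triple-quoted string is sometimes used as a makeshift multi-line comment.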

Concatenation uses the plus (+) sign:

hello = "Hello" + " " + "World"

Python treats strings as sequences, meaning each character has its own position (index), just like the items of a list or tuple. Here is what that looks like in practice:

hello = "Hello World! I'm using Python!"
h = hello[0]
print(h) # Output: H

for char in hello:
    print(char) # Output: H, e, l, ... one character per line

The slice operator lets you extract portions of a string:

hello = "Hello World"
print( hello[:5] ) # Output: Hello
print( hello[6:] ) # Output: World
print( hello[-5:] ) # Output: World
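Slices also accept an optional third value, the step, and negative indices count from the end of the string. A sketch of a few common patterns:

```python
hello = "Hello World"

print(hello[-3:])    # rld          (last three characters)
print(hello[-1])     # d            (last character)
print(hello[::2])    # HloWrd       (every second character)
print(hello[::-1])   # dlroW olleH  (reversed)

# Out-of-range slice bounds are clipped instead of raising an error
print(hello[6:100])  # World
```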

The multiplication operator repeats a string:

print("Hi" * 2) # Output: HiHi

Since Python 3 works with Unicode code points, you can print emojis just like any other string:

# All these statements print the same grinning-face emoji

print("I love Python 😀") # The emoji typed directly
print("I love Python \U0001F600") # Emoji using its Unicode code point
print("I love Python \N{grinning face}") # Emoji using its Unicode character name
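The code point escape and the character name produce the same one-character string. A minimal check using the standard unicodedata module:

```python
import unicodedata

by_code = "\U0001F600"
by_name = "\N{GRINNING FACE}"

print(by_code == by_name)         # True
print(len(by_code))               # 1: a single code point
print(hex(ord(by_code)))          # 0x1f600
print(unicodedata.name(by_code))  # GRINNING FACE
```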

Visit the full emoji list page from the Unicode website for all emoji codes.

For emoji, use the Unicode character name (\N{...}) or the code point escape (\U...) rather than embedding the emoji glyph directly in your code.

You might also find the emoji module useful.

Now for non-Latin characters. Since Python 3 uses Unicode by default, you can print any Unicode string directly:

print("أنا سَعيدٌ جِدّاً بلقاءك!") # Arabic
print("Jeg er så glad for at møde dig!") # Danish
print("Jeg er s\u00e5 glad for at m\xf8de dig!") # Danish with escape sequences

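Hard-coding user-facing text directly in your code makes an application hard to translate. For multilingual applications, use a localization tool such as GNU gettext, which Python supports through its gettext module. See Python Multilingual internationalization Services.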

How about bytes?

The bytes type is similar to str except it stores a sequence of bytes instead of Unicode code points. It is used for binary data and fixed single-byte character encodings.

You can create bytes with the b prefix or the bytes() constructor:

hello = b"Hello World"
print(hello) #Output: b'Hello World'

hello = bytes(source="Hello World", encoding="utf8")
print(hello) #Output: b'Hello World'

The source parameter accepts different data types:

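String: encodes the string to bytes; the encoding argument is required (as shown above).
Integer: creates a bytes object of that size, filled with zero bytes.
Object: an object that supports the buffer protocol, such as a bytearray, is copied into a new bytes object.
Iterable: creates a bytes object from the iterable; each element must be an integer between 0 and 255.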

Here is how these types work with bytes:

n = bytes(5)
print(n) # Output: b'\x00\x00\x00\x00\x00'
print(list(n)) # Output: [0, 0, 0, 0, 0]

items = [1, 2, 4, 8, 16, 32]
arr = bytes(items)
print(arr) # Output: b'\x01\x02\x04\x08\x10 '
print(list(arr)) # Output: [1, 2, 4, 8, 16, 32]
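Two more cases from the list above: every integer in an iterable must fall in the range 0 to 255, otherwise bytes() raises a ValueError, and a buffer-protocol object such as a bytearray is copied rather than shared. A short sketch:

```python
# Values outside 0-255 are rejected
try:
    bytes([1, 256])
except ValueError as error:
    print("ValueError:", error)

# A bytearray is mutable; bytes() takes a snapshot of its contents
buffer = bytearray(b"abc")
frozen = bytes(buffer)
buffer[0] = ord("z")

print(buffer)  # bytearray(b'zbc')
print(frozen)  # b'abc' (the copy is unaffected)
```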

Use encode() and decode() to convert between strings and bytes:

my_string = "Jeg er så glad for at møde dig!"
my_string_encode = my_string.encode() # Uses UTF-8 by default
my_string_decode = my_string_encode.decode('utf8')

print(my_string_encode) # Output: b'Jeg er s\xc3\xa5 glad for at m\xc3\xb8de dig!'
print(my_string_decode) # Output: Jeg er så glad for at møde dig!
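Because UTF-8 needs more than one byte for characters outside ASCII, the encoded length can differ from the string length, and decoding with a different codec than the one used for encoding either fails or produces garbled text. A sketch comparing a few codecs:

```python
word = "møde"
encoded = word.encode("utf-8")

print(len(word))     # 4 code points
print(len(encoded))  # 5 bytes: "ø" takes two bytes in UTF-8
print(encoded.decode("utf-8") == word)  # True: round trip with the same codec

# Decoding with the wrong codec silently produces mojibake
print(encoded.decode("latin-1"))  # mÃ¸de

# Strict ASCII decoding fails on the non-ASCII bytes
try:
    encoded.decode("ascii")
except UnicodeDecodeError as error:
    print("UnicodeDecodeError:", error.reason)

# errors="replace" substitutes U+FFFD for each undecodable byte
print(encoded.decode("ascii", errors="replace"))  # m��de
```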

String operators

We have already seen concatenation (+), repetition (*), indexing ([n]), and slicing ([from:to]). Here is the full list:

Operator	Description
`+`	String concatenation
`*`	String repetition
`[n]`	Indexing: gets the character at position n
`[from:to]`	Slicing: gets a portion of a string
`in`	Returns True if the substring occurs in the string
`not in`	Returns True if the substring does not occur in the string
`%`	String formatting

string = "Hello World I love Python"
print("Python" in string) # True

string = "Hello World I love Python"
print("Python" not in string) # False

The % formatting operator lets you insert placeholders into a string:

string = "My name is %s and I love to use %s"
print(string % ("Ahmad", "Python")) # Output: My name is Ahmad and I love to use Python

Here, %s is a string placeholder. Python provides other placeholders: %c for character, %d for decimal integer, %f for floating point, and more.
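A few of those placeholders in action. The number after the dot sets the precision, and %% produces a literal percent sign:

```python
name = "Python"
version = 3
ratio = 0.8567

print("%s %d" % (name, version))           # Python 3
print("%.2f" % ratio)                      # 0.86
print("%c%c" % (72, "i"))                  # Hi: 72 is the code point of "H"
print("Coverage: %.1f%%" % (ratio * 100))  # Coverage: 85.7%
```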

The format() method is more readable and should be preferred over % formatting. We cover it in the string methods section below.

String methods

Python has many string methods. Some are self-explanatory, others deserve a closer look.

Capitalize the first letter:

print( "hello".capitalize() ) # Output: Hello

Title-case all words:

print( "hello world".title() ) # Output: Hello World

Convert case or swap case:

print( "hello world".upper() ) # Output: HELLO WORLD
print( "HELLO WORLD".lower() ) # Output: hello world
print( "Hello WORLD".swapcase() ) # Output: hELLO world

Get the length of a string:

print (len("Hello World")) # Output: 11

len is not a string method. It works on lists, dictionaries, tuples, and other types as well.

Center a string with padding:

# Output: ----Hello World-----
print ( "Hello World".center(20, "-") )

The string is padded to a width of 20 with nine dashes in total. Since nine can't be split evenly, this example puts four on the left and five on the right.

Count occurrences of a substring:

string = """
Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales. Van Rossum led the language community until stepping down as leader in July 2018
"""

print( string.count("i") ) # Output: 23

# Count only within indices 10 to 19 (the end index is exclusive)
print( string.count("i", 10, 20)) #Output: 1

Check if a string starts or ends with a substring:

string = "Python is one of the most popular programming languages"
print( string.startswith("Python") ) # Output: True
print( string.endswith("languages") ) # Output: True

# Check only within indices 0 to 5 (the end index is exclusive)
print( string.startswith("Python", 0, 6) ) # Output: True
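startswith() and endswith() also accept a tuple of candidates and return True if any of them matches, which avoids chaining several or conditions. A sketch with made-up filenames:

```python
filenames = ["report.pdf", "photo.JPG", "notes.txt", "scan.png"]

# True if the name ends with any suffix in the tuple (the check is case-sensitive)
images = [f for f in filenames if f.endswith((".png", ".jpg"))]
print(images)  # ['scan.png']

# Lowercase first to also catch ".JPG"
images = [f for f in filenames if f.lower().endswith((".png", ".jpg"))]
print(images)  # ['photo.JPG', 'scan.png']
```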

is methods

Every method starting with is returns a boolean and checks whether the string has a particular property:

Method	Description
`isalnum()`	Checks if all characters are alphanumeric (letters or numbers).
`isalpha()`	Checks if all characters are alphabetic.
`isascii()`	Checks if all characters in the string are ASCII.
`isdecimal()`	Checks if all characters are decimal characters.
`isdigit()`	Checks if all characters are digits.
`isidentifier()`	Checks if the string is a valid Python identifier.
`islower()`	Checks if all cased characters are lowercase.
`isnumeric()`	Checks if all characters are numeric.
`isprintable()`	Checks if all characters are printable.
`isspace()`	Checks if all characters are whitespace.
`istitle()`	Checks if the string is title-cased.
`isupper()`	Checks if all cased characters are uppercase.

Some are obvious. Others need examples:

#isalnum
print( "Copenhagen 2000".isalnum() ) # Output: False because of the space
print( "2000i".isalnum() ) # True

#isalpha()
print( "Iraq".isalpha() ) # Output: True
print( "ar_IQ".isalpha() ) # Output: False because of the underscore character

#isascii
print( "Hello World \xb6".isascii() ) # Output: False because \xb6 (the pilcrow ¶, U+00B6) is not an ASCII character
print( "Hello World".isascii() ) # Output: True

#islower
print( "hello world".islower() ) # Output: True

#isupper
print( "HELLO WORLD".isupper() ) # Output: True

#isspace
print( " ".isspace() ) # Output: True
print( "".isspace() ) # Output: False
print( "\t \n".isspace() ) # Output: True

#istitle
print( "hello world".istitle() ) # Output: False
print( "Hello World".istitle() ) # Output: True
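One detail worth knowing is how these methods treat the empty string: most return False for it, but isascii() and isprintable() return True. Also, isupper() and islower() ignore digits and punctuation, yet still require at least one cased character:

```python
empty = ""
print(empty.isalnum(), empty.isalpha(), empty.isdigit())  # False False False
print(empty.isascii(), empty.isprintable())               # True True

# Uncased characters are ignored, but at least one cased character is required
print("ABC 123!".isupper())  # True
print("123".isupper())       # False: no cased characters at all
```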

The methods isdigit, isnumeric, and isdecimal all deal with numbers but differ in their Unicode classification. The distinction matters.

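isdecimal is the strictest check: it accepts only characters that can form base-10 numbers (Unicode category Nd). That means 0-9 plus the decimal digits of other scripts, but no signs, fractions, or superscripts:

print( "123".isdecimal() ) # Output: True
print( "-123".isdecimal() ) # Output: False
print( "¼".isdecimal() ) # Output: False
print( "١٢٣".isdecimal() ) # Output: True because ١٢٣ is 123 written in Arabic-Indic digits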

isnumeric is the broadest check: it accepts any character that has a Unicode numeric value, from any numeral system, including fractions and CJK numerals:

#isnumeric
print( "123".isnumeric() ) # Output: True (Western Arabic digits)
print( "١٢٣".isnumeric() ) # Output: True (Arabic-Indic digits)
print( "۴۵۶".isnumeric() ) # Output: True (Farsi numerals)
print( "¼½".isnumeric() ) # Output: True
print( "四五六".isnumeric() ) # Output: True (Chinese numerals)

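If you only need to recognize numeric characters from different numeral systems, isnumeric is the most permissive choice. If you plan to convert the string with int(), check with isdecimal instead: int() accepts decimal digits from any script but rejects characters such as ¼ or 四.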

More about numeric value in Unicode.

isdigit accepts everything isdecimal does, plus digit characters that are not used to write ordinary base-10 numbers, such as circled and superscript variants:

print( "①".isdigit() ) # Output: True
print( "⒈".isdigit() ) # Output: True
print( "¹".isdigit() ) # Output: True

More about numerals in Unicode.
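Side by side, the three checks form a hierarchy: every decimal character is a digit, and every digit is numeric, but not the other way round. The sketch below also shows why the distinction matters in practice, since int() only converts decimal characters:

```python
samples = ["123", "١٢٣", "¹", "①", "¼", "四"]

# Columns: sample, isdecimal, isdigit, isnumeric
for s in samples:
    print(s, s.isdecimal(), s.isdigit(), s.isnumeric())

print(int("١٢٣"))  # 123: Arabic-Indic decimal digits convert fine

# "¹" passes isdigit() but is not a decimal character, so int() rejects it
try:
    int("¹")
except ValueError:
    print("int('¹') raises ValueError")
```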

isidentifier checks whether a string is a valid Python identifier. This is useful when you need to validate a variable name programmatically:

print( "hello".isidentifier() ) # Output: True
print( "123hello".isidentifier() ) # Output: False
print( "\t".isidentifier() ) # Output: False
print( "hello123".isidentifier() ) # Output: True
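isidentifier() does not know about reserved words, so "class".isidentifier() returns True even though class cannot be used as a variable name. Combining it with the standard keyword module gives a stricter check; the helper name is_valid_variable_name is hypothetical, chosen for illustration:

```python
import keyword

def is_valid_variable_name(name: str) -> bool:
    """Hypothetical helper: True for identifiers that are not reserved keywords."""
    return name.isidentifier() and not keyword.iskeyword(name)

print("class".isidentifier())            # True, even though class is reserved
print(is_valid_variable_name("class"))   # False
print(is_valid_variable_name("my_var"))  # True
print(is_valid_variable_name("2fast"))   # False
```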

isprintable checks if every character in the string is printable. Letters, digits, punctuation, symbols, and the ASCII space count as printable in any script. Non-printable characters are those in the Unicode "Other" and "Separator" categories (apart from the ASCII space), which covers control characters such as \n, \r, and \t:

print( "Hello World".isprintable() ) # Output: True
print( "أنا أتحدّث العربيّة".isprintable() ) # Output: True
print( "123".isprintable() ) #Output: True
print("Hello\nWorld".isprintable() ) # Output: False
print("Hello\r\tWorld".isprintable() ) # Output: False

Join a sequence of elements by a separator

join concatenates the items of any iterable of strings, such as a list, tuple, or set, using a separator:

# Each of these statements produces the string A,B,C (the set version may use a different order)
print( ",".join(["A", "B", "C"]) ) # Using lists
print( ",".join(("A", "B", "C")) ) # Using tuples
print( ",".join({"A", "B", "C"}) ) # Using sets (order is not guaranteed)
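join() only accepts strings, so joining a list of numbers raises a TypeError; convert the items first, for example with map(str, ...). Since sets have no defined order, sorting them gives predictable output:

```python
numbers = [1, 2, 3]

# join() raises TypeError if any item is not a string
try:
    ",".join(numbers)
except TypeError:
    print("join() needs strings")

print(",".join(map(str, numbers)))        # 1,2,3
print(",".join(sorted({"C", "A", "B"})))  # A,B,C: sorting gives a stable order
```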

Create a list from a given string using a separator

split breaks a string into a list using a separator:

# Output: ['Baghdad', ' Basra', ' Anbar', ' Erbil']
print( "Baghdad, Basra, Anbar, Erbil".split(",") )

By default, split splits on every occurrence. Pass a second argument to limit the number of splits:

# ['Baghdad', ' Basra', ' Anbar, Erbil']
print( "Baghdad, Basra, Anbar, Erbil".split(",", 2) )

Use rsplit() to split from the right side.
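Two details worth knowing: calling split() with no separator splits on runs of any whitespace and drops empty strings, while an explicit separator keeps empty fields. rsplit() with a limit is handy for peeling off the last field, such as a file extension:

```python
# No argument: split on runs of whitespace and drop empty strings
print("  a  b\tc\n".split())  # ['a', 'b', 'c']

# Explicit separator: consecutive separators produce empty strings
print("a,,b".split(","))      # ['a', '', 'b']

# Split once, starting from the right
print("archive.tar.gz".rsplit(".", 1))  # ['archive.tar', 'gz']

# Strip the spaces left behind by ", " separators
cities = [c.strip() for c in "Baghdad, Basra, Anbar, Erbil".split(",")]
print(cities)  # ['Baghdad', 'Basra', 'Anbar', 'Erbil']
```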

splitlines splits a string at line boundaries and returns a list of lines:

# Output: ['Hello World', 'I love Python!']
print("Hello World\nI love Python!".splitlines())
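Unlike split("\n"), splitlines() also recognizes \r\n and does not leave a trailing empty string; pass keepends=True to keep the line breaks in each item:

```python
text = "first\r\nsecond\nthird\n"

print(text.split("\n"))                # ['first\r', 'second', 'third', '']
print(text.splitlines())               # ['first', 'second', 'third']
print(text.splitlines(keepends=True))  # ['first\r\n', 'second\n', 'third\n']
```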

Formatting

String formatting makes your code more readable. Compare this:

name = input("Enter your name? ")
age = input("How old are you? ")
print("Hello " + name + ", you are " + age + " years old")

With the format() method:

print("Hello {name}, you are {age} years old.".format(name=name, age=age))

You can also leave the placeholders empty; they are then filled in order:

print("Hello {}, you are {} years old.".format(name,age))

You can specify a conversion type with a colon:

# Format number 123456 to hex.
print("{name:x}".format(name=123456)) # Output: 1e240
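Besides the conversion type, the part after the colon can also set alignment, width, precision, and a thousands separator. A few common combinations (the square brackets only make the padding visible):

```python
print("{:.2f}".format(3.14159))    # 3.14        (two decimal places)
print("[{:>8}]".format("right"))   # [   right]  (right-aligned, width 8)
print("[{:<8}]".format("left"))    # [left    ]  (left-aligned, width 8)
print("[{:^9}]".format("mid"))     # [   mid   ] (centered, width 9)
print("{:,}".format(1234567))      # 1,234,567   (thousands separator)
print("{:08.3f}".format(3.14159))  # 0003.142    (zero-padded to width 8)
print("{:#x}".format(255))         # 0xff        (hex with prefix)
```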

Available conversion types (this list comes from the printf-style % operator; most types also work after the colon in format(), except i, u, and r, and in format() % formats a number as a percentage):

Conversion	Meaning
d	Signed integer decimal.
i	Signed integer decimal.
o	Signed octal value.
u	Obsolete type; identical to d.
x	Signed hexadecimal (lowercase).
X	Signed hexadecimal (uppercase).
e	Floating point exponential format (lowercase).
E	Floating point exponential format (uppercase).
f	Floating point decimal format.
F	Floating point decimal format.
g	Same as "e" if the exponent is less than -4 or not less than the precision, "f" otherwise.
G	Same as "E" if the exponent is less than -4 or not less than the precision, "F" otherwise.
c	Single character (accepts an integer or a single-character string).
r	String (converts any Python object using `repr()`).
s	String (converts any Python object using `str()`).
%	No argument is converted; results in a "%" character in the result.

The r and s conversions deserve special attention.

Python has dunder methods (double underscore methods) like __str__ and __repr__ that extend classes with custom behavior.

By default, printing a class shows a generic representation:

class Person(object): pass
print( Person ) # Output: <class '__main__.Person'>

Using __str__ or __repr__, you control what gets returned when an instance of the class is printed:

class Person(object):
    def __init__(self, name, age):
        self.name, self.age = name, age

    def __str__(self):
        return "Your name is {name}, and you are {age} years old".format(name=self.name, age=self.age)

print( Person("Ahmad", "32") ) # Output: Your name is Ahmad, and you are 32 years old

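In format(), these conversions are written as flags after an exclamation mark: {name!s} calls str() and {name!r} calls repr(). With no flag, format() falls back to str() for objects that don't define their own __format__, so any object that implements __str__ works: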

class Ahmad():
    def __str__(self):
        return "Ahmad"

print( "My name is {name}".format(name=Ahmad()) ) # Output: My name is Ahmad

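If you pass the class itself (Ahmad instead of Ahmad()), the output is the class representation, <class '__main__.Ahmad'>, because a __str__ defined in a class applies to its instances, not to the class object.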

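__repr__ is meant to return an unambiguous, developer-facing representation (for a string, repr() adds quotes and shows escape sequences), while __str__ returns a readable, user-facing one. If a class defines only __repr__, str() falls back to it.

Prefer __str__ for output aimed at end users, and implement __repr__ to make debugging and logging clearer.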
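A sketch of a class that defines both methods, showing which one str(), repr(), format() with the !r flag, and containers pick. The Point class is a made-up example:

```python
class Point:
    """Made-up example class with both representations."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        # Readable form for end users
        return "({}, {})".format(self.x, self.y)

    def __repr__(self):
        # Unambiguous form for debugging, written to look like the constructor call
        return "Point({!r}, {!r})".format(self.x, self.y)

p = Point(1, 2)
print(str(p))                    # (1, 2)
print(repr(p))                   # Point(1, 2)
print("{} / {!r}".format(p, p))  # (1, 2) / Point(1, 2)
print([p])                       # [Point(1, 2)]: containers show their items' repr()
```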

Tab expansion

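The expandtabs method replaces each tab character (\t) with enough spaces to reach the next tab stop. Tab stops fall at multiples of the tab size, which defaults to 8:

print( "Hello\tWorld".expandtabs(16) ) # Output: Hello followed by 11 spaces, then World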
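Because each tab expands only as far as the next tab stop, expandtabs() can line up columns whose first field varies in width. A small sketch with made-up rows:

```python
rows = ["id\tname", "1042\tAhmad", "7\tSara"]  # made-up sample data

for row in rows:
    print(row.expandtabs(8))  # each tab pads to the next multiple of 8

print(len("Hello\tWorld".expandtabs(16)))  # 21: "Hello" + 11 spaces + "World"
```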

Partitioning strings

partition splits a string into three parts: the portion before the separator, the separator itself, and the portion after it.

# Separate Hello World by a space
greeting = "Hello World".partition(" ")

# This outputs a tuple containing three parts
print(greeting) # Output: ('Hello', ' ', 'World')

# Since it's a tuple we can use the unpacking feature
hello, _, world = greeting

print(hello) # Output: Hello
print(world) # Output: World
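partition() always returns a three-item tuple, even when the separator is missing, in which case the last two items are empty strings. rpartition() splits at the last occurrence instead. A sketch using made-up key=value text:

```python
print("key=value=more".partition("="))   # ('key', '=', 'value=more')
print("key=value=more".rpartition("="))  # ('key=value', '=', 'more')
print("no separator".partition("="))     # ('no separator', '', '')

# An empty middle item means the separator was not found
key, sep, value = "timeout".partition("=")
if not sep:
    print("No '=' in", repr(key))  # No '=' in 'timeout'
```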

Finding substring

find and index locate the first occurrence of a substring and return its position as an integer:

python = """
Python is powerful... and fast;
plays well with others;
runs everywhere;
is friendly & easy to learn;
is Open.
"""

print( python.find("Python") ) # Output: 1 (index 0 is the leading newline)
print( python.find("like") ) # Output: -1 (not found)
print( python.find("is", 31) ) # Output: 74 (start from)
print( python.find("is", 79, len(python)) ) # Output: 103 (start and end)

index behaves identically to find except it raises a ValueError when the substring is not found.

Use rfind() or rindex() to search from the right side of the string.
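As a rule of thumb, find() suits checks where a missing substring is normal, and index() suits cases where a missing substring is an error you want to surface. A short sketch of both, plus rfind():

```python
text = "Python is friendly & Python is Open."

print(text.find("Python"))   # 0
print(text.rfind("Python"))  # 21 (last occurrence)
print(text.find("Java"))     # -1

# index() turns a missing substring into an exception
try:
    text.index("Java")
except ValueError:
    print("index() raised ValueError")
```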

Replacing

replace() performs string replacement:

print( "Hello World".replace("Hello", "Hi") ) # Output: Hi World

The third argument limits the number of replacements:

print( "Hello World, Hello, Hello".replace("Hello", "Hi", 1) ) # Output: Hi World, Hello, Hello

Summary

Python 3 strings are Unicode sequences -- they support any script and are always immutable.
The bytes type stores raw binary data as a sequence of bytes, not Unicode code points. Use encode() and decode() to convert between the two.
String operators include concatenation (+), repetition (*), indexing ([n]), slicing ([from:to]), membership (in, not in), and formatting (%).
The is* methods are boolean checks for string properties; isdecimal, isdigit, and isnumeric differ by their Unicode classification scope.
format() is the preferred formatting approach over % placeholders, and it integrates with __str__ and __repr__ dunder methods.
find vs index -- both locate substrings, but index raises an exception on failure while find returns -1.
Full documentation for all string methods is available on the Python documentation page.
