- Python Raw Strings
- Introduction to the Python raw strings
- Use raw strings to handle file paths on Windows
- Convert a regular string into a raw string
- Summary
- The “u” and “r” string prefixes in Python and what “raw” string literals are
- The “u” prefix
- The “r” prefix
- The “ur” combination
- Converting back to a “raw” string
- UTF-8 and the “u” prefix
- Prefix r before String in Python
- Prefix r Before String in Python
- What is R String in Python?
- What is a “Raw” String?
- How to Use Python Raw Strings?
- Invalid Raw Strings in Python
- Conclusion
- About the author
- Talha Saif Malik
Python Raw Strings
Summary: in this tutorial, you will learn about Python raw strings and how to use them to handle strings that treat backslashes as literal characters.
Introduction to the Python raw strings
In Python, when you prefix a string with the letter r or R, such as r'...' and R'...', that string becomes a raw string. Unlike a regular string, a raw string treats backslashes ( \ ) as literal characters.
Raw strings are useful when you deal with strings that have many backslashes, for example, regular expressions or directory paths on Windows.
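For instance, a regular expression that matches a decimal number such as 3.14 needs the \d and \. tokens. The sketch below (the pattern and sample text are illustrative assumptions) writes the same pattern both ways and shows that the raw version avoids doubling every backslash:

```python
import re

# The same regex written as a raw string and as a regular string.
raw_pattern = r"\d+\.\d+"
escaped_pattern = "\\d+\\.\\d+"  # in a regular string every backslash must be doubled

print(raw_pattern == escaped_pattern)                   # True
print(re.findall(raw_pattern, "Python 3.12 and 2.7"))   # ['3.12', '2.7']
```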
To represent special characters such as tabs and newlines, Python uses the backslash ( \ ) to signify the start of an escape sequence. For example:
s = 'lang\tver\nPython\t3'
print(s)
Output:
lang	ver
Python	3
However, raw strings treat the backslash ( \ ) as a literal character. For example:
s = r'lang\tver\nPython\t3'
print(s)
Output:
lang\tver\nPython\t3
A raw string is equivalent to a regular string in which every backslash ( \ ) is written as a double backslash ( \\ ):
s1 = r'lang\tver\nPython\t3'
s2 = 'lang\\tver\\nPython\\t3'
print(s1 == s2)  # True
In a regular string, Python counts an escape sequence as a single character:
s = '\n'
print(len(s))  # 1
However, in a raw string, the backslash ( \ ) counts as a character of its own, so r'\n' has two characters:
s = r'\n'
print(len(s))  # 2
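Because a backslash still escapes a following single quote ( ' ) or double quote ( " ) in a raw string (the backslash itself stays in the string), a raw string cannot end with an odd number of backslashes. Python 3.9 and earlier report the error shown below; Python 3.10 and later report “unterminated string literal” instead.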
s = r'\'
SyntaxError: EOL while scanning string literal
s = r'\\\'
SyntaxError: EOL while scanning string literal
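By contrast, a backslash before a quote is legal inside a raw string as long as the string is properly closed, and an even number of trailing backslashes is fine. A quick check, with illustrative literals:

```python
# A backslash before a quote does not end a raw string; both characters are kept.
s = r'\''
print(len(s), list(s))  # 2 ['\\', "'"]

# An even number of trailing backslashes is allowed.
t = r'abc\\'
print(len(t))               # 5
print(t.endswith('\\\\'))   # True: the string ends with two backslashes
```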
Use raw strings to handle file paths on Windows
Windows uses backslashes to separate the components of a path. For example:
c:\user\tasks\new
If you use this path in a regular string literal, Python raises an error:
dir_path = 'c:\user\tasks\new'
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX escape
Python treats \u in the path as the start of a \uXXXX Unicode escape, but the characters that follow (ser) are not four hexadecimal digits, so the escape cannot be decoded.
If you escape only the first backslash, you run into a different problem:
dir_path = 'c:\\user\tasks\new'
print(dir_path)
Output:
c:\user	asks
ew
In this example, \t is interpreted as a tab and \n as a newline.
To avoid this, turn the path into a raw string:
dir_path = r'c:\user\tasks\new'
print(dir_path)
Output:
c:\user\tasks\new
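If you build paths in code, the standard-library pathlib module is another option. The sketch below uses PureWindowsPath, which applies Windows path rules on any operating system without touching the file system, to show that the raw-string path and a forward-slash spelling refer to the same location:

```python
from pathlib import PureWindowsPath

# PureWindowsPath only manipulates path text, so this runs on any OS.
p1 = PureWindowsPath(r'c:\user\tasks\new')
p2 = PureWindowsPath('c:/user/tasks/new')  # forward slashes are accepted too

print(p1 == p2)   # True
print(str(p2))    # c:\user\tasks\new
print(p1.parts)   # ('c:\\', 'user', 'tasks', 'new')
```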
Convert a regular string into a raw string
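Strictly speaking, a raw string is not a separate type: the r prefix only changes how a literal is read, and the result is an ordinary str. What you can do is get the escaped form of a string, that is, the text you would type between the quotes, with the built-in repr() function. For example:
s = '\n'
raw_string = repr(s)
print(raw_string)
Output:
'\n'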
Note that the result includes a quote at the beginning and end of the string. To remove them, you can use slicing:
s = '\n'
raw_string = repr(s)[1:-1]
print(raw_string)
Output:
\n
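One caveat worth checking: repr() also escapes real backslashes, so a string that already contains a backslash comes back with it doubled. The helper below, escaped_text (a hypothetical name), wraps the slicing approach so the behavior is easy to test:

```python
def escaped_text(s: str) -> str:
    """Return repr(s) without its surrounding quotes (hypothetical helper)."""
    return repr(s)[1:-1]

print(escaped_text('\n'))        # \n
print(escaped_text('a\tb'))      # a\tb
print(escaped_text(r'c:\data'))  # c:\\data  (the real backslash is doubled)
```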
Summary
- Prefix a literal string with the letter r or R to turn it into a raw string.
- Raw strings treat backslashes as literal characters.
The “u” and “r” string prefixes in Python and what “raw” string literals are
Python has special prefixes that can be placed before a string literal. They tell the interpreter how to process that string. Two of them are “u” and “r”.
The “u” prefix
The “u” prefix marks a Unicode string. This mattered in Python 2.x, where the default str type held bytes (typically ASCII text) rather than Unicode characters.
In Python 3.x, all strings are Unicode by default, so the “u” prefix is no longer necessary. It is still accepted, however, for backward compatibility.
The “r” prefix
The “r” prefix creates a “raw” string. Backslashes in such a string are kept as ordinary characters instead of starting escape sequences. This is especially useful when working with regular expressions and file paths.
s = r'\n this is not a line break'
In this example, the string contains the two characters “\” and “n” rather than a line break.
The “ur” combination
In Python 2.x, you can also combine the “u” and “r” prefixes to create a raw Unicode string.
Such a string is Unicode, and its backslashes are taken literally. Python 3.x does not accept the “ur” prefix (it raises a SyntaxError); there, a plain “r” string is already a raw Unicode string.
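In Python 3 you can confirm this directly: compile() rejects a ur literal, while u and r are still accepted. A small check with illustrative literals:

```python
# Python 3 accepts u'' (for compatibility) and r'', but not ur''.
print(u'text' == 'text')   # True: u has no effect in Python 3
print(r'\n' == '\\n')      # True

try:
    compile("ur'\\n'", "<example>", "eval")
except SyntaxError:
    print("ur prefix rejected")
```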
Converting back to a “raw” string
There is no direct way to convert a Unicode string back into a “raw” string, because “raw” describes only how a literal is written in source code, not a separate kind of string. If what you need is a byte string, use the encode() method with the required encoding.
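For example, encoding a string written as a raw literal gives ordinary bytes, and Python 3 also has a raw bytes prefix, rb (or br), for writing such byte strings directly. A short sketch, assuming UTF-8 as the encoding:

```python
s = r'C:\new'                  # raw str literal: the backslash is kept
b = s.encode('utf-8')          # str -> bytes using an explicit encoding

print(b)                       # b'C:\\new'
print(b == rb'C:\new')         # True: rb'' is a raw bytes literal
print(b.decode('utf-8') == s)  # True: decoding gives the original str back
```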
UTF-8 and the “u” prefix
Finally, even if your system and text editor already use UTF-8, the “u” prefix is still useful in Python 2.x, because it tells the interpreter to treat the string as a Unicode string. In Python 3.x it makes no difference, since all strings are already Unicode.
Prefix r before String in Python
💡 Quick Definition
The prefix r before a string literal denotes a raw string in Python. A raw string treats the backslash (\) as a literal character. It is generally preferred when you don’t want the backslash to be treated as an escape character.
Here is an example:
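A minimal sketch, using an illustrative string, of a regular string that contains the \n escape sequence:

```python
# Regular string: \n is an escape sequence for a newline.
s = 'Hello\nWorld'
print(s)
# Hello
# World
print(len(s))  # 11: \n counts as a single character
```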
Escape sequences give certain characters an alternative meaning when they follow a backslash. Common escape sequences include \n for a new line and \t for a horizontal tab.
Python supports such escape sequences, and the interpreter applies their defined meanings when it encounters them in a string literal.
In the above example, we use the \n sequence to specify that a new line needs to be added.
Prefix r Before String in Python
There may be situations where we want to display backslashes in a string and do not want them to be interpreted as escape sequences.
In such situations, we use the prefix r before the string. The r prefix tells the interpreter that the string is a raw string literal: backslashes in it are kept as ordinary characters and are not interpreted as escape sequences.
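A minimal sketch of an illustrative string containing \n, this time with the r prefix added:

```python
# Raw string: \n stays as a backslash followed by the letter n.
s = r'Hello\nWorld'
print(s)       # Hello\nWorld
print(len(s))  # 12: the backslash and the n are separate characters
```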
In the above example, we use the same string as in the earlier code but add the prefix r. The interpreter treats it as a raw string literal, so \n stays as the two characters \ and n, and no new line is added to the string.
There is also a difference between Python 2 and Python 3 here. In Python 2, the default str type stores bytes (typically ASCII-encoded text), whereas in Python 3, str is a sequence of Unicode characters.
In Python 2, we use the prefix u if we want a Unicode string. We can combine it with the r prefix as ur, which tells the interpreter to create a raw string literal of Unicode characters. Python 3 does not accept the ur prefix; there, an r string is already a Unicode string.
What is R String in Python?
Python is a flexible programming language with various ways to handle and manipulate strings. Raw strings, also referred to as “R strings”, are one such feature. This article explains what raw strings are and how to use them in Python, with several examples.
What is a “Raw” String?
A raw string in Python is a string literal in which the backslash has no special meaning. It is created by prefixing the string literal with the letter “r” or “R”. The prefix tells Python to treat the string as raw, so backslashes are not interpreted as the start of escape sequences.
By using raw strings, we can write backslashes and backslash sequences such as \n without doubling the backslashes.
How to Use Python Raw Strings?
The following examples show the usage of Python raw strings:
Example 1: Creating a Raw String in Python
To create a raw string, simply prefix the string literal with “r” or “R”:
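A minimal sketch; the variable name path and the folder names are illustrative assumptions. Without the r prefix, the \U in this path would be a SyntaxError and \n would become a newline:

```python
# The r prefix keeps every backslash in the Windows-style path.
path = r"C:\Users\name\Documents\notes.txt"
print(path)              # C:\Users\name\Documents\notes.txt
print(path.count("\\"))  # 4
```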
In the above code, the “r” prefix is placed before the string literal, and the resulting raw string is assigned to the variable “path”.
The raw string has been created and printed successfully.
Example 2: Updating a Python Raw String
Raw strings are ordinary str objects, so string methods work on them as usual. Because strings are immutable, “updating” one means creating a new string, for example by replacing a substring:
r_string = r"This is a \n raw string"
new_str = r_string.replace("raw", "R")
print(r_string, '\n')
print(new_str)
In the above code, the raw string is created using the prefix “r” before the string. After that, the “replace()” method returns a new string in which the substring “raw” is replaced with “R”.
The output shows the substring replaced, while the \n in the raw string is still printed as a backslash followed by n.
Example 3: Including “\n” in a Raw String
In a raw string, the escape sequence “\n” is kept as the two characters “\” and “n” rather than being turned into a newline:
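A minimal sketch with illustrative strings, comparing \n in a raw and a regular string. If you do need a real line break inside a raw string, one option is a triple-quoted raw string that spans two lines:

```python
raw = r"First line\nSecond line"
regular = "First line\nSecond line"

print(raw)      # First line\nSecond line  (no line break)
print(regular)  # printed on two lines

# A triple-quoted raw string can contain an actual line break.
multi = r"""First line\n
Second line"""
print(multi.count("\n"))  # 1: only the real line break is a newline character
```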
In the above code, “\n” is written inside the raw string.
Note: Unlike in a normal string, “\n” in a raw string does not add a new line.
The output shows that the characters “\n” appear in the raw string instead of a line break.
Example 4: Converting a Regular String to a Raw String
Python’s “repr()” function can be used to show a regular string in raw form. It returns a string representation in which special characters appear as escape sequences (such as \n) instead of taking effect:
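A minimal sketch of this comparison; the two-part string is an illustrative assumption:

```python
text = "Hello\nPython"

print(text)        # regular string: the line break takes effect
print(repr(text))  # 'Hello\nPython'  (escape shown, quotes included)
```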
In the above code lines, the “repr()” function takes a regular string as an argument and returns its escaped representation.
This output shows that the regular string contains a line break between the two parts, while the repr() version shows the characters “\n” between them instead, along with the surrounding quotes.
Invalid Raw Strings in Python
While raw strings offer flexibility, some raw string literals are still invalid:
- A raw string that ends with an odd number of backslashes, such as r'abc\'. It is invalid because the final backslash escapes the closing quote, which leaves the string unterminated.
Sequences such as \x or \u123, by contrast, are not a problem in raw strings: r'\x' is simply a backslash followed by x, and r'\u123' contains five characters. They raise a SyntaxError only in regular (non-raw) strings, where '\x' and '\u123' are truncated escape sequences.
To get a string that ends with a single backslash, either use a regular string with the backslash escaped ('abc\\') or concatenate a raw string with an escaped backslash (r'abc' + '\\'). Note that r'abc\\' is valid but ends with two backslashes.
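A minimal sketch of ways to end a string with a single backslash, using an illustrative folder path:

```python
# Option 1: a regular string with the final backslash escaped.
a = 'C:\\temp\\'
# Option 2: a raw string plus a separately escaped backslash.
b = r'C:\temp' + '\\'
# r'C:\temp\\' is valid too, but it ends with two backslashes.
c = r'C:\temp\\'

print(a == b)                                # True
print(a.endswith('\\'), c.endswith('\\\\'))  # True True
```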
Conclusion
A raw string provides a convenient way to work with strings containing backslashes in Python. It is created by prefixing a string literal with “r” or “R”. Raw strings let you include backslashes and backslash sequences without excessive escaping, which makes them useful for regular expressions, Windows file paths, and other string manipulation in Python. This blog explained how the “R” string works in Python.
About the author
Talha Saif Malik
Talha is a contributor at Linux Hint with a vision to bring value and do useful things for the world. He loves to read, write and speak about Linux, Data, Computers and Technology.