Restricting Text Responses With Regular Expressions

Last updated: 15 Feb 2022

A regular expression, or regex, is a search pattern used for matching specific characters and ranges of characters within a string. It is widely used to validate, search, extract, and restrict text in virtually all programming languages. KoboToolbox supports regex to control the length and characters entered in response to a particular question (e.g. restricting a mobile number to exactly 10 digits, or requiring a valid email address).

To use a regex in KoboToolbox, follow these steps

  1. Set up a Text question type.

  2. Go to the question's Settings.

  3. Go to Validation Criteria and choose the Manually enter your validation logic in XLSForm code option.

  4. In the Validation Code box, enter your regex formula between the quotation marks (' ') of the regex(., ' ') format. For reference, the period ( . ) refers to 'this question', while the regular expression within the quotation marks ( ' ' ) needs to conform to the established regex rules.

  5. (Optional) Add a custom Error Message for the person entering data to see when they don't meet the regex criteria.

Regex can also be coded in XLSForm, under the constraint column:

| type | name | label | appearance | constraint | constraint_message |
| --- | --- | --- | --- | --- | --- |
| text | q1 | Mobile number of respondent | numbers | regex(., '^[0-9]{10}$') | This value must be only 10 digits |
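The ten-digit constraint above can be tried out before deploying a form. The following is a minimal Python sketch, not KoboToolbox's own implementation: the helper name is_valid_mobile and the sample values are assumptions made for illustration, and Python's re module may differ in small details from the regex engines used by the data collection apps.

```python
import re

# Same pattern as in the constraint column: exactly ten digits, nothing else.
MOBILE_PATTERN = re.compile(r"^[0-9]{10}$")


def is_valid_mobile(value: str) -> bool:
    """Return True when the whole value consists of exactly ten digits.

    Hypothetical helper for illustration only.
    """
    # fullmatch sidesteps a Python quirk where $ also matches just before
    # a trailing newline.
    return MOBILE_PATTERN.fullmatch(value) is not None


# Illustrative sample values, not real phone numbers.
for sample in ["0712345678", "071234567", "07123456789", "07123abc78"]:
    print(sample, is_valid_mobile(sample))
```

Only the first sample has exactly ten digits, so it is the only one the check accepts.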

Alternatively, you can create a calculate question type and then define the regex code under the calculation column. You can then use this variable as many times as needed in the survey:

| type | name | label | calculation | constraint | constraint_message |
| --- | --- | --- | --- | --- | --- |
| calculate | q0 | | '^[A-Z]{1}[a-z]{1,}\s[A-Z]{1}[a-z]{1,}$' | | |
| text | q1 | Name of the Enumerator | | regex(., ${q0}) | Please use this format: Kobe Bryant |
| text | q2 | Name of the Respondent | | regex(., ${q0}) | Please use this format: Kobe Bryant |
| integer | q3 | Age of the Respondent | | | |
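The idea of storing a pattern once and reusing it for several questions can be sketched in Python as follows. This is an illustration only: the function check_answers and the sample answers are hypothetical, and the pattern is the one from the calculate row (one capitalised word, a single whitespace character, then another capitalised word).

```python
import re

# Pattern defined once, playing the role of the calculate question q0.
NAME_PATTERN = r"^[A-Z]{1}[a-z]{1,}\s[A-Z]{1}[a-z]{1,}$"


def check_answers(answers: dict, fields: list) -> dict:
    """Apply the shared pattern to each listed field (hypothetical helper)."""
    compiled = re.compile(NAME_PATTERN)
    return {
        field: compiled.fullmatch(answers.get(field, "")) is not None
        for field in fields
    }


# Illustrative answers for the two text questions q1 and q2.
answers = {"q1": "Kobe Bryant", "q2": "kobe bryant"}
results = check_answers(answers, ["q1", "q2"])
print(results)
```

Changing NAME_PATTERN in one place changes the rule for every field that uses it, which is the same benefit the calculate question gives in the form.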

How do I build the regex that I need?

In addition to the examples and tips provided below, please visit this website for more help and examples.

Regex in KoboToolbox should always be written between the apostrophes of regex(., ' '), as shown in the examples.

Regex          Description
-----          -----------
^              The caret symbol matches the start of a string without consuming any character.
$              The dollar symbol matches the end of a string without consuming any character.
[abc]          Matches either a, b or c from within the square brackets [ ].
[a-z]          Matches any lowercase character from a to z.
[A-Z]          Matches any uppercase character from A to Z.
[0-9]          Matches any single digit from 0 to 9.
[a-zA-Z0-9]    Matches any character from a to z or A to Z or 0 to 9.
[^abc]         Matches any character except a, b or c.
[^A-Z]         Matches any character except those in the range A to Z.
(apple)        The grouping characters ( ) match anything that is inside the parentheses.
|              A vertical bar matches any one of the elements it separates.
\              A backslash is used to match the literal value of any metacharacter (e.g. try using \. or \@ or \$ while building regex).
\number        Matches the same character as most recently matched by the nth (number used) capturing group.
\s             Matches any space or tab.
\b             Matches, without consuming any characters, immediately between a character matched by \w and a character not matched by \w (in either order). \b is also known as the word boundary.
\d             Matches any digit; equivalent to [0-9].
\D             Matches anything other than digits (0 to 9).
\w             Matches any word character (i.e. a to z or A to Z or 0 to 9 or _).
\W             Matches anything other than what \w matches (i.e. it matches special characters and spaces).
?              A question mark placed just after a character makes that character optional (it matches zero or one occurrence).
*              An asterisk placed just after a character matches zero or more consecutive occurrences of that character.
+              A plus symbol placed just after a character matches one or more consecutive occurrences of that character.
{x}            Matches exactly x consecutive characters.
{x,}           Matches at least x consecutive characters (or more).
{x,y}          Matches between x and y consecutive characters.
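A few of the building blocks listed above can be exercised with a short Python sketch. The patterns, sample strings and the helper whole_match are made up for illustration, and Python's re module is used only as a convenient stand-in: the engines behind the data collection apps may treat some constructs slightly differently.

```python
import re

# (pattern, sample text, whether the whole text is expected to match)
CASES = [
    (r"^[a-z]+$", "kobo", True),          # one or more lowercase letters
    (r"^[a-z]+$", "Kobo", False),         # uppercase K is outside [a-z]
    (r"^(apple|pear)$", "pear", True),    # group with alternatives
    (r"^colou?r$", "color", True),        # ? makes the u optional
    (r"^colou?r$", "colour", True),
    (r"^\d{2,4}$", "123", True),          # between 2 and 4 digits
    (r"^\d{2,4}$", "12345", False),
    (r"^[^abc]+$", "xyz", True),          # anything except a, b or c
    (r"^(\d)\1$", "77", True),            # \1 repeats the captured digit
    (r"^(\d)\1$", "78", False),
    (r"^\w+\.\w+$", "kobo.org", True),    # \. is a literal period
    (r"^\w+\.\w+$", "kobo-org", False),
]


def whole_match(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


for pattern, text, expected in CASES:
    outcome = whole_match(pattern, text)
    status = "ok" if outcome == expected else "UNEXPECTED"
    print(f"{pattern!r:22} {text!r:12} -> {outcome} ({status})")
```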

Considerations when using regex

  • If you wish to use a regex constraint on a number in a text type question, make sure you always have the value numbers under the appearance column. This hides the alphabetic keyboard so that only numbers are available for input.

  • The Collect Android app and Enketo behave differently in their treatment of regex expressions. Collect behaves as if you have used the anchors ^ and $ around the expression (even if you have not used them), while Enketo requires the anchors for an exact match.
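The difference between implicit and explicit anchoring can be illustrated with an analogy in Python. This sketch does not reproduce how Collect or Enketo are actually implemented: re.fullmatch merely stands in for "anchors are implied" and re.search for "anchors must be written out", and both function names are hypothetical.

```python
import re


def implied_anchor_match(pattern: str, value: str) -> bool:
    """Whole-value matching, analogous to the behaviour described for Collect."""
    return re.fullmatch(pattern, value) is not None


def partial_match(pattern: str, value: str) -> bool:
    """Match anywhere in the value, analogous to an unanchored pattern in Enketo."""
    return re.search(pattern, value) is not None


value = "12345678901abc"   # more than ten digits, followed by letters

unanchored = "[0-9]{10}"
anchored = "^[0-9]{10}$"

# Without anchors the two approaches disagree on the same input.
print(implied_anchor_match(unanchored, value), partial_match(unanchored, value))

# With explicit anchors both approaches reject the value.
print(implied_anchor_match(anchored, value), partial_match(anchored, value))
```

Writing the ^ and $ anchors explicitly is therefore the safer habit when a form may be filled in with either tool.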