RegexLearn and other RegEx resources



RegexLearn is an intuitive online playground where you learn how to construct regular expressions. We also revisit other tools, advanced regex constructs, the portability of regexes across programming languages, and how to deter regular expression denial of service attacks.

I won’t go into the details of why you should know about regex as a developer. I’ll just say that:

Regex can be used in programming languages and tools such as Python, SQL, JavaScript, R, Google Analytics and Google Data Studio, and throughout the coding process, to find, match, and edit text.

Despite this age of multimedia, text is still king; from data in humble Excel spreadsheets to ETL, NLP and business intelligence, it’s all text.

The other point is that regular expressions are notoriously difficult to master. Of course, things have improved since the era of reading about them in books such as the timeless Mastering Regular Expressions by Jeffrey Friedl or the more recent Learning Regular Expressions by Ben Forta, and the process of learning them has become easier with the advent of tools such as RegexLearn. Among the tools themselves, the level of difficulty and the target audience vary.

So there are tools such as the Perl Regex Tester, RegExr and Regex101, which are aimed at a more advanced audience, and there are friendlier ones like Regexplained, iHateRegex or Regex Trainer, but all of them primarily let the user test regular expressions against a piece of text and learn by experimenting. None of them provides the step-by-step educational approach taken by RegexLearn.

Aside from the tools, there have also been other attempts to tame the complexity of regular expressions, either with scientific approaches such as genetic programming, which I examined in “Automatically Generating Regular Expressions with Genetic Programming”, or with a new language, the Simple Regex Language, discussed in “Taming Regular Expressions”.

Coming back to RegexLearn, simplicity is what it offers. You just type what the instructions tell you and, as you see the matching text in the result, you learn to use the operator at hand.

It starts slowly and very simply: you just type OK in the regex field to go to the next step.

Step 2 explains why learning regex is useful:

Let’s say you have a list of filenames and you only want to find the files with the pdf extension. Typing the expression ^\w+\.pdf$ will work.

and just press Next to continue.
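As a rough illustration of that filename example, here is a minimal Python sketch. The filenames are invented for the demonstration, and Python's built-in re module is just one of the many engines that accept this pattern; RegexLearn itself runs in the browser.

```python
import re

# Hypothetical filenames, made up for this illustration.
filenames = [
    "report.pdf",
    "notes.txt",
    "invoice_2021.pdf",
    "my file.pdf",
    "archive.pdf.zip",
]

# ^      start of the string
# \w+    one or more word characters (letters, digits, underscore)
# \.     a literal dot (an unescaped . would match any character)
# pdf$   the letters "pdf" at the very end of the string
pdf_pattern = re.compile(r"^\w+\.pdf$")

pdf_files = [name for name in filenames if pdf_pattern.match(name)]
print(pdf_files)  # ['report.pdf', 'invoice_2021.pdf']
```

Note that "my file.pdf" is rejected because a space is not a word character, and "archive.pdf.zip" is rejected because the string does not end in pdf.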

In step 3, you discover the dot (.): any character

The period . allows you to select any character, including special characters and spaces.
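To see the dot at work outside the playground, here is a small Python sketch with made-up sample strings. One caveat that is specific to Python's re module, and is an assumption about the engine rather than about RegexLearn: by default the dot does not match a newline unless the re.DOTALL flag is passed.

```python
import re

# The dot matches any single character: letters, digits, punctuation, spaces.
sample = "a1 !?"
all_chars = re.findall(r".", sample)
print(all_chars)  # ['a', '1', ' ', '!', '?']

# In Python's re module the dot skips newlines by default...
two_lines = "ab\ncd"
without_flag = re.findall(r".", two_lines)

# ...unless re.DOTALL is given, in which case the newline is matched too.
with_flag = re.findall(r".", two_lines, flags=re.DOTALL)

print(without_flag)  # ['a', 'b', 'c', 'd']
print(with_flag)     # ['a', 'b', '\n', 'c', 'd']
```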

Then it moves on to character sets and so on, with each step increasing in difficulty. If you’re stuck, no worries; Alt + H will show you the answer. In total there are 55 steps so a good depth is covered.

Of course, you won’t end up learning some of the very advanced constructs that are specific to a particular programming language, such as Perl’s pattern code expression (?{ code }) or the extended construct (??{ code }). For examples of this advanced usage, see the links.

When it comes to language-specific regex extensions, the question arises: can regular expressions be safely reused across languages? That is, can I reuse a regular expression created in JavaScript verbatim in Python? If I do, will I get the same results and performance? This article reviews the research, which also covers safety precautions:

PHP and Perl (PHP probably because it uses the PCRE, or Perl Compatible Regular Expressions, library) were the only ones with explicit defenses against exponential-time behavior.
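On the portability question itself, one small illustration of why verbatim reuse can fail: named groups are written (?<name>...) in JavaScript, while Python's built-in re module expects (?P<name>...). The sketch below uses a made-up date pattern and a hypothetical helper, compiles_in_python, and only checks whether each spelling compiles in Python; it says nothing about performance.

```python
import re

def compiles_in_python(pattern):
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Hypothetical date patterns used only for this illustration.
js_style = r"(?<year>\d{4})-(?<month>\d{2})"    # JavaScript named-group syntax
py_style = r"(?P<year>\d{4})-(?P<month>\d{2})"  # Python named-group syntax

for label, pattern in [("JavaScript style", js_style), ("Python style", py_style)]:
    print(label, "compiles in Python:", compiles_in_python(pattern))

# The Python spelling works as expected.
match = re.match(py_style, "2021-09")
year = match.group("year")
print(year)
```

A pattern that compiles in both languages can still behave differently, so a shared test suite of inputs is a safer check than eyeballing the syntax.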

Exponential-time behavior is another reason to know your regular expressions really well: it is what makes regular expression denial of service (ReDoS) attacks possible.

Regular expression denial of service (ReDoS) is a denial of service attack that exploits the fact that most regular expression implementations can reach extreme situations that cause them to work very slowly (exponentially related to the input size). An attacker can therefore cause a program using a regular expression to enter such extreme situations and then hang for a very long time.

Due to differences in the underlying algorithms that regular expression engines are based on, a match in some languages may take more than linear time (polynomial or, at worst, exponential) in the length of the regular expression and the input string. These are called super-linear matches; some regex engines fall prey to this super-linear behavior while the wiser ones avoid it.

So regular expressions that fall into this super-linear category can be exploited by feeding them specially crafted strings that overload the host, for example a web server, as in a DoS attack, finally bringing it to its knees.
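The effect is easy to reproduce on a small scale. The following is a minimal sketch, assuming Python's built-in re module (a backtracking engine) and the textbook pattern (a+)+$, which is not taken from any of the tools mentioned here. The helper name time_match and the input sizes are arbitrary choices, kept small so that the script finishes quickly.

```python
import re
import time

def time_match(pattern, text):
    """Return (matched, seconds taken) for a single re.match call."""
    start = time.perf_counter()
    matched = re.match(pattern, text) is not None
    return matched, time.perf_counter() - start

risky = r"(a+)+$"  # nested quantifiers: many ways to split a run of "a"s
safer = r"a+$"     # accepts the same strings without the nesting

timings = {}
for n in (10, 14, 18):
    # A run of "a"s followed by "!" can never match, so a backtracking
    # engine tries every way of splitting the run before giving up.
    text = "a" * n + "!"
    risky_matched, risky_seconds = time_match(risky, text)
    safer_matched, safer_seconds = time_match(safer, text)
    timings[n] = (risky_seconds, safer_seconds)
    print(f"n={n}: risky {risky_seconds:.5f}s, safer {safer_seconds:.5f}s")
```

The time for the risky pattern should grow sharply with each extra character, while the safer pattern stays fast. Raising n much beyond the values shown can make the script appear to hang, which is exactly the behavior an attacker is after.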

This is something to take into account. And because it is a huge issue, we have also reported on a tool that can identify resource-hungry regular expressions; see Regexploit. While this is a scenario that shouldn’t bother RegexLearn users, it’s good to know that it exists.

Once you’ve completed all of the RegexLearn steps, it’s time to test your new skills. Although RegexLearn promises a Practice section, it is not yet ready. However, you can practice with the regular expression game from the Machine Learning Lab, whose creators we met in “Automatically Generating Regular Expressions with Genetic Programming”; it includes 12 levels of increasing difficulty. And if you want a more old-fashioned approach, check out “Can You Do The Regular Expression Crossword?”.

A late addition to the list of tools is py_regular_expressions, a GUI application written in tkinter to help you practice Python regular expressions.

RegexLearn’s infrastructure is also open source and can be found on its GitHub repository.




Related Articles

Learning Regular Expressions (Book Review)

Automatically Generating Regular Expressions with Genetic Programming

Taming Regular Expressions

The Pattern Code Expression

Extended Constructs

Machine Learning Lab Regular Expression Game

Can You Do The Regular Expression Crossword?

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.



