Python regular expressions: lookahead, lookbehind, and negative lookahead explained

Introduction

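In many cases, the parts of a match must either appear together or not appear at all. Paired brackets such as 《》 or < > are an example: using only one half makes no sense. Regular expressions can enforce this kind of consistency, so that either both angle brackets are present or neither is. To do this we need another piece of regular expression syntax, the lookahead assertion (?=pattern). It looks ahead from the current position, and the overall match fails if the pattern is not found there. The assertion does not consume any input characters; it only peeks at them.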
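As a minimal sketch of the "does not consume" behaviour, the snippet below matches a username only when it is followed by "@". The sample string and variable names are illustrative assumptions, not part of the original example.

```python
import re

text = 'first.last@example.com'

# Match word characters and dots, but only if an "@" comes next.
# The lookahead checks for "@" without making it part of the match.
match = re.search(r'[\w.]+(?=@)', text)

print(match.group())      # the matched username, without the "@"
print(match.end())        # index where the match stops
print(text[match.end()])  # the "@" is still unconsumed at that index
```

Because the "@" is left in place, any pattern written after the lookahead would start matching at the "@" itself.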

Example:

# python 3.6
# 蔡军生
# http://blog.csdn.net/caimouse/article/details/51749579
#
import re

# A raw string keeps the regex backslashes (\w, \s, \d, \.) intact.
address = re.compile(
    r'''
    # A name is made up of letters, and may include "."
    # for title abbreviations and middle initials.
    ((?P<name>
       ([\w.,]+\s+)*[\w.,]+
     )
     \s+
    ) # name is no longer optional

    # LOOKAHEAD
    # Email addresses are wrapped in angle brackets, but only
    # if both are present or neither is.
    (?= (<.*>$)        # remainder wrapped in angle brackets
        |
        ([^<].*[^>]$)  # remainder *not* wrapped in angle brackets
      )

    <? # optional opening angle bracket

    # The address itself: username@domain.tld
    (?P<email>
      [\w\d.+-]+       # username
      @
      ([\w\d.]+\.)+    # domain name prefix
      (com|org|edu)    # limit the allowed top-level domains
    )

    >? # optional closing angle bracket
    ''',
    re.VERBOSE)

candidates = [
    'First Last <first.last@example.com>',
    'No Brackets first.last@example.com',
    'Open Bracket <first.last@example.com',
    'Close Bracket first.last@example.com>',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print(' Name :', match.groupdict()['name'])
        print(' Email:', match.groupdict()['email'])
    else:
        print(' No match')

The output is:

Candidate: First Last <first.last@example.com>
 Name : First Last
 Email: first.last@example.com
Candidate: No Brackets first.last@example.com
 Name : No Brackets
 Email: first.last@example.com
Candidate: Open Bracket <first.last@example.com
 No match
Candidate: Close Bracket first.last@example.com>
 No match
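To see why the last two candidates are rejected, the lookahead can be tried on its own against just the text that follows the name. This is only a sketch: the helper name brackets_consistent and the sample strings are assumptions made for illustration.

```python
import re

# The same alternation used inside the lookahead above, on its own.
bracket_check = re.compile(
    r'''
    (?= (<.*>$)        # remainder wrapped in angle brackets
        |
        ([^<].*[^>]$)  # remainder *not* wrapped in angle brackets
      )
    ''',
    re.VERBOSE)


def brackets_consistent(remainder):
    """Return True if both angle brackets are present or both are absent."""
    return bracket_check.match(remainder) is not None


for remainder in ['<first.last@example.com>',
                  'first.last@example.com',
                  '<first.last@example.com',
                  'first.last@example.com>']:
    print(remainder, brackets_consistent(remainder))
```

When the check succeeds, the match object it produces is empty (its end position is 0), which is the zero-width behaviour described earlier: the brackets and the address are still available for the rest of the pattern to match.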

Negative lookahead in Python regular expressions

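The previous section covered the positive lookahead (?=pattern), where the equals sign means the pattern must match. Lookahead also has a negated form. Suppose you need to recognize the email address noreply@example.com. Addresses like this usually do not accept replies, so we want to identify them and discard them. For that you use a negative lookahead, written (?!pattern), where the exclamation mark means "not". Given the string noreply@example.com, the assertion checks whether the text at the current position starts with noreply@; if it does, the match is abandoned.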

Example:

# python 3.6
# 蔡军生
# http://blog.csdn.net/caimouse/article/details/51749579
#
import re

# A raw string keeps the regex backslashes (\w, \d, \.) intact.
address = re.compile(
    r'''
    ^

    # An address: username@domain.tld

    # Ignore noreply addresses
    (?!noreply@.*$)

    [\w\d.+-]+       # username
    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # limit the allowed top-level domains

    $
    ''',
    re.VERBOSE)

candidates = [
    'first.last@example.com',
    'noreply@example.com',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print(' Match:', candidate[match.start():match.end()])
    else:
        print(' No match')

The output is:

Candidate: first.last@example.com
 Match: first.last@example.com
Candidate: noreply@example.com
 No match
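The title also mentions lookbehind, which the examples above do not show. As a rough sketch, the same noreply filter can be written with a negative lookbehind (?<!pattern), which checks the text just before the current position, and a positive lookbehind (?<=pattern) can pull out the domain. The helper names is_accepted and extract_domain are assumptions made for illustration.

```python
import re

# Negative lookbehind: after the username has been matched, require that
# the text immediately before "@" is not "noreply".
address = re.compile(
    r'''
    ^
    [\w\d.+-]+       # username
    (?<!noreply)     # the username must not end with "noreply"
    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # limit the allowed top-level domains
    $
    ''',
    re.VERBOSE)

# Positive lookbehind: match the domain only where an "@" precedes it.
domain = re.compile(r'(?<=@)[\w.]+')


def is_accepted(candidate):
    """Return True if the candidate is a non-noreply address."""
    return address.search(candidate) is not None


def extract_domain(candidate):
    """Return the text after "@", or None if there is no "@"."""
    match = domain.search(candidate)
    return match.group() if match else None


for candidate in ['first.last@example.com', 'noreply@example.com']:
    print(candidate, is_accepted(candidate), extract_domain(candidate))
```

Note that this lookbehind version is slightly stricter than the lookahead version: it also rejects any username that merely ends in "noreply". Python's re module additionally requires a lookbehind pattern to have a fixed width.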

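Summary

That is all for this article. I hope it is a useful reference for your study or work. If you have questions, feel free to leave a comment. Thank you for supporting 毛票票.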
