Python regex to javascript

Converting Python Regex to JS Regex: A Duplicate Question

The questions collected here cover three problems: filtering HTML tags and JavaScript code out of a page with Python regular expressions, parsing JavaScript text with Python, and porting a JavaScript regex replacement (used to correct a JSON path string) to Python. In the first case most of the code could be removed, but a small amount of JavaScript remained; inspecting the raw HTML showed that the leftover code sat inside tags written in uppercase.

Regex in python for filter JS code

As a newcomer to Python, I aim to utilize regex for filtering HTML tags. To achieve this, I have implemented the following function:

    def filter_tags(htmlstr):
        re_cdata=re.compile('//',re.DOTALL)
        re_script=re.compile('<\s*script[^>]*>[^',re.DOTALL)#Script
        re_style=re.compile('<\s*style[^>]*>[^',re.I)#style
        re_br=re.compile('')
        re_h=re.compile(']*>')
        re_function = re.compile('')
        re_comment=re.compile('')
        s=re_cdata.sub('',htmlstr)
        s=re_script.sub('',s)
        s=re_style.sub('',s)
        s=re_br.sub('',s)
        s=re_h.sub('',s)
        s=re_comment.sub('',s)
        s = re.sub('\\t','',s)
        s = re.sub(' ','',s)
        return s

This removes most of the tags and code, but some JS functions are left behind, and I have not been able to get rid of them. For example:

(function()< NTES.ajax.importJs('http://news.163.com/special/hot_tags_recommend_data/',function()< varname1,name2,len1,len2,width1,width2,left2; varloveData=['拎婚房待嫁北京爷们','请网友鉴定是否美女']; if(hotTagsData.count&&hotTagsData.count>0)< varcode='#from=article', html=[], item=; for(vari=0;i html.push(lovedata[0]); html.push(lovedata[1]); ntes('#js-extrataglist').innerhtml="html.join('');" len1="name1.replace(/[^\x00-\xff]/gi,"aa").length;" len2="name2.replace(/[^\x00-\xff]/gi,"aa").length;" width1="Math.floor((len1/(len1+len2))*271);" width2="271-width1;" left2="96+width1+19;" ntes('.extra-tag-1').addcss('width:'+width1+'px'); ntes('.extra-tag-2').addcss('width:'+width2+'px;left:'+left2+'px;'); > >,'gbk'); >)(); 

There exist numerous functions similar to this one. Is it possible to eliminate them using regex? I appreciate your assistance.

Avoid using a negated character class such as [^<]* for the content between the opening and closing tags; it should only be used for matching the tags themselves. Instead, use the non-greedy form of .* , which is written .*? . In the script pattern, for example, the part that follows <\s*script[^>]*> becomes .*? . Make this change everywhere, including the comment regex and the style-tag regex.

This should cover most situations, but it is not immune to script tags whose content itself contains the string '</script>'. Such cases are rare, but if you come across them you may need to handle them manually.
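A minimal sketch of the difference between the two forms. The sample HTML and the closing-tag part of each pattern (`<\s*/\s*script\s*>`) are assumptions made for illustration, not the asker's actual data or patterns:

```python
import re

# Hypothetical input: the script body contains a "<" character.
html = '<p>keep</p><script>if (a < b) { run(); }</script><p>also keep</p>'

# Negated class: [^<]* stops at the first "<" inside the script body,
# so the closing tag is never reached and nothing is removed.
negated = re.compile(r'<\s*script[^>]*>[^<]*<\s*/\s*script\s*>', re.DOTALL)

# Non-greedy: .*? expands only as far as the first closing tag.
lazy = re.compile(r'<\s*script[^>]*>.*?<\s*/\s*script\s*>', re.DOTALL)

negated_result = negated.sub('', html)
lazy_result = lazy.sub('', html)

print(negated_result)
print(lazy_result)
```

With this input the negated-class pattern leaves the script in place, while the non-greedy pattern removes it and keeps both paragraphs.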

With the help of DataHerder's answer, I modified my regular expressions, which removed most of the code. A small portion of JS code still remained. Inspecting the raw HTML showed that the remaining JavaScript was wrapped in tags written in uppercase, such as <SCRIPT>, and I believed that was why it could not be removed. After a slight modification to the regular expressions, all the tags and code are now filtered out. Thank you once again for your assistance. Here is the updated regex:

    re_cdata=re.compile('//',re.DOTALL)
    re_script=re.compile('<\s*script[^>]*>.*?',re.DOTALL|re.I)
    re_style=re.compile('<\s*style[^>]*>.*?',re.DOTALL|re.I)
    re_br=re.compile('')
    re_h=re.compile('',re.DOTALL)
    re_comment=re.compile('',re.DOTALL)

The re.I flag (short for re.IGNORECASE) makes the match case-insensitive, so tags written in uppercase are matched as well.
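A short sketch of the effect of combining the flags. The input string and the closing-tag part of the pattern are illustrative assumptions:

```python
import re

# Hypothetical input with an uppercase script tag.
html = '<SCRIPT type="text/javascript">var x = 1;</SCRIPT>text'

pattern = r'<\s*script[^>]*>.*?<\s*/\s*script\s*>'

# Without re.I the lowercase "script" in the pattern does not match "SCRIPT".
case_sensitive = re.compile(pattern, re.DOTALL)
# With re.I the same pattern matches regardless of letter case.
case_insensitive = re.compile(pattern, re.DOTALL | re.I)

sensitive_result = case_sensitive.sub('', html)
insensitive_result = case_insensitive.sub('', html)

print(sensitive_result)
print(insensitive_result)
```

Flags are combined with the bitwise OR operator, as in the updated regex above.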

Simple regex (python vs javascript), What am I doing wrong? I am searching the matchs in a url, for example import re pattern = ‘github’ str = ‘https//github.com’ x

Parsing JS with regex python

I need to parse some JavaScript text using Python, which includes an HTML element variable in the JS code.

this.products = ko.observableArray([#here is some json, #here is some json]) 

The observableArray call may be empty, as in observableArray(), may hold a single JSON object, as in observableArray(['id': '3123123']), or may hold any number of JSON objects separated by commas, as in the code above.

I have attempted to obtain this string containing JSONs using regex.

    regex = re.compile('\n^(.*?)=(.*?)$|,',)
    js_text = re.findall(regex, js)
    print(js_text)

The call fails at line 177 of re.py in /usr/lib/python2.7, inside findall, which compiles the pattern and calls the compiled pattern's findall method on the given string. That call raises TypeError: expected string or buffer.

The error means that js is not a string or buffer. Have you checked what js actually contains at that point?

    # no problem
    >>> js = "this.products = ko.observableArray()"
    >>> js_text = re.findall(regex, js)
    >>> print(js_text)
    []
    # argument is not a string nor a buffer (in this case None)
    >>> js_text = re.findall(regex, None)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/mhawke/virtualenvs/urllib3/lib64/python2.7/re.py", line 177, in findall
        return _compile(pattern, flags).findall(string)
    TypeError: expected string or buffer
    >>> js_text = re.findall(regex, js)
    >>> print(js_text)
    []

As an aside, since regex is already a compiled pattern, it is cleaner to call regex.findall(js) instead.

Your regex pattern is also problematic, but that is a separate issue.
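A minimal sketch of both points, written for Python 3 (where the equivalent error message reads "expected string or bytes-like object"). The pattern and the helper name `find_arrays` are hypothetical, chosen only to illustrate calling the compiled pattern's own method and checking the input type first:

```python
import re

# Hypothetical pattern: capture whatever sits inside observableArray(...).
regex = re.compile(r'observableArray\((.*?)\)', re.DOTALL)

def find_arrays(js):
    """Return the captured argument text, or [] if js is not a string."""
    if not isinstance(js, str):
        # e.g. None returned by an earlier lookup that found nothing
        return []
    # Call findall on the compiled pattern rather than re.findall(regex, js).
    return regex.findall(js)

print(find_arrays(None))
print(find_arrays("this.products = ko.observableArray()"))
print(find_arrays("this.products = ko.observableArray([1, 2])"))
```

Returning an empty list for non-string input is one possible design choice; raising a clearer error is equally reasonable when a missing value indicates a bug upstream.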

How do I make an anchored regex match in the middle of a string in, You may be after a false efficiency here. It might be much quicker to just do a manual comparison starting at the desired index and NOT use a

Perform Simple Javascript Regex in Python

What is the process to execute javascript regex in Python (3.2)?

example_string.replace(/-/g, '.').replace(/(^|\.)(\d+)($|\.)/g, '[$2]$3');

The regex should work as follows: it replaces the path string with the correct JSON path.

I experimented with the re library, but I am unsure how to write the $2 and $3 group references in Python.

Use the re module; the documentation for Python 3.2 is at https://docs.python.org/3.2/library/re.html.

    import re

    value = "55-fathers-2-married"
    # Equivalent of .replace(/-/g, '.') in JavaScript
    value = value.replace("-", ".")
    # \2 and \3 are the Python counterparts of $2 and $3
    result = re.sub(r"(^|\.)(\d+)($|\.)", r"[\2]\3", value)
    print(result)

Evaluate your regular expression utilizing the python regex utility accessible at http://www.pythonregex.com/.
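The group-reference translation can also be sketched as a small helper. The function name `js_replacement_to_python` is hypothetical, it handles only single-digit references ($1 to $9) and ignores other JavaScript tokens such as $$ and $&, and the digit-run pattern (\d+) used in the usage line is an assumption about the intended regex:

```python
import re

def js_replacement_to_python(js_repl):
    """Rewrite $1..$9 in a JS replacement string as \\g<1>..\\g<9>."""
    # A backslash is literal in a JS replacement string but special in a
    # Python replacement template, so double it first.
    escaped = js_repl.replace('\\', '\\\\')
    # \g<n> is unambiguous even when a digit follows the reference.
    return re.sub(r'\$(\d)', r'\\g<\1>', escaped)

py_repl = js_replacement_to_python('[$2]$3')
result = re.sub(r'(^|\.)(\d+)($|\.)', py_repl, '55.fathers.2.married')

print(py_repl)
print(result)
```

The \g<n> form is used instead of \n so that a reference followed by a literal digit is not misread as a two-digit group number.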

Python re match equivalent in javascript [duplicate]: You may use the match function.

    > var str = "My name is Derek Last Name"
    undefined
    > str.match(/My name is (.+)/)[1]
    'Derek Last Name'
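For comparison, a sketch of the Python side of the same lookup. JavaScript's str.match without the g flag searches anywhere in the string, so re.search is the closer counterpart; re.match only matches at the start of the string:

```python
import re

text = "My name is Derek Last Name"

# re.search scans the whole string, like str.match(/.../) in JavaScript.
m = re.search(r"My name is (.+)", text)

# group(1) corresponds to index [1] of the JavaScript match result.
name = m.group(1) if m else None

print(name)
```

Checking the match object before calling group avoids an AttributeError when the pattern does not match.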

Converts Python-like (re) regular expressions to JavaScript RegExp instances

jmchilton/pyre-to-regexp

This project is a fork of the MIT licensed pcre-to-regexp project from @TooTallNate. This fork is also MIT licensed.

Creates a JavaScript RegExp instance from a Python-like regexp string.

Works with Node.js and in the browser via a CommonJS bundler like browserify.

pyre(String pattern[, Array keys]) → RegExp

Returns a JavaScript RegExp instance from the given Python-like regular expression string.

An empty array may be passed in as the second argument, which will be populated with the "named capture group" names as Strings in the Array, once the RegExp has been returned.

The returned RegExp has an additional function, pyreReplace, for Python-like replacements.
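To illustrate the idea of collecting named capture group names while producing a pattern without them, here is a rough Python sketch. It is not the library's implementation (the library itself is JavaScript); the function name `strip_named_groups` is hypothetical, and the sketch ignores escaped parentheses and does not track how names line up with unnamed groups:

```python
import re

# Matches the opening of a Python named group, e.g. "(?P<year>".
_NAMED_GROUP = re.compile(r'\(\?P<(\w+)>')

def strip_named_groups(pattern):
    """Return (pattern with plain groups, group names in order)."""
    names = _NAMED_GROUP.findall(pattern)
    plain = _NAMED_GROUP.sub('(', pattern)
    return plain, names

plain, names = strip_named_groups(r'(?P<year>\d{4})-(?P<month>\d{2})')

print(plain)
print(names)
```

The list of names plays the role of the keys array described above: position i in the list names the i-th named group.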

js-regex

Did you know that regular expressions may vary between programming languages? For example, let's consider the pattern "^abc$", which matches the string "abc". But what about the string "abc\n"? It's also matched in Python, but not in Javascript!
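The Python half of this claim can be checked with the standard re module alone. The sketch below also shows \Z, which in Python matches only at the very end of the string:

```python
import re

# In Python, $ also matches just before a trailing newline.
py_dollar = re.search(r"^abc$", "abc\n")

# \Z matches only at the very end, so the trailing newline blocks the match.
strict_end = re.search(r"^abc\Z", "abc\n")

# Without the trailing newline, \Z matches as expected.
exact = re.search(r"^abc\Z", "abc")

print(py_dollar is not None, strict_end is not None, exact is not None)
```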

This and other slight differences can be really important for cross-language standards like jsonschema , and that’s why js-regex exists.

How it works

Internally, js_regex.compile() replaces JS regex syntax which has a different meaning in Python with whatever Python regex syntax has the intended meaning.

This only works for the .search() method — there is no equivalent to .match() or .fullmatch() for Javascript regular expressions.

We also check for constructs which are valid in Python but not JS — such as named capture groups — and raise an explicit error. Constructs which are valid in JS but not Python may also raise an error, because we’re still using Python’s re.compile() function under the hood!

The following table is adapted from a larger version, omitting other languages and any rows where JS and Python have the same behaviour.

| Feature | Javascript | Python | Handling |
| --- | --- | --- | --- |
| \a (bell) | no | yes | Converted to JS behaviour |
| \ca to \cz and \cA to \cZ (control characters) | yes | no | Converted to JS behaviour |
| \d for digits, \w for word chars, \s for whitespace | ascii | unicode | Converted to JS behaviour (including \D, \W, \S for negated classes) |
| $ (end of line/string) | at end | allows trailing \n | Converted to JS behaviour |
| \A (start of string) | no | yes | Explicit error, use ^ instead |
| \Z (end of string) | no | yes | Explicit error, use $ instead |
| (?<=text) (positive lookbehind) | new in ES2018 | yes | Allowed |
| (?<!text) (negative lookbehind) | new in ES2018 | yes | Allowed |
| (?(1)then\|else) | no | yes | Explicit error |
| (?(group)then\|else) | no | yes | Explicit error |
| (?#comment) | no | yes | Explicit error |
| (?P<name>regex) (Python named capture group) | no | yes | Not detected (yet) |
| (?P=name) (Python named backreference) | no | yes | Not detected (yet) |
| (?<name>regex) (JS named capture group) | new in ES2018 | no | Error from Python, not translated (yet) |
| $<name> (JS named backreference) | new in ES2018 | no | Error from Python, not translated (yet) |
| (?i) (case insensitive) | /i only | yes | Explicit error, compile with flags=re.IGNORECASE instead |
| (?m) (^ and $ match at line breaks) | /m only | yes | Explicit error, compile with flags=re.MULTILINE instead |
| (?s) (dot matches newlines) | no | yes | Explicit error, compile with flags=re.DOTALL instead |
| (?x) (free-spacing mode) | no | yes | Explicit error, there is no corresponding mode in Javascript |
| Backreferences to non-existent groups are an error | no | yes | Follows Python behaviour |
| Backreferences to failed groups also fail | no | yes | Follows Python behaviour |
| Nested references \1 through \9 | yes | no | Follows Python behaviour |

Note that in many cases Python-only regex features would be treated as part of an ordinary pattern by JS regex engines. Currently we raise an explicit error on such inputs, but may translate them to have the JS behaviour in a future version.
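One row of the table above, \d being ASCII-only in JavaScript but Unicode-aware in Python, can be observed with the standard re module. This sketch only demonstrates the Python behaviour and two ways to restrict it; it makes no claim about how js-regex performs the conversion internally:

```python
import re

# ARABIC-INDIC DIGIT THREE, a Unicode decimal digit outside ASCII.
arabic_three = "\u0663"

# On str patterns, Python's \d matches any Unicode decimal digit.
unicode_match = re.fullmatch(r"\d", arabic_three)

# re.ASCII restricts \d, \w and \s to their ASCII-only meaning.
ascii_match = re.fullmatch(r"\d", arabic_three, flags=re.ASCII)

# An explicit character class is ASCII-only regardless of flags.
explicit_class = re.fullmatch(r"[0-9]", arabic_three)

print(unicode_match is not None, ascii_match is not None, explicit_class is not None)
```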

Changelog

1.0.1 — 2019-10-17

  • Allow use of native strings on Python 2. This is not actually valid according to the spec, but it’s only going to be around for a few months so whatever.

1.0.0 — 2019-10-04

  • Now considered feature-complete and stable, as all constructs recommended for jsonschema patterns are supported and all Python-side incompatibilities are detected.
  • Compiled patterns are now cached on Python 3, exactly as for re.compile

0.4.0 — 2019-10-03

0.3.0 — 2019-09-30

  • Fixed handling of non-trailing $ , e.g. in «^abc$|^def$» both are converted
  • Added explicit errors for re.LOCALE and re.VERBOSE flags, which have no JS equivalent
  • Added explicit checks and errors for use of Python-only regex features

0.2.0 — 2019-09-28

Convert JS-only syntax to Python equivalent wherever possible.

0.1.0 — 2019-09-28

Initial release, with project setup and a very basic implementation.
