- Converting Python Regex to JS Regex: A Duplicate Question
- Regex in python for filter JS code
- Parsing JS with regex python
- Perform Simple Javascript Regex in Python
- jmchilton/pyre-to-regexp
- js-regex
- How it works
- Changelog
- 1.0.1 — 2019-10-17
- 1.0.0 — 2019-10-04
- 0.4.0 — 2019-10-03
- 0.3.0 — 2019-09-30
- 0.2.0 — 2019-09-28
- 0.1.0 — 2019-09-28
Converting Python Regex to JS Regex: A Duplicate Question
The questions collected here ask how a JSON path string can be corrected and replaced using regex, and how to strip HTML tags and embedded JavaScript from a page with Python. In the second case, most of the code can be eliminated except for a small amount of JavaScript. The raw HTML showed that the JavaScript that could not be removed appeared in a particular form, and the asker assumed it survived because it was written in uppercase.
Regex in python for filter JS code
As a newcomer to Python, I aim to utilize regex for filtering HTML tags. To achieve this, I have implemented the following function:
import re

def filter_tags(htmlstr):
    re_cdata = re.compile('//', re.DOTALL)
    re_script = re.compile('<\s*script[^>]*>[^', re.DOTALL)  # Script
    re_style = re.compile('<\s*style[^>]*>[^', re.I)  # style
    re_br = re.compile('')
    re_h = re.compile(']*>')
    re_function = re.compile('')
    re_comment = re.compile('')
    s = re_cdata.sub('', htmlstr)
    s = re_script.sub('', s)
    s = re_style.sub('', s)
    s = re_br.sub('', s)
    s = re_h.sub('', s)
    s = re_comment.sub('', s)
    s = re.sub('\\t', '', s)
    s = re.sub(' ', '', s)
    return s
This removes the majority of tags and code, but some JavaScript functions are not removed. Here is an example of what remains:
(function()< NTES.ajax.importJs('http://news.163.com/special/hot_tags_recommend_data/',function()< varname1,name2,len1,len2,width1,width2,left2; varloveData=['拎婚房待嫁北京爷们','请网友鉴定是否美女']; if(hotTagsData.count&&hotTagsData.count>0)< varcode='#from=article', html=[], item=; for(vari=0;i" if(i="=2)" > html.push(lovedata[0]); html.push(lovedata[1]); ntes('#js-extrataglist').innerhtml="html.join('');" len1="name1.replace(/[^\x00-\xff]/gi,"aa").length;" len2="name2.replace(/[^\x00-\xff]/gi,"aa").length;" width1="Math.floor((len1/(len1+len2))*271);" width2="271-width1;" left2="96+width1+19;" ntes('.extra-tag-1').addcss('width:'+width1+'px'); ntes('.extra-tag-2').addcss('width:'+width2+'px;left:'+left2+'px;'); > >,'gbk'); >)();
There exist numerous functions similar to this one. Is it possible to eliminate them using regex? I appreciate your assistance.
Avoid using a negated character class ([^...]) to match the content between the opening and closing tags; it should only be used for matching the tags themselves. Instead, use the non-greedy form of * , written *? , after a dot, so that the content is matched by .*? . Make this change in every pattern, including the comment regex and the style-tag regex.
This solution should cover most situations, but it does not provide immunity against tags containing the string ‘‘ . Although rare, if you come across such cases, you may need to handle them manually.
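To see why a negated character class falls short for the script body, here is a minimal sketch comparing the two approaches. The patterns and the sample HTML are illustrative assumptions, not the exact expressions from the question or the answer.

```python
import re

# Hypothetical input: the script body itself contains a '<' character.
html = "<p>before</p><script>if (a < b) { run(); }</script><p>after</p>"

# Negated class: the body may not contain '<', so this block never matches.
negated = re.compile(r"<\s*script[^>]*>[^<]*<\s*/\s*script\s*>", re.DOTALL)

# Non-greedy wildcard: the body may contain anything up to the first closing tag.
lazy = re.compile(r"<\s*script[^>]*>.*?<\s*/\s*script\s*>", re.DOTALL)

negated_result = negated.sub("", html)
lazy_result = lazy.sub("", html)

print(negated_result)  # script block is still present
print(lazy_result)     # script block has been removed
```

The non-greedy version stops at the first closing tag it finds, which is also why a script body containing that closing tag as a string would still be cut short.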
With the help of DataHerder's answer, I modified my regular expressions, which removed most of the code. A small portion, specifically some JavaScript code, still remained. Upon inspecting the raw HTML, I found that the remaining JavaScript appeared in a particular form.
I guessed that it survived the filter because it was written in uppercase letters. After a slight modification to my regular expressions, I can now filter out all the tags and code. Thank you once again for your assistance. Here are the updated regexes:
re_cdata = re.compile('//', re.DOTALL)
re_script = re.compile('<\s*script[^>]*>.*?', re.DOTALL | re.I)
re_style = re.compile('<\s*style[^>]*>.*?', re.DOTALL | re.I)
re_br = re.compile('')
re_h = re.compile('', re.DOTALL)
re_comment = re.compile('', re.DOTALL)
The re.I flag (short for re.IGNORECASE) makes matching case-insensitive, so tags written in uppercase are matched as well.
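The effect of the flag can be checked with a small sketch. The pattern and the sample HTML below are assumptions made for illustration; only the flags themselves come from the post above.

```python
import re

# Hypothetical input with an uppercase script tag.
html = "<div>keep</div><SCRIPT type='text/javascript'>var x = 1;</SCRIPT>"

# Illustrative pattern, not the poster's exact expression.
pattern = r"<\s*script[^>]*>.*?<\s*/\s*script\s*>"

case_sensitive = re.compile(pattern, re.DOTALL)
case_insensitive = re.compile(pattern, re.DOTALL | re.IGNORECASE)

sensitive_result = case_sensitive.sub("", html)
insensitive_result = case_insensitive.sub("", html)

print(sensitive_result)    # uppercase block is left untouched
print(insensitive_result)  # uppercase block is removed
```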
Parsing JS with regex python
I need to parse some JavaScript text using Python, which includes an HTML element variable in the JS code.
this.products = ko.observableArray([#here is some json, #here is some json])
The observableArray call can be empty, as in observableArray(), or hold a single JSON object, as in observableArray(['id': '3123123']). It can also hold any number of JSON objects separated by commas, as in the code above.
I have attempted to obtain this string containing JSONs using regex.
regex = re.compile('\n^(.*?)=(.*?)$|,',)
js_text = re.findall(regex, js)
print(js_text)
The call fails with a traceback that ends at line 177 of re.py in /usr/lib/python2.7, inside findall, which compiles the pattern and calls the compiled object's findall method on the supplied string or buffer.
The error indicates that js is not a string or buffer. Have you verified that js really is one?
# no problem
>>> js = "this.products = ko.observableArray()"
>>> js_text = re.findall(regex, js)
>>> print(js_text)
[]
# argument is not a string nor a buffer (in this case None)
>>> js_text = re.findall(regex, None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mhawke/virtualenvs/urllib3/lib64/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer
>>> js_text = re.findall(regex, js)
>>> print(js_text)
[]
As an aside, since regex is already a compiled pattern, it is cleaner to call regex.findall(js) instead.
Your regex pattern also has problems, but that is a separate issue.
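One way to combine both points, guarding against a non-string input and using a compiled pattern's own methods, is sketched below. The pattern, the function name extract_products, and the assumption that the argument is valid JSON with double-quoted keys are all hypothetical; the sketch targets Python 3 rather than the Python 2.7 shown in the traceback.

```python
import json
import re

# Hypothetical pattern: capture everything between "ko.observableArray("
# and a closing ")" that ends the line (optionally followed by ";").
ARRAY_RE = re.compile(r"ko\.observableArray\((.*?)\)\s*;?\s*$",
                      re.DOTALL | re.MULTILINE)

def extract_products(js):
    """Return the list passed to ko.observableArray(...), or [] if absent."""
    if not isinstance(js, str):
        # Avoids "TypeError: expected string" when js is None.
        return []
    match = ARRAY_RE.search(js)
    if match is None:
        return []
    payload = match.group(1).strip()
    if not payload:
        # observableArray() with no argument
        return []
    # Assumes the argument is valid JSON; raises ValueError otherwise.
    return json.loads(payload)

print(extract_products('this.products = ko.observableArray([{"id": "3123123"}]);'))
```

A regex like this is fragile if the JSON itself contains a closing parenthesis at the end of a line, so a real parser may be the safer choice for anything beyond simple pages.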
Perform Simple Javascript Regex in Python
What is the process to execute javascript regex in Python (3.2)?
example_string.replace(/-/g, '.').replace(/(^|\.)(\d+)($|\.)/g, '[$2]$3');
The regex should work as follows:
How can the path string be rewritten with a regex so that it becomes a correct JSON path?
I experimented with the re library, but I am unsure how to express the $2 and $3 backreferences.
Use the re library; the documentation for Python 3.2 is at https://docs.python.org/3.2/library/re.html.
import re

value = "55-fathers-2-married"
value = value.replace("-", ".")
# \2 and \3 are the Python equivalents of $2 and $3 in JavaScript
value = re.sub(r"(^|\.)(\d+)($|\.)", r"[\2]\3", value)
print(value)
You can test your regular expression with the Python regex tool at http://www.pythonregex.com/.
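The translation can be wrapped in a small function. This is a sketch that assumes the numeric group in the question is meant to match digits (\d+); the function name to_json_path is made up for illustration.

```python
import re

def to_json_path(value):
    # JavaScript: value.replace(/-/g, '.')
    # str.replace already replaces every occurrence, like the /g flag.
    dotted = value.replace("-", ".")
    # JavaScript: .replace(/(^|\.)(\d+)($|\.)/g, '[$2]$3')
    # re.sub replaces all matches by default; $2 and $3 become \2 and \3.
    return re.sub(r"(^|\.)(\d+)($|\.)", r"[\2]\3", dotted)

result = to_json_path("55-fathers-2-married")
print(result)  # [55].fathers[2].married
```

Note that a raw string (the r prefix) is used for both the pattern and the replacement so that the backslashes reach the re module unchanged.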
Python re match equivalent in JavaScript [duplicate]: you may use the match function.
> var str = "My name is Derek Last Name"
undefined
> str.match(/My name is (.+)/)[1]
'Derek Last Name'
jmchilton/pyre-to-regexp
Converts Python-like (re) regular expressions to JavaScript RegExp instances
This project is a fork of the MIT licensed pcre-to-regexp project from @TooTallNate. This fork is also MIT licensed.
Creates a JavaScript RegExp instance from a Python-like regexp string.
Works with Node.js and in the browser via a CommonJS bundler like browserify.
pyre(String pattern[, Array keys]) → RegExp
Returns a JavaScript RegExp instance from the given Python-like regular expression string.
An empty array may be passed in as the second argument, which will be populated with the "named capture group" names as Strings in the Array, once the RegExp has been returned.
The returned RegExp has an additional function, pyreReplace, for Python-like replacements.
js-regex
Did you know that regular expressions may vary between programming languages? For example, let's consider the pattern "^abc$", which matches the string "abc". But what about the string "abc\n"? It's also matched in Python, but not in JavaScript!
This and other slight differences can be really important for cross-language standards like jsonschema , and that’s why js-regex exists.
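The difference is easy to reproduce with the standard re module alone. The sketch below uses \Z, which in Python matches only at the very end of the string, to show the stricter behaviour that JavaScript's $ has by default; it does not use js-regex itself.

```python
import re

text = "abc\n"

# Python's $ also matches just before a trailing newline.
dollar_match = re.search(r"^abc$", text)

# \Z matches only at the very end of the string, like $ in JavaScript
# without the /m flag.
strict_match = re.search(r"^abc\Z", text)

print(dollar_match is not None)  # True
print(strict_match is not None)  # False
```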
How it works
Internally, js_regex.compile() replaces JS regex syntax which has a different meaning in Python with whatever Python regex syntax has the intended meaning.
This only works for the .search() method; there is no equivalent to .match() or .fullmatch() for JavaScript regular expressions.
We also check for constructs which are valid in Python but not JS, such as named capture groups, and raise an explicit error. Constructs which are valid in JS but not Python may also raise an error, because we're still using Python's re.compile() function under the hood!
The following table is adapted from this larger version, omitting other languages and any rows where JS and Python have the same behaviour.
Feature | Javascript | Python | Handling |
---|---|---|---|
\a (bell) | no | yes | Converted to JS behaviour |
\ca — \cz and \cA — \cZ (control characters) | yes | no | Converted to JS behaviour |
\d for digits, \w for word chars, \s for whitespace | ascii | unicode | Converted to JS behaviour (including \D , \W , \S for negated classes) |
$ (end of line/string) | at end | allows trailing \n | Converted to JS behaviour |
\A (start of string) | no | yes | Explicit error, use ^ instead |
\Z (end of string) | no | yes | Explicit error, use $ instead |
(?<=text) (positive lookbehind) | new in ES2018 | yes | Allowed |
(?<!text) (negative lookbehind) | new in ES2018 | yes | Allowed |
(?(1)then|else) | no | yes | Explicit error |
(?(group)then|else) | no | yes | Explicit error |
(?#comment) | no | yes | Explicit error |
(?P<name>regex) (Python named capture group) | no | yes | Not detected (yet) |
(?P=name) (Python named backreference) | no | yes | Not detected (yet) |
(?<name>regex) (JS named capture group) | new in ES2018 | no | Error from Python, not translated (yet) |
$<name> (JS named backreference) | new in ES2018 | no | Error from Python, not translated (yet) |
(?i) (case insensitive) | /i only | yes | Explicit error, compile with flags=re.IGNORECASE instead |
(?m) ( ^ and $ match at line breaks) | /m only | yes | Explicit error, compile with flags=re.MULTILINE instead |
(?s) (dot matches newlines) | no | yes | Explicit error, compile with flags=re.DOTALL instead |
(?x) (free-spacing mode) | no | yes | Explicit error, there is no corresponding mode in Javascript |
Backreferences to non-existent groups are an error | no | yes | Follows Python behaviour |
Backreferences to failed groups also fail | no | yes | Follows Python behaviour |
Nested references \1 through \9 | yes | no | Follows Python behaviour |
Note that in many cases Python-only regex features would be treated as part of an ordinary pattern by JS regex engines. Currently we raise an explicit error on such inputs, but may translate them to have the JS behaviour in a future version.
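The \d row of the table can be illustrated with the standard library. The sketch below uses the re.ASCII flag, which is one standard way to restrict \d, \w and \s to ASCII in Python 3; whether js-regex relies on this flag internally is not stated above, so treat that as an assumption.

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE

# Python 3 str patterns are Unicode-aware by default, so \d matches it.
unicode_match = re.search(r"\d", arabic_three)

# With re.ASCII, \d only matches 0-9, as in the JS column of the table.
ascii_match = re.search(r"\d", arabic_three, flags=re.ASCII)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
```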
Changelog
1.0.1 — 2019-10-17
- Allow use of native strings on Python 2. This is not actually valid according to the spec, but it’s only going to be around for a few months so whatever.
1.0.0 — 2019-10-04
- Now considered feature-complete and stable, as all constructs recommended for jsonschema patterns are supported and all Python-side incompatibilities are detected.
- Compiled patterns are now cached on Python 3, exactly as for re.compile
0.4.0 — 2019-10-03
0.3.0 — 2019-09-30
- Fixed handling of non-trailing $ , e.g. in «^abc$|^def$» both are converted
- Added explicit errors for re.LOCALE and re.VERBOSE flags, which have no JS equivalent
- Added explicit checks and errors for use of Python-only regex features
0.2.0 — 2019-09-28
Convert JS-only syntax to Python equivalent wherever possible.
0.1.0 — 2019-09-28
Initial release, with project setup and a very basic implementation.