Anaconda python open source

Anaconda’s Q4 2022 Open-Source Roundup

Substantial and impactful open-source innovation is at the heart of Anaconda’s efforts to provide tooling for developing and deploying secure Python solutions, faster. With the goal of capturing and communicating our teams’ many ongoing contributions to a wide variety of open-source projects, we are now providing regular roundups of related news items on our blog.

As usual, Anaconda’s open-source software (OSS) teams have been very active over the last few months! In this first edition of our new quarterly OSS roundup, I’ll highlight some of our biggest open-source contributions plus a couple of smaller but still very interesting efforts. I’ll also touch on what’s coming in the next few months.

Note: Please see this recent PyScript post for some updates on that particular project, as they will not be covered again here.

Highlights by Dev Group

Anaconda has many different teams working on open source, and each performs a wide variety of tasks. Below I will cover some of our core efforts and recent milestones. Please note that the split into bullets is merely for readability; in practice, many of us work across these divisions.

Dask and Data Access

  • The Awkward Array project provides vectorized (fast!) processing for data that just doesn’t fit into normal arrays or tables (nested, variable-length, “JSON-like” data), all with familiar NumPy syntax. Since this is squarely aimed at big data, it makes sense to run this processing in parallel, distributed across a cluster. dask-awkward does exactly this, and will be released roughly concurrently with both this blog post and V2 of Awkward itself. The library brings the full Awkward API to big distributed data and is ready for general use.
  • Out of the same effort, we created awkward-pandas for cases where you have a mix of nested, variable-length data and ordinary flat columns. It brings the power and speed of Awkward into the pandas API with its own new extension type and convenient methods for converting to and from Python/pandas native types. This includes integration with the JSON and Parquet load/write mechanisms. It was released as an alpha at PyData Global, and work on dask-awkward-pandas is ongoing.
  • The Intake library for data access and cataloging is mature and stable. Recently we’ve invested new effort into rejuvenating and pushing the project forward. Of particular note, the Intake graphical user interface (GUI) will soon offer new functionality—integration with hvPlot’s explorer and more interactivity when it comes to editing and building your data sources, plots, and catalog files dynamically.
  • fastparquet may be largely in maintenance mode, but we do still improve it. Version 2022.11.0 brings speed improvements for nullable types, schema evolution, and in-place metadata updates.
  • Following extensive involvement with python-graphblas, which brings optimized graph processing to Python, we’ve concentrated on rounding off the library and providing extensive documentation to the community.

Jupyter

  • While some users have not yet transitioned to JupyterLab (or the upcoming Notebook V7), Anaconda has stepped in to revive maintenance of the “classic” Jupyter Notebook codebase. This has enabled several new security and bug fix releases of Jupyter Notebook 6.x.
  • Please see this blog post on the release of Jupyter Notebook 6.5, a transitional point on the way to Notebook V7 that also brings many updates and stability fixes.
  • In order to enable longer-term support of the classic Notebook, the frontend code has been moved into the nbclassic package, which can coexist in environments along with JupyterLab and the future Notebook V7.
  • The team has been engaging the wider Jupyter community to understand their needs, and will be looking in 2023 at documentation and features to ease the transition of extensions and extension authors to the new JupyterLab-based system.
  • Finally, as an interesting point for technical folks, we converted all of the Selenium-based tests in nbclassic to Playwright, which was a major development to pull off, but has increased the reliability of the test suite significantly.

HoloViz

  • Bokeh 3 was a major release of the low-level interactive graphics library on which the whole HoloViz stack relies. Of particular note, the layout system was rewritten to reuse more modern browser primitives rather than handling sizing and placing internally, which should result in better interoperability with other graphical components in a page/app.
  • The Bokeh changes allowed for (and required) work across the whole related stack, which resulted in the following releases over the last few weeks:
    • Panel’s interactive widgetset
    • HoloViews’ interactive plots generation engine
    • GeoViews’ mapping and projection extension
    • Datashader’s high-performance server-side data point aggregator
    • Lumen’s interactive dashboard builder
  • I’d particularly like to highlight Panel 0.14, which fully integrates with PyScript so you can run interactive Python data visualization applications without a server. It also provides the explorer (the same one used with the aforementioned Intake GUI) for interactively building views of dataframe data.

Conda

  • Conda has become much more open and community driven this year. Note, for example, the enhancement proposal process, and continued conversation and collaboration with mamba.
  • Conda moved to calendar versioning and a regular release cycle starting with version 22.x.
  • Plugins are now supported by conda’s architecture, so developers can create and offer new functionality without adding code to the main repo.
  • The main artifact used by conda has moved to V2 “.conda” files, with greatly improved download and unpacking speed.

BeeWare

  • BeeWare is a collection of tools for writing Python applications that can run with native look and feel on mobile, desktop, and web platforms. The BeeWare project maintains its own blog with monthly updates, roadmaps, and other news. A very quick glance makes it clear there’s been plenty of recent activity, particularly around Briefcase, the build/deploy system.
  • Chaquopy, a separate tool for deploying Python on Android, has become open source and been folded into Briefcase.
  • Binary packages have arrived for Android and iOS, so now you can include popular libraries like NumPy and Matplotlib in your mobile Python app.
  • Build systems for Python 3.11 were ready as soon as it was available.
  • A complete rewrite of the testing infrastructure is now in progress. Briefcase now has the ability to run a test suite inside the app simulator environment, and this capability is being used to implement a comprehensive, cross-platform test suite for the Toga GUI toolkit.

Numba

  • Numba is a just-in-time (JIT) compiler for Python code optimized for running numerical algorithms on CPU and GPU backends. Much work was done this quarter to support Python 3.11 and upgrade to LLVM 14. These tasks are ongoing, but should be landing in the development branch soon.
  • In preparation for a big push in 2023 to modularize Numba for easier reuse in other projects that need compiler components, we’ve been moving forward on several proof-of-concept efforts. We will see these components folded into Numba or potentially new projects over the coming year.
  • The Numba team has been doing a major rewrite of the bytecode analysis frontend to better handle the rapidly evolving bytecode changes that come with each minor release of Python. This work should help us to roll out Numba updates for new Python releases faster, and also enable other compiler enhancements in the future. Look for this to land in Numba sometime in Q1 2023.
  • We have also been hard at work continuing to improve internal usage of Numba’s extension APIs, which has enabled improvements to the compute unified device architecture (CUDA) target that both increase functionality and reduce the size of the code. This work should also allow for more consistent math behavior in the future.
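
For readers who have not used Numba, here is a minimal sketch of what JIT-compiling a numerical loop typically looks like with the `njit` decorator. The function name `sum_of_squares` is made up for illustration, and the fallback decorator is an assumption added only so the snippet still runs (uncompiled) where Numba is not installed.

```python
# Minimal sketch: JIT-compiling a plain numerical loop with Numba.
try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    # Assumption for illustration only: a no-op stand-in so the example
    # still runs as ordinary Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in range(n) using an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
print(result)
```

The first call triggers compilation for the argument types used; later calls with the same types reuse the compiled machine code.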

Other

spatialpandas to Awkward

spatialpandas is a library for working with geometric objects as one column of a pandas dataframe, with other normal columns and a view for aggregations and visualization. Following our work on Awkward Array for Dask and for pandas, we realized that spatialpandas could make use of these tools. In particular, polygons and lines can be represented as variable-length arrays of points, with each point made of two or more numbers. This is exactly the kind of data structure that Awkward deals with. Preliminary experiments show that we can swap out a lot of complex ad hoc legacy code from spatialpandas in favor of calling well-tested modern code in Awkward (via awkward-pandas), and get a decent speed boost from the change too. This is a very nice example of using our own up-and-coming tools for a variety of use cases, and we will be developing this functionality further in the coming months. Watch this space!
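
The variable-length layout described above can be sketched in plain Python. This illustrates the general idea of one flat buffer of points plus a list of offsets; it is not spatialpandas or Awkward code, and the names `coords`, `offsets`, and `bounding_boxes` are hypothetical.

```python
# Illustrative sketch only: ragged geometry stored as one flat buffer of
# points plus an offsets list marking where each geometry starts and ends.

# Three geometries with 3, 2, and 4 points respectively.
coords = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0),              # triangle
    (2.0, 2.0), (3.0, 5.0),                          # line segment
    (4.0, 4.0), (6.0, 4.0), (6.0, 7.0), (4.0, 7.0),  # rectangle
]
offsets = [0, 3, 5, 9]  # geometry i spans coords[offsets[i]:offsets[i + 1]]


def bounding_boxes(coords, offsets):
    """Return (xmin, ymin, xmax, ymax) for each geometry."""
    boxes = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        xs = [x for x, _ in coords[start:stop]]
        ys = [y for _, y in coords[start:stop]]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


boxes = bounding_boxes(coords, offsets)
print(boxes)
```

Because every geometry lives in the same flat buffer, a vectorized library can operate on contiguous memory instead of many small Python objects, which is broadly where the speed boost comes from.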

Kerchunk

Kerchunk is a library for making virtual datasets out of many other datasets of several possible formats, and providing the benefits of cloud-native data access without copying or reformatting the original files. It’s been around for a little while, but this quarter it received renewed attention and effort, so we were able to surface additional features such as:

  • Consolidating many nearby reads within a target file to reduce the number of calls
  • Scanning files with a tree-reduction scheme using Dask
  • Coordinate creation utility for GeoTIFF
  • Automatic extraction of smaller chunks when the target is not compressed

With this new push, expect a lot more news regarding this project over the next six months.
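
The first feature in the list, consolidating nearby reads, can be illustrated with a small sketch. This is not Kerchunk’s implementation; the function `merge_ranges` and the `max_gap` parameter are hypothetical names chosen for this example.

```python
# Illustrative sketch only: coalesce nearby byte ranges so that several
# small reads become fewer, larger requests.

def merge_ranges(ranges, max_gap):
    """Merge (start, stop) byte ranges whose gap is at most max_gap bytes."""
    merged = []
    for start, stop in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop))
        else:
            merged.append((start, stop))
    return merged


requests = [(0, 100), (120, 200), (5000, 5100), (150, 260)]
merged = merge_ranges(requests, max_gap=50)
print(merged)
```

The trade-off is that a larger `max_gap` means fewer calls to remote storage but more unneeded bytes transferred between the requested ranges.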

About the Author

Martin Durant is a former astrophysicist with several years of scientific research experience. He has also worked in medical imaging, building AI/ML pipelines and a research platform. After a brief stint as a data scientist in ad-tech, Martin moved to Anaconda to work on PyData education. He now leads a number of open-source PyData projects, focussing on data access, formats, and parallel processing.

    Join 35 million users building the future with Anaconda

    Anaconda Assistant is now in Private Preview.
    Let AI power your analysis, visualizations, and data processing with the Anaconda Assistant.

    • Automatically generate code snippets, plots, and graphs
    • Quickly debug errors and fix code issues
    • Get code optimization recommendations

    Code in the cloud. Nothing to install or configure. Work from any computer using our cloud-hosted notebook service.

    • Publish data applications and share results
    • Sample notebooks to get you started
    • Catalog of real-world data sets

    Code with the world’s most trusted Python distribution. From AI solutions to interactive visualizations, Anaconda is the world’s preferred distribution for numerical and scientific computing.

    • More than 6,000 Python libraries
    • Over 2,000 interoperable R packages
    • Built from source + tamper free

    Learn data science and Python while you build, with Anaconda Learning.

    • Instructor-led courses
    • On demand and live
    • Essential data science skills
    • Earn completion certificates

    Easily build and share web applications, with zero infrastructure

    • Create next-generation applications
    • Share in one click
    • Build new or fork community projects

    Free Download

    Everything you need to get started in data science on your workstation.

    • Free distribution install
    • Thousands of the most fundamental DS, AI, and ML packages
    • Manage packages and environments from desktop application
    • Deploy across hardware and software platforms

    Open Source

    Access the open-source software you need for projects in any field, from data visualization to robotics.

    User-friendly

    With our intuitive platform, you can easily search and install packages and create, load, and switch between environments.

    Trusted

    Our securely hosted packages and artifacts are methodically tested and regularly updated.

    Anaconda Repository

    Our repository features over 8,000 open-source data science and machine learning packages, Anaconda-built and compiled for all major operating systems and architectures.

    Conda

    Conda is an open-source package and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs, and updates packages and their dependencies. It also easily creates, saves, loads, and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

    Anaconda Navigator

    Our desktop application lets you easily manage integrated applications, packages, and environments without using the command line.
