Random split list python

Pythonic split list into n random chunks of roughly equal size

As part of my implementation of cross-validation, I find myself needing to split a list into chunks of roughly equal size.

    import random

    def chunk(xs, n):
        ys = list(xs)
        random.shuffle(ys)
        ylen = len(ys)
        size = int(ylen / n)
        chunks = [ys[0+size*i : size*(i+1)] for i in xrange(n)]
        leftover = ylen - size*n
        edge = size*n
        for i in xrange(leftover):
            chunks[i%n].append(ys[edge+i])
        return chunks

    >>> chunk(range(10), 3)
    [[4, 1, 2, 7], [5, 3, 6], [9, 8, 0]]

But it seems rather long and boring. Is there a library function that could perform this operation? Are there pythonic improvements that can be made to my code?

3 Answers

    import random

    def chunk(xs, n):
        ys = list(xs)

Copies of lists are usually taken using xs[:]

        random.shuffle(ys)
        ylen = len(ys)

I don’t think storing the length in a variable actually helps your code much

        size = int(ylen / n)

Use size = ylen // n instead; // is the integer division operator.

 chunks = [ys[0+size*i : size*(i+1)] for i in xrange(n)] 

Actually, you can find size and leftover using size, leftover = divmod(ylen, n)

        edge = size*n
        for i in xrange(leftover):
            chunks[i%n].append(ys[edge+i])

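There can never be n or more leftover elements, so the i%n is unnecessary. Taking the leftover elements themselves (ys[edge:]) rather than their count, you can do:

        for chunk, value in zip(chunks, ys[edge:]):
            chunk.append(value)
        return chunks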
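Putting the suggestions above together, a minimal sketch might look like the following. It assumes Python 3 (range rather than xrange), and the name chunk_reviewed is made up for illustration; it is not code from the answer itself.

```python
import random


def chunk_reviewed(xs, n):
    """Shuffle xs and split it into n chunks whose sizes differ by at most one."""
    ys = list(xs)
    random.shuffle(ys)
    # divmod gives the base chunk size and the number of leftover elements at once
    size, leftover = divmod(len(ys), n)
    chunks = [ys[size * i:size * (i + 1)] for i in range(n)]
    edge = size * n
    # leftover < n always holds, so zip hands at most one extra element to each chunk
    for c, value in zip(chunks, ys[edge:]):
        c.append(value)
    return chunks
```

Because there are always fewer than n leftover elements, zip stops after the last one and the remaining chunks keep the base size.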

Further improvements are possible with numpy. If this is part of number-crunching code, you should look into it.

Is there a library function that could perform this operation?

Are there pythonic improvements that can be made to my code?

Sorry it seems boring, but there’s not much better you can do.

The biggest change might be to make this into a generator function, which may be a tiny bit neater.

    def chunk(xs, n):
        ys = list(xs)
        random.shuffle(ys)
        size = len(ys) // n
        leftovers = ys[size*n:]
        for c in xrange(n):
            if leftovers:
                extra = [leftovers.pop()]
            else:
                extra = []
            yield ys[c*size:(c+1)*size] + extra

Usage changes slightly: if you need a list rather than a generator, wrap the call in list().

chunk_list= list( chunk(range(10),3) ) 

The if statement can be removed, also, since it’s really two generators. But that’s being really fussy about performance.

    def chunk(xs, n):
        ys = list(xs)
        random.shuffle(ys)
        size = len(ys) // n
        leftovers = ys[size*n:]
        c = -1  # so the second loop starts at 0 when there are no leftovers
        for c, xtra in enumerate(leftovers):
            yield ys[c*size:(c+1)*size] + [xtra]
        for c in xrange(c+1, n):
            yield ys[c*size:(c+1)*size]
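For comparison, a more compact alternative that neither answer above proposes is to shuffle once and take every n-th element. This is only a sketch for Python 3, and chunk_strided is a hypothetical name. Chunk sizes still differ by at most one, but it builds all chunks at once instead of yielding them.

```python
import random


def chunk_strided(xs, n):
    """Shuffle xs, then deal its elements into n chunks round-robin."""
    ys = list(xs)
    random.shuffle(ys)
    # ys[i::n] takes elements i, i+n, i+2n, ...; the first len(ys) % n chunks get one extra
    return [ys[i::n] for i in range(n)]
```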


Best way to split a list into randomly sized chunks?

Ignoring the last chunk, this gives a uniform distribution of chunk sizes between min_chunk and max_chunk, inclusive.
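The code this description refers to did not survive in the text above. A minimal sketch that matches the description, and the random_chunk signature used in the variation further down, could look like this. Treat it as a reconstruction under those assumptions, not necessarily the answerer's exact code.

```python
from itertools import islice
from random import randint


def random_chunk(li, min_chunk=1, max_chunk=3):
    """Yield consecutive chunks of li with sizes drawn from [min_chunk, max_chunk]."""
    it = iter(li)
    while True:
        # take between min_chunk and max_chunk items; the last chunk may be shorter
        nxt = list(islice(it, randint(min_chunk, max_chunk)))
        if not nxt:
            return  # input exhausted
        yield nxt
```

Since this is a generator, wrap the call in list() to get all chunks at once.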

roippi 24915

Similar question

    import random

    old_list = [5000, 5000, 5000, 5000, 5000, 5000]
    new_list = []

    def random_list(old, new):
        temp = []
        for each_item in old:
            temp.append(each_item)
            chance = random.randint(0,1)
            if chance < 1:
                new.append(temp)
                temp = []
        return new

    [[5000, 5000, 5000, 5000], [5000, 5000]]
    [[5000, 5000, 5000, 5000], [5000], [5000]]
    [[5000], [5000], [5000, 5000], [5000, 5000]]

kylieCatt 10314

Small variation on roippi's answer:

    In [1]: import itertools

    In [2]: import random

    In [3]: def random_chunk(li, min_chunk=1, max_chunk=3):
       ...:     it = iter(li)
       ...:     return list(
       ...:         itertools.takewhile(
       ...:             lambda item: item,
       ...:             (list(itertools.islice(it, random.randint(min_chunk, max_chunk)))
       ...:              for _ in itertools.repeat(None))))
       ...:

    In [4]: random_chunk(range(10), 2, 4)
    Out[4]: [[0, 1], [2, 3, 4], [5, 6, 7], [8, 9]]

    In [5]: random_chunk(range(10), 2, 4)
    Out[5]: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

    In [6]: random_chunk(range(10), 2, 4)
    Out[6]: [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

    In [7]: random_chunk(range(10), 2, 2)
    Out[7]: [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

    In [8]: random_chunk(range(10), 1, 2)
    Out[8]: [[0, 1], [2, 3], [4], [5], [6], [7, 8], [9]]

    In [9]: random_chunk(range(10), 1, 2)
    Out[9]: [[0, 1], [2, 3], [4], [5], [6], [7], [8], [9]]

    In [10]: random_chunk(range(10), 1, 20)
    Out[10]: [[0], [1, 2, 3], [4, 5, 6, 7, 8, 9]]

    In [11]: random_chunk(range(10), 1, 20)
    Out[11]: [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]

    In [12]: random_chunk(range(10), 1, 20)
    Out[12]: [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]

    In [13]: random_chunk(range(10), 1, 20)
    Out[13]: [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]

    In [14]: random_chunk(range(10), 1, 20)
    Out[14]: [[0], [1, 2, 3, 4, 5, 6, 7, 8], [9]]

    from random import randint

    def random_list_split(data):
        split_list = []
        L = len(data)
        i = 0
        while i < L:
            r = randint(1,L-i)
            split_list.append(data[i:i+r])
            i = i + r
        return split_list

    >>> random_list_split(test)
    [[5000, 5000, 5000, 5000, 5000, 5000], [5000], [5000], [5000]]
    >>> random_list_split(test)
    [[5000, 5000, 5000, 5000], [5000, 5000], [5000, 5000], [5000]]
    >>> random_list_split(test)
    [[5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000], [5000]]
    >>> random_list_split(test)
    [[5000, 5000], [5000, 5000, 5000, 5000], [5000], [5000], [5000]]
    >>> random_list_split(test)
    [[5000, 5000, 5000, 5000, 5000, 5000], [5000], [5000], [5000]]
    >>> random_list_split(test)
    [[5000, 5000, 5000, 5000, 5000, 5000], [5000], [5000], [5000]]
    >>> random_list_split(test)
    [[5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000]]

You can simply iterate through the list (X) and, with a fixed probability p, add each element to the current (last) sublist; with probability 1-p, start a new sublist with it.

    import random

    sublists = []
    current = []
    for x in X:
        if len(current)>0 and random.random() >= p:
            sublists.append(current)
            current = []
        current.append(x)
    sublists.append(current)
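Wrapped in a function, the loop above could be sketched as follows (Python 3). The name split_with_probability and its parameter names are illustrative only; p is the probability of staying in the current sublist, as in the description. Unlike the snippet above, this version returns an empty list for empty input instead of a list containing one empty sublist.

```python
import random


def split_with_probability(items, p):
    """Split items into consecutive sublists, extending the current one with probability p."""
    sublists = []
    current = []
    for x in items:
        # with probability 1 - p, close the current sublist and start a new one
        if current and random.random() >= p:
            sublists.append(current)
            current = []
        current.append(x)
    if current:  # skip the trailing sublist when items is empty
        sublists.append(current)
    return sublists
```

With p = 1 everything stays in one sublist, and with p = 0 every element gets its own, because random.random() returns values in [0, 1).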

lejlot 62877

    import random

    def randsplit(lst):
        out = [[]]
        for item in lst:
            out[-1].append(item)
            if random.choice((True, False)):
                out.append([])
        return [l for l in out if len(l)]

This method neither mutates lst nor returns any empty lists. A sample:

    >>> l = [5000, 5000, 5000, 5000, 5000, 5000]
    >>> randsplit(l)
    [[5000, 5000], [5000, 5000], [5000, 5000]]
    >>> randsplit(l)
    [[5000, 5000, 5000], [5000, 5000], [5000]]
    >>> randsplit(l)
    [[5000], [5000], [5000, 5000], [5000], [5000]]

jonrsharpe 109778

This is my approach to it. Every resulting sublist will have at least one element, but the result may be a single sublist containing all the numbers.

    import random

    def randomSublists(someList):
        resultList = []  # result container
        index = 0  # start at the start of the list
        length = len(someList)  # and cache the length for performance on large lists
        while index < length:
            # get a number between 1 and the remaining choices (randint is inclusive at both ends)
            randomNumber = random.randint(1, length - index)
            # append a list starting at index with randomNumber length to it
            resultList.append(someList[index:index+randomNumber])
            index = index + randomNumber  # increment index by amount of list used
        return resultList  # return the list of randomized sublists

Testing on the Python console:

    >>> randomSublists([1,2,3,4,5])
    [[1], [2, 3, 4, 5]]
    >>> randomSublists([1,2,3,4,5])
    [[1], [2, 3], [4], [5]]
    >>> randomSublists([1,2,3,4,5])
    [[1, 2, 3, 4, 5]]
    >>> randomSublists([1,2,3,4,5])
    [[1, 2], [3], [4, 5]]
    >>> randomSublists([1,2,3,4,5])
    [[1, 2, 3, 4, 5]]
    >>> randomSublists([1,2,3,4,5])
    [[1, 2, 3, 4], [5]]
    >>> randomSublists([1,2,3,4,5])
    [[1], [2, 3, 4], [5]]
    >>> randomSublists([1,2,3,4,5])
    [[1], [2, 3], [4], [5]]
    >>> randomSublists([1,2,3,4,5])
    [[1], [2], [3, 4, 5]]
    >>> randomSublists([1,2,3,4,5])
    [[1, 2, 3, 4, 5]]
    >>> randomSublists([1,2,3,4,5])
    [[1, 2, 3], [4, 5]]
    >>> randomSublists([1,2,3,4,5])
    [[1, 2, 3, 4], [5]]