How to make a list in a Python dataclass that can accept multiple different types?
If you are on Python 3.10 or later, there’s a convenient shorthand notation for unions (X | Y):
items: list[DefaultItem | SomeSpecificItem | SomeOtherItem] = None
Also, just as a note: if items is allowed to be None, you should mark the type as Optional.
Also note that in Python 3.10 you can pass the kw_only parameter to the @dataclass decorator to work around the issue I suspect you’re having: when a superclass (Mixin in this case) has at least one field with a default value, every field in the subclass must also have a default value.
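Here is a minimal sketch of the Optional hint, assuming Python 3.9+ (for list[...]). DefaultItem, SomeSpecificItem and SomeOtherItem are empty stand-ins for the item classes named above, and Container is a hypothetical name. Keep in mind that dataclasses do not check these hints at runtime; they only document intent for readers and static type checkers.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Empty stand-ins for the item classes mentioned above.
class DefaultItem: ...
class SomeSpecificItem: ...
class SomeOtherItem: ...


@dataclass
class Container:
    # Optional[...] means "this list, or None"; Union[...] lists the allowed item types.
    # On Python 3.10+ the same hint can be written as
    # list[DefaultItem | SomeSpecificItem | SomeOtherItem] | None
    items: Optional[list[Union[DefaultItem, SomeSpecificItem, SomeOtherItem]]] = None


empty = Container()
filled = Container(items=[DefaultItem(), SomeOtherItem()])
unchecked = Container(items=[1])  # accepted: hints are not enforced at runtime

print(empty)              # Container(items=None)
print(len(filled.items))  # 2
```

If you would rather have an empty list than None by default, use field(default_factory=list) instead of = None.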
I added an example below to illustrate this a little better:
from dataclasses import dataclass


@dataclass
class Mixin:
    string: str
    integer: int = 222


@dataclass(kw_only=True)
class User(Mixin):
    id: int
    items: list['A | B | C']


class A: ...
class B: ...
class C: ...


u = User(string='abc', integer=123, id=321, items=[])
print(u)
Note that I’ve also wrapped the union in a string, which makes it a forward reference that is not evaluated when the class is defined. That’s needed because the classes A, B and C are defined further down.
This code works in 3.10 because kw_only=True makes the fields defined in User (id and items) keyword-only; the fields inherited from Mixin can still be passed positionally. Keyword-only parameters are allowed to be required even when they come after parameters with defaults, so the required id no longer clashes with the defaulted integer. This lets you work around the issue mentioned above, where you would otherwise need to give every field in the subclass a default value whenever a parent class has at least one field with a default.
In Python versions before 3.10, which lack the kw_only argument, you’d run into a TypeError like this:
TypeError: non-default argument 'id' follows default argument
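To reproduce the error yourself, here is a minimal sketch that leaves kw_only off (so it fails the same way on any version, including 3.10+). It reuses the Mixin/User shape with items simplified to a plain list annotation. Note that the @dataclass decorator raises the TypeError when the class is defined, not when it is instantiated; the exact wording of the message may vary slightly between Python versions.

```python
from dataclasses import dataclass


@dataclass
class Mixin:
    string: str
    integer: int = 222


error = None
try:
    # The generated __init__ would need the required `id` to follow the
    # defaulted `integer`, which Python's function signatures do not allow.
    @dataclass
    class User(Mixin):
        id: int
        items: list
except TypeError as exc:
    error = exc

print(error)
```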
The workaround for this in a pre-3.10 scenario is exactly how you had it: define a default value for all fields in the User class as below.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mixin:
    string: str
    integer: int = 222


@dataclass
class User(Mixin):
    id: Optional[int] = None
    items: list[A | B | C] = field(default_factory=list)


class A: ...
class B: ...
class C: ...


u = User('abc', 123, 321)
print(u)
How does Python have different data types in an array?
Python can have a list with different data types in it, e.g. [1, "two", 3]. Python is implemented in C, so how would I create an array with different data types in it using C?
By implementing and then using Python. Seriously, don’t try this. The entire reason you are using a language like C, where you have to write out the type of everything, is so that the type of everything can be determined at compile time. That is impossible for a mixed-type list except in thoroughly useless cases (where you really want a struct), so you end up writing types for everything, writing code that figures out at runtime what the types of the things inside the list are, and getting undefined behaviour instead of an exception if you’re wrong. The worst of every world.
@user1163114: Please explain why you want to do this. There’s probably a more C-ish way of accomplishing your goals.
So, I have no idea how it is implemented in Python, but in C there are ways to operate on generic data. In its simplest form, that is an array of void pointers.
Now you have an array full of void *. Each void * can point to anything. You would also want some extra information to tell you the size of the thing it points to.
You could represent the actual type, if needed, in a variety of ways; for example, you can use a union to hold one of N types in the same variable. Anyway, the point is that there are various ways to operate on generic data in C.
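To make the tag-plus-union idea concrete without leaving Python, here is a sketch that uses the standard ctypes module to lay out the same C structures: a union holding one of int, double or char *, wrapped in a struct with an integer tag (the "extra information") that says which member is valid. TAG_INT, TAG_DOUBLE, TAG_BYTES, Payload, Tagged, make_tagged and read_tagged are all hypothetical names. In real C you would also have to manage the memory that the char * points to by hand.

```python
import ctypes

# Hypothetical type tags: the extra information that says which union member is live.
TAG_INT, TAG_DOUBLE, TAG_BYTES = 0, 1, 2


class Payload(ctypes.Union):
    # C equivalent: union Payload { int i; double d; char *s; };
    _fields_ = [("i", ctypes.c_int), ("d", ctypes.c_double), ("s", ctypes.c_char_p)]


class Tagged(ctypes.Structure):
    # C equivalent: struct Tagged { int tag; union Payload v; };
    _fields_ = [("tag", ctypes.c_int), ("v", Payload)]


def make_tagged(value):
    """Wrap a Python int, float or bytes object in a tagged C struct."""
    t = Tagged()
    if isinstance(value, int):
        t.tag, t.v.i = TAG_INT, value
    elif isinstance(value, float):
        t.tag, t.v.d = TAG_DOUBLE, value
    elif isinstance(value, bytes):
        t.tag, t.v.s = TAG_BYTES, value
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return t


def read_tagged(t):
    """Dispatch on the tag to read back whichever member was stored."""
    if t.tag == TAG_INT:
        return t.v.i
    if t.tag == TAG_DOUBLE:
        return t.v.d
    if t.tag == TAG_BYTES:
        return t.v.s
    raise ValueError(f"unknown tag {t.tag}")


# A fixed-size C array of Tagged structs, like `struct Tagged array[3];` in C.
array = (Tagged * 3)(make_tagged(1), make_tagged(b"two"), make_tagged(3.5))
print([read_tagged(item) for item in array])
```

Every element has the same C type (Tagged); the "different types" live inside the union, and the tag is the only thing telling the reader which interpretation of the bytes is valid.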
What if your array consisted of C structs of the form:
Now you have an array that can contain arbitrary types of data, as long as you can represent them as pointers-to-char.
This isn’t, of course, how Python does it; the point is that there doesn’t have to be any connection between how your application handles data and how your implementation language handles data.
how would I create an array with different data types in it using c?
You can’t; C is a statically-typed language.
You can, however, use things like unions:
typedef union {
    int i;
    float f;
} Foo;

Foo array[3];
array[0].i = 3;
array[1].f = 4.2;
/* ... */
You can also use void * to point at any objects you like, but this requires careful memory management.
In Python, there are no “raw” values; everything is an object and knows its type. Even integers are objects (try printing (1).__class__ or 1 .__class__). Everything you do in C with Python objects, you do through a PyObject * (a pointer to the object).¹
A Python list is a dynamic (i.e. resizable) array of PyObject * . Since every object knows its type, the list doesn’t have to be declared as having members of a specific type.
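Both points can be seen from Python itself in a small sketch: each element reports its own type, and the list holds references to objects rather than copies of their values (in CPython those references are the PyObject * pointers described above).

```python
mixed = [1, "two", 3.0, None, [4]]

# Every element carries its own type; the list itself imposes none.
print([type(x).__name__ for x in mixed])  # ['int', 'str', 'float', 'NoneType', 'list']

# The list stores references, not copies: mutating `inner` shows up in both slots.
inner = [4]
outer = [inner, inner]
inner.append(5)
print(outer)                 # [[4, 5], [4, 5]]
print(outer[0] is outer[1])  # True
```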
¹ Also note: Python does not have “variables” in the usual sense (C, BASIC, Pascal etc), where a typed variable contains a value; it has namespaces and names (actually, dictionaries mapping strings to objects; a namespace is a dictionary, its keys are the names, its values are the pointers to the objects pointed to by each name).
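To see names behaving as dictionary keys, this sketch runs two assignments with exec() using a plain dict as the global namespace. It illustrates the footnote for module-level (global) namespaces; CPython stores function locals differently, so treat it as an illustration rather than a rule for every scope.

```python
# Run a tiny "module" with a plain dict as its global namespace.
namespace = {}
exec("a = [1, 2]\nb = a", namespace)

# The names are just dictionary keys; both map to the same list object.
print(namespace["a"] is namespace["b"])  # True
namespace["a"].append(3)
print(namespace["b"])                    # [1, 2, 3]
```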
How to sort a list of different types?
I need to sort a list using Python 3. It might contain strings, integers, floats, tuples, etc. I am currently trying to make correct use of the sort function using the key parameter like this:
data.sort(key=gen_key)
...
def gen_key(self, value):
    if is_number(value):
        return str(value)
    if isinstance(value, str):
        return value
    return '___' + type(value).__name__
The problem is that the numbers are now sorted as strings (lexicographically), whereas I want integers and floats to be ordered numerically. The behavior is caused by the return str(value) part. But I cannot return a type other than a string, because that raises an exception: in Python 3, strings can no longer be compared with numbers as they could in Python 2. The exception is the following
And how would you want ‘A’ to sort vs. 13? You’ll need to come up with a well-defined sort order. Once you’ve done that, you’ll pretty much be done already.
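To make the problem concrete, here is a small sketch with made-up data: a str key orders numbers character by character, while comparing a str with an int directly raises TypeError in Python 3.

```python
data = [10, 9, 2.5, 100]

# str keys compare character by character: '10' < '100' < '2.5' < '9'.
print(sorted(data, key=str))  # [10, 100, 2.5, 9]
print(sorted(data))           # [2.5, 9, 10, 100]

# Mixing str and int in one comparison is an error in Python 3.
mixed_error = None
try:
    sorted(["a", 1])
except TypeError as exc:
    mixed_error = exc
print(type(mixed_error).__name__)  # TypeError
```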
The trick is to make your key function return a tuple with a guaranteed comparable type in the first index, and the disparate types in subsequent indices.
While not 100% identical to what Python 2 does, for your specific case of «numbers to the front, everything else compared by typename» you can do this with a reasonably efficient key function:
>>> from numbers import Number
>>> seq = ['Z', 3, 'Y', 1, 'X', 2.5, False, (1, 2), [2, 3], None]
>>> sorted(seq, key=lambda x: (x is not None, '' if isinstance(x, Number) else type(x).__name__, x))
[None, False, 1, 2.5, 3, [2, 3], 'X', 'Y', 'Z', (1, 2)]
The key function here makes the first element of the key a simple bool, forcing None to sort before everything else (Py2 did the same thing). The second element is the empty string for all numeric types, which sorts them first, and the type name for everything else (also like Py2). Once you’ve gotten past those first two elements, what’s left is either numeric or of the same type, and should compare just fine.
The main flaw here is that comparable non-numeric types like set and frozenset won’t be compared to one another; they’ll be sorted by type name only (a custom key class using exceptions could handle this).
It also won’t handle the recursive case: if the sequence contains [2, 3] and ['a', 'b'], sorting will raise a TypeError when comparing 2 with 'a', but nothing short of a ridiculously involved key class could handle that.
If that’s not an issue, this is cheap to run and relatively simple.
Unlike solutions involving custom classes with __lt__ defined to perform the comparison, this approach has the advantage of generating built-in keys, which compare efficiently with minimal execution of Python-level code during the sort.
# Multiply out the sequence so log n factor in n log n work counts for something
>>> seq = ['Z', 3, 'Y', 1, 'X', 2.5, False, (1, 2), [2, 3], None] * 100
# Verify equivalence
>>> sorted(seq, key=Py2Key) == sorted(seq, key=lambda x: (x is not None, '' if isinstance(x, Number) else type(x).__name__, x))
True
# Timings in seconds for the fastest time (of 3 trials) to run the sort 1000 times:
>>> import timeit
# Py2Key class
>>> min(timeit.repeat('sorted(seq, key=Py2Key)', 'from __main__ import seq, Py2Key', number=1000))
5.251885865057375
>>> min(timeit.repeat('sorted(seq, key=lambda x: (x is not None, "" if isinstance(x, Number) else type(x).__name__, x))', 'from __main__ import seq, Number', number=1000))
1.9556877178131344
Basically, avoiding the overhead of a dynamic Python-level __lt__ reduces runtime by just over 60%. It doesn’t seem to be an algorithmic improvement (a seq 100 times longer has the same runtime ratio), just a reduction in fixed overhead, but it’s a non-trivial reduction.
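For contrast, here is a minimal sketch of the exception-based custom key class mentioned earlier. MixedKey is a hypothetical name, and this is not the Py2Key class used in the timings. Since sorted() only needs __lt__, the class defines just that: it first tries the values’ own comparison and, on TypeError, falls back to the same grouping as the tuple key (None first, numbers next, then type name). That lets it survive nested mismatches like [2, 3] vs ['a', 'b'], which it treats as ties, at the cost of running Python-level code on every comparison. Types whose < is only a partial order, such as set and frozenset (subset tests), still won’t yield a meaningful total order.

```python
from numbers import Number


class MixedKey:
    """Hypothetical sort key: native ordering when the values support it,
    otherwise (None first, numbers next, then type name)."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _rank(v):
        # Same grouping as the tuple key: None, then numbers, then by type name.
        return (v is not None, "" if isinstance(v, Number) else type(v).__name__)

    def __lt__(self, other):
        try:
            return bool(self.value < other.value)
        except TypeError:
            # Incomparable pair: order by group; same-group pairs count as ties.
            return self._rank(self.value) < self._rank(other.value)


seq = ['Z', 3, 'Y', 1, 'X', 2.5, False, (1, 2), [2, 3], None]
print(sorted(seq, key=MixedKey))

# Nested mismatches no longer raise; they are treated as ties (stable order kept).
print(sorted([[2, 3], ['a', 'b']], key=MixedKey))
```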
Assign Different Types to Python List in Advance
My apologies if this has been asked before, but I’m very new to Python. I have a file containing data records similar to the following:
K; 0; 710; 85; 2; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;
K; 0; 710; 85; 3; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;
K; 17; 718; 86; 1; 2013:12:04:13:11:36.198; 995.6880; 4; 0.0000; 0; 0.0000; 280; 0.0000; 576; 0.0000; 904;
K; 17; 718; 86; 2; 2013:12:04:13:11:36.198; 0.0000; 4;1484.0000; 0;1484.0000; 280;1484.0000; 576;1481.6000; 904;
The records vary in length, but I am only interested in the first eight items in each record. The items in each record are delimited by the ";" character and varying numbers of space characters. As I read the file, I would like to assign each line to a list, but I would also like the items in the list to have the correct types, e.g. str, int, int, int, int, datetime, float, int, etc. At present I am using the following code:
def file_extract(pathfile):
    file = open(pathfile)
    contents = file.read()
    # remove spaces and split data based on ';' and \n
    data_list = [lines.replace(" ", "").split(";") for lines in contents.split("\n")]
    for line in data_list:
        if line[0] == "K":
            listraw = line[:9]
            listraw[1] = int(line[1])
            listraw[2] = int(line[2])
            # continue setting types in the listraw[] etc. etc.
Unfortunately, as I read each record from the file contents into a list, all of the items in the list are strings, similar to the following:
'K', '0', '710', '85', '2', '2013:12:04:13:11:36.291', ...
I then have to go through each individual item of the list to set the type as I wish. Is there a more elegant way of setting the individual types in the list?
No, Python cannot tell from your data what it is “supposed” to be. You will just have to cast it to whatever type is appropriate.
No, they stay whatever type they were when they came in. They are strings because you read them from a file.
You can put the datatypes in a list and then match them with the fields using zip . Something like this:
import datetime

# write a parser for the timepoints
def dateparser(string):
    # guessed the date format
    return datetime.datetime.strptime(string, '%Y:%m:%d:%H:%M:%S.%f')

# From your code `if line[0] == 'K'` I assume that 'K' is a key for the
# datatypes in the corresponding row.
# For every row type you define the datatypes here, where a datatype
# is equivalent to a parser. Just make sure it accepts a string and returns the
# type you need.
# I guessed the types here so it works with your example.
parsers = {'K': [str, int, int, int, int, dateparser, float, int, float]}

# the example content
contents = """K; 0; 710; 85; 2; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;
K; 0; 710; 85; 3; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;
K; 17; 718; 86; 1; 2013:12:04:13:11:36.198; 995.6880; 4; 0.0000; 0; 0.0000; 280; 0.0000; 576; 0.0000; 904;
K; 17; 718; 86; 2; 2013:12:04:13:11:36.198; 0.0000; 4;1484.0000; 0;1484.0000; 280;1484.0000; 576;1481.6000; 904;
"""

data = []
# the right way for doing this with a file would be:
# with open(filepath, 'r') as f:
#     for line in f:
for line in contents.split('\n'):
    # skip empty lines
    if not line.strip():
        continue
    # first split then strip, feels safer this way.
    fields = [f.strip() for f in line.split(';')]
    # select the parser list from our dict
    parser_list = parsers[fields[0]]
    # Now match the fields with their parsers; zip automatically stops
    # when there is no parser left. This means if you have 8 parsers only 8
    # fields will be evaluated and the rest is ignored.
    # Comes in handy when the lengths of your row types differ.
    # However, this also goes the other way around: if there
    # are fewer fields than parsers, the last parsers will be
    # ignored. If you don't want this to happen you have to
    # make sure that len(fields) >= len(parser_list)
    data.append([parser(field) for parser, field in zip(parser_list, fields)])

for row in data:
    print(row)
['K', 0, 710, 85, 2, datetime.datetime(2013, 12, 4, 13, 11, 36, 291000), 0.0, 1, 1009.3]
['K', 0, 710, 85, 3, datetime.datetime(2013, 12, 4, 13, 11, 36, 291000), 0.0, 1, 1009.3]
['K', 17, 718, 86, 1, datetime.datetime(2013, 12, 4, 13, 11, 36, 198000), 995.688, 4, 0.0]
['K', 17, 718, 86, 2, datetime.datetime(2013, 12, 4, 13, 11, 36, 198000), 0.0, 4, 1484.0]
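The same zip-with-parsers idea can read straight from a file. The sketch below swaps the manual split/strip for the standard library’s csv.reader with delimiter=";" and skipinitialspace=True (a substitution of mine, not the answer’s code), and uses io.StringIO as a stand-in for an open file so it is self-contained. parse_timestamp, PARSERS and parse_records are hypothetical names; the parser list is the one implied by the output above.

```python
import csv
import datetime
import io


def parse_timestamp(text):
    # Same guessed format as the answer, e.g. 2013:12:04:13:11:36.291
    return datetime.datetime.strptime(text, "%Y:%m:%d:%H:%M:%S.%f")


# Hypothetical parser table keyed by record type (types match the output above).
PARSERS = {"K": [str, int, int, int, int, parse_timestamp, float, int, float]}


def parse_records(fileobj, parsers=PARSERS):
    """Yield typed rows from ';'-delimited records; zip drops fields beyond the parsers."""
    reader = csv.reader(fileobj, delimiter=";", skipinitialspace=True)
    for fields in reader:
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        row_parsers = parsers[fields[0].strip()]
        yield [parse(field.strip()) for parse, field in zip(row_parsers, fields)]


sample = io.StringIO(
    "K; 0; 710; 85; 2; 2013:12:04:13:11:36.291; 0.0000; 1;1009.3000; 0;\n"
    "\n"
    "K; 17; 718; 86; 1; 2013:12:04:13:11:36.198; 995.6880; 4; 0.0000; 0;\n"
)
rows = list(parse_records(sample))
for row in rows:
    print(row)
```

With a real file you would call it as `with open(path, newline="") as f: rows = list(parse_records(f))`; the csv documentation recommends opening files with newline="" for the csv module.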