Shuffle DataFrame rows in Python

How to randomly shuffle rows or add an empty row in pandas

How to add an empty row to a DataFrame in pandas

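DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0. With current pandas, use pd.concat() instead:

    empty = pd.DataFrame([[np.nan] * len(df.columns)], columns=df.columns)
    df = pd.concat([df, empty], ignore_index=True)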

To add the empty row at the beginning instead:

    df1 = pd.DataFrame([[np.nan] * len(df.columns)], columns=df.columns)
    df = pd.concat([df1, df], ignore_index=True)
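
You can also enlarge the frame with .loc, which assumes the default 0..n-1 index. To get a row at the top, enlarge the frame first, then shift() every row down by one and overwrite row 0. The enlargement is needed because shift() drops the last row:

    >>> df
       A  B  C  D  E
    0  1  2  3  4  5
    1  4  5  6  7  8
    >>> df.loc[len(df)] = 0
    >>> df
       A  B  C  D  E
    0  1  2  3  4  5
    1  4  5  6  7  8
    2  0  0  0  0  0
    >>> df = df.shift()
    >>> df.loc[0] = 0
    >>> df
         A    B    C    D    E
    0  0.0  0.0  0.0  0.0  0.0
    1  1.0  2.0  3.0  4.0  5.0
    2  4.0  5.0  6.0  7.0  8.0
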
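Both cases can be wrapped in one function. This is a minimal sketch, and add_empty_row is a hypothetical helper name rather than a pandas API. It builds a one-row frame of NaN with the same columns and joins it with pd.concat, which is the documented replacement for the removed DataFrame.append(). The small example frame is made up.

```python
import numpy as np
import pandas as pd


def add_empty_row(df: pd.DataFrame, at_start: bool = False) -> pd.DataFrame:
    """Return a copy of df with one all-NaN row added (hypothetical helper)."""
    # One-row frame with the same columns, filled with NaN
    empty = pd.DataFrame([[np.nan] * len(df.columns)], columns=df.columns)
    parts = [empty, df] if at_start else [df, empty]
    # ignore_index=True renumbers the result 0..n
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({"A": [1, 4], "B": [2, 5]})
out = add_empty_row(df)                  # empty row at the end
top = add_empty_row(df, at_start=True)   # empty row at the beginning
print(out)
print(top)
```

Integer columns become float64 in the result, because NaN cannot be stored in an int64 column.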
  • [pandas-rolling-window]
  • [2021-12-26-daily-note] finding the difference between DataFrames, changing the column order
  • [2022-01-04-daily-note] the error "If using all scalar values, you must pass an index", converting a DataFrame into a single-row DataFrame, and how to store and retrieve lists in a DataFrame

Source

Pandas Shuffle DataFrame Rows Examples

You can shuffle DataFrame rows randomly with the pandas.DataFrame.sample() method. With NumPy, you can reorder (shuffle) the rows using permutation(). Other packages provide a shuffle() function that does the same, for example scikit-learn (sklearn).


1. Create a DataFrame with a Dictionary of Lists

Let's create a pandas DataFrame from a dictionary of lists, with the columns Courses, Fee, Duration and Discount.

    # Create a DataFrame from a dictionary of lists
    import pandas as pd

    technologies = {
        'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle", "Java"],
        'Fee': [20000, 25000, 26000, 22000, 24000, 21000, 22000],
        'Duration': ['30day', '40days', '35days', '40days', '60days', '50days', '55days'],
        'Discount': [1000, 2300, 1500, 1200, 2500, 2100, 2000]
    }
    df = pd.DataFrame(technologies)
    print(df)

    # Output:
    #    Courses    Fee  Duration  Discount
    # 0    Spark  20000     30day      1000
    # 1  PySpark  25000    40days      2300
    # 2   Hadoop  26000    35days      1500
    # 3   Python  22000    40days      1200
    # 4   pandas  24000    60days      2500
    # 5   Oracle  21000    50days      2100
    # 6     Java  22000    55days      2000

2. Pandas Shuffle DataFrame Rows

Use pandas.DataFrame.sample(frac=1) to shuffle the order of rows. The frac keyword argument specifies the fraction of rows to return in the random sample. frac=1 returns all rows in random order, and frac=0.5 returns a random 50% of them. If neither frac nor n is given, sample() returns a single random row.

Note that sample() does not modify df in place. It returns a new, shuffled DataFrame.

    # Shuffle the DataFrame rows and return all of them
    df1 = df.sample(frac=1)
    print(df1)

    # Output (random; your order will differ):
    #    Courses    Fee  Duration  Discount
    # 0    Spark  20000     30day      1000
    # 6     Java  22000    55days      2000
    # 1  PySpark  25000    40days      2300
    # 5   Oracle  21000    50days      2100
    # 2   Hadoop  26000    35days      1500
    # 3   Python  22000    40days      1200
    # 4   pandas  24000    60days      2500

To get a fixed number of random rows instead of a fraction, pass n, for example df.sample(n=2).
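To make a shuffle reproducible (for example in tests or tutorials), pass a seed via random_state, which is a real DataFrame.sample parameter. The sketch below uses a small frame of its own. It checks that the same seed gives the same order and that frac=1 returns every row exactly once.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop", "Python"],
                   "Fee": [20000, 25000, 26000, 22000]})

# Same seed -> same shuffled order on every run
a = df.sample(frac=1, random_state=42)
b = df.sample(frac=1, random_state=42)
print(a.index.tolist() == b.index.tolist())   # True

# frac=1 is a permutation: every original row appears exactly once
print(sorted(a.index) == list(df.index))      # True

# n picks an exact number of rows instead of a fraction
two = df.sample(n=2, random_state=0)
print(len(two))                               # 2
```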

3. Pandas Shuffle Rows by Setting New Index

As you can see above, the index labels are shuffled together with the rows. To get a new index starting from 0 while keeping the shuffled labels as a regular column, use reset_index().

    # Create a new index starting from zero
    df1 = df.sample(frac=1).reset_index()
    print(df1)

    # Output:
    #    index  Courses    Fee  Duration  Discount
    # 0      6     Java  22000    55days      2000
    # 1      2   Hadoop  26000    35days      1500
    # 2      4   pandas  24000    60days      2500
    # 3      3   Python  22000    40days      1200
    # 4      5   Oracle  21000    50days      2100
    # 5      0    Spark  20000     30day      1000
    # 6      1  PySpark  25000    40days      2300

If you don't need the old index at all, use .reset_index(drop=True) to discard it:

    # Drop the shuffled index
    df1 = df.sample(frac=1).reset_index(drop=True)
    print(df1)
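The difference between the two reset_index() calls is easiest to see in the resulting column lists. The frame and seed below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"],
                   "Fee": [20000, 25000, 26000]})
shuffled = df.sample(frac=1, random_state=1)

# reset_index() moves the old labels into a new "index" column
kept = shuffled.reset_index()
print(kept.columns.tolist())     # ['index', 'Courses', 'Fee']

# reset_index(drop=True) discards them
dropped = shuffled.reset_index(drop=True)
print(dropped.columns.tolist())  # ['Courses', 'Fee']
print(dropped.index.tolist())    # [0, 1, 2]
```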

4. Using numpy.random.shuffle to Change Order of Rows

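You can also use numpy.random.shuffle() to change the order of the rows, after importing NumPy. It shuffles an array in place and returns None. Shuffling DataFrame.values directly does not work, for two reasons. First, DataFrame is the class rather than your df. Second, for mixed column types df.values is a copy, so shuffling it leaves df unchanged. Instead, shuffle an array of row positions and select the rows with .iloc[]:

    # Using NumPy: shuffle row positions in place, then select them with iloc
    import numpy as np

    positions = np.arange(len(df))
    np.random.shuffle(positions)   # shuffles in place, returns None
    df1 = df.iloc[positions].reset_index(drop=True)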

5. Using permutation() From numpy to Get Random Sample

You can also use numpy.random.permutation() to shuffle DataFrame rows. It returns the row positions in random order, and .iloc[] selects the rows in that order. Because .iloc[] works with positions, permute len(df) rather than df.index; the two only coincide for the default 0..n-1 index. For instance, df.iloc[np.random.permutation(len(df))].reset_index(drop=True).

    # Using numpy permutation() to shuffle DataFrame rows
    import numpy as np

    df1 = df.iloc[np.random.permutation(len(df))].reset_index(drop=True)
    print(df1)

    # Output:
    #    Courses    Fee  Duration  Discount
    # 0   pandas  24000    60days      2500
    # 1    Spark  20000     30day      1000
    # 2     Java  22000    55days      2000
    # 3   Oracle  21000    50days      2100
    # 4   Python  22000    40days      1200
    # 5  PySpark  25000    40days      2300
    # 6   Hadoop  26000    35days      1500
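NumPy also offers a newer Generator API (np.random.default_rng) with a permutation() method. The sketch below uses it and permutes positions rather than labels, so it keeps working when the index holds strings. In that case, passing the string labels to .iloc[] would raise an error. The frame, labels and seed are made up for illustration.

```python
import numpy as np
import pandas as pd

# Index with string labels: .iloc needs integer positions, not these labels
df = pd.DataFrame({"Fee": [20000, 25000, 26000, 22000]},
                  index=["a", "b", "c", "d"])

rng = np.random.default_rng(seed=7)   # seeded generator for reproducibility
positions = rng.permutation(len(df))  # shuffled positions 0..n-1
shuffled = df.iloc[positions]         # original labels travel with their rows

print(shuffled)
```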

6. Using sklearn shuffle() to Reorder DataFrame Rows

You can also shuffle DataFrame rows with sklearn.utils.shuffle(). It is part of scikit-learn, so first install the package with pip (pip install scikit-learn), then import the function in your program. It also accepts a random_state argument for reproducible results.

    # Using sklearn to shuffle rows
    from sklearn.utils import shuffle

    df = shuffle(df)
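A common reason to use sklearn.utils.shuffle is that it accepts several arrays at once, for example shuffle(X, y, random_state=0), and shuffles them consistently. If scikit-learn is not installed, the same effect can be sketched with one shared permutation. This is a plain pandas/NumPy substitute, not sklearn's implementation, and the X and y data are made up.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"Fee": [20000, 25000, 26000, 22000],
                  "Discount": [1000, 2300, 1500, 1200]})
y = pd.Series(["Spark", "PySpark", "Hadoop", "Python"], name="Courses")

# One permutation applied to both objects keeps each row paired with its label
rng = np.random.default_rng(0)
order = rng.permutation(len(X))
X_shuf = X.iloc[order].reset_index(drop=True)
y_shuf = y.iloc[order].reset_index(drop=True)

print(pd.concat([X_shuf, y_shuf], axis=1))
```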

7. Using DataFrame.apply() & numpy.random.permutation() to Shuffle

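You can also use df.apply(np.random.permutation, axis=1). Note that this does not reorder the rows. It permutes the values within each row, and the result is a Series of arrays (dtype: object), as the output below shows.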

    # Using apply() to permute the values within each row
    import numpy as np

    df1 = df.apply(np.random.permutation, axis=1)
    print(df1)

    # Output:
    # 0       [30day, Spark, 1000, 20000]
    # 1    [40days, PySpark, 25000, 2300]
    # 2     [1500, Hadoop, 26000, 35days]
    # 3     [40days, 1200, Python, 22000]
    # 4     [60days, pandas, 2500, 24000]
    # 5     [2100, 21000, 50days, Oracle]
    # 6       [2000, Java, 22000, 55days]
    # dtype: object
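Because df.apply(np.random.permutation, axis=1) permutes values inside each row, it returns a Series of arrays rather than a DataFrame. If you want a DataFrame with the original column labels, one option is to wrap each permuted row in a pd.Series, which apply then expands back into columns. The numeric frame and seed are illustrative. After this, the column labels no longer describe the values under them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 4], "B": [2, 5], "C": [3, 6]})
rng = np.random.default_rng(3)

# Returning a Series (with the row's labels) makes apply build a DataFrame again
out = df.apply(
    lambda row: pd.Series(rng.permutation(row.to_numpy()), index=row.index),
    axis=1,
)
print(out)
```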

8. Pandas DataFrame Shuffle/Permutating Rows Using Lambda Function

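Use df.apply(lambda x: x.sample(frac=1).values) to sample each column independently. apply iterates over the columns, and frac=1 returns all of a column's values in random order. .values returns a NumPy array without the index; without it, pandas would realign each shuffled column on its index labels and undo the shuffle. Because every column is shuffled separately, values from different original rows end up in the same row.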

    # Using a lambda to shuffle each column independently
    df2 = df.apply(lambda x: x.sample(frac=1).values)
    print(df2)

    # Output:
    #    Courses    Fee  Duration  Discount
    # 0   Oracle  20000    40days      1000
    # 1   Hadoop  21000    60days      2300
    # 2   pandas  26000    40days      1500
    # 3  PySpark  24000     30day      1200
    # 4    Spark  22000    35days      2000
    # 5     Java  22000    50days      2500
    # 6   Python  25000    55days      2100
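The same independent per-column shuffle can also be written without apply. Build the new frame from a dict comprehension, giving each column its own permutation from a seeded NumPy Generator. This is an alternative formulation, not the one used above, and the frame and seed are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop", "Python"],
                   "Fee": [20000, 25000, 26000, 22000]})
rng = np.random.default_rng(5)

# Each column gets its own independent permutation, so rows no longer match up
mixed = pd.DataFrame(
    {col: rng.permutation(df[col].to_numpy()) for col in df.columns},
    index=df.index,
)
print(mixed)
```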

9. Shuffle DataFrame Randomly by Rows and Columns

Use df.sample(frac=1, axis=1).sample(frac=1).reset_index(drop=True) to shuffle both columns and rows. axis=1 shuffles the column order, and the second sample() shuffles the rows. The result looks completely randomized, although each row still holds the values of one original record. I'm not sure of a practical use case for this, but it is worth covering since sample() makes it possible.

    # Using sample() to shuffle both columns and rows
    df2 = df.sample(frac=1, axis=1).sample(frac=1).reset_index(drop=True)
    print(df2)

    # Output:
    #   Duration    Fee  Discount  Courses
    # 0    60days  24000      2500   pandas
    # 1    55days  22000      2000     Java
    # 2    40days  25000      2300  PySpark
    # 3    40days  22000      1200   Python
    # 4    35days  26000      1500   Hadoop
    # 5    50days  21000      2100   Oracle
    # 6     30day  20000      1000    Spark
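To confirm that this double shuffle only reorders rows and columns without mixing records, the sketch below seeds both sample() calls with random_state. It then restores the original column order and sorts the rows to compare against the input. The frame and seeds are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"],
                   "Fee": [20000, 25000, 26000],
                   "Discount": [1000, 2300, 1500]})

# axis=1 samples (shuffles) the columns, the second sample() shuffles the rows
shuffled = (df.sample(frac=1, axis=1, random_state=0)
              .sample(frac=1, random_state=0)
              .reset_index(drop=True))
print(shuffled)

# Undo both shuffles: original column order, rows sorted by a key column
restored = shuffled[df.columns].sort_values("Courses").reset_index(drop=True)
expected = df.sort_values("Courses").reset_index(drop=True)
print(restored.equals(expected))
```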

10. Complete Example For Shuffle DataFrame Rows

    import numpy as np
    import pandas as pd
    from sklearn.utils import shuffle

    technologies = {
        'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle", "Java"],
        'Fee': [20000, 25000, 26000, 22000, 24000, 21000, 22000],
        'Duration': ['30day', '40days', '35days', '40days', '60days', '50days', '55days'],
        'Discount': [1000, 2300, 1500, 1200, 2500, 2100, 2000]
    }
    df = pd.DataFrame(technologies)
    print(df)

    # Shuffle the DataFrame rows and return all of them
    df1 = df.sample(frac=1)
    print(df1)

    # Create a new index starting from zero
    df1 = df.sample(frac=1).reset_index()
    print(df1)

    # Using NumPy: shuffle row positions in place, then select them with iloc
    positions = np.arange(len(df))
    np.random.shuffle(positions)
    df1 = df.iloc[positions].reset_index(drop=True)
    print(df1)

    # Using numpy permutation() to shuffle DataFrame rows
    df1 = df.iloc[np.random.permutation(len(df))].reset_index(drop=True)
    print(df1)

    # Using sklearn to shuffle rows
    df = shuffle(df)

    # Using apply() to permute the values within each row
    df1 = df.apply(np.random.permutation, axis=1)
    print(df1)

    # Using a lambda to shuffle each column independently
    df2 = df.apply(lambda x: x.sample(frac=1).values)
    print(df2)

    # Using sample() to shuffle both columns and rows
    df2 = df.sample(frac=1, axis=1).sample(frac=1).reset_index(drop=True)
    print(df2)

11. Conclusion

In this article, you have learned how to shuffle pandas DataFrame rows with DataFrame.sample() and DataFrame.iloc[]. You have also seen how DataFrame.apply(), with NumPy or a lambda function, permutes values within rows or columns. Finally, you have learned to shuffle rows with numpy.random.shuffle(), numpy.random.permutation() and sklearn.utils.shuffle().


Shuffling DataFrame rows in Python

Data in a DataFrame is often ordered in a way that does not suit a particular task. For example, a DataFrame of products from different categories may list all products of one category consecutively. In that case you may want to shuffle the DataFrame rows so that products from different categories are mixed throughout the DataFrame.

Example of the original DataFrame:

|   | Product  | Category  |
|---|----------|-----------|
| 0 | Apple    | Fruit     |
| 1 | Banana   | Fruit     |
| 2 | Pear     | Fruit     |
| 3 | Tomato   | Vegetable |
| 4 | Cucumber | Vegetable |
| 5 | Potato   | Vegetable |

To shuffle DataFrame rows in Python, you can use the pandas sample method, which returns a random sample of the DataFrame's rows. To get a shuffled DataFrame, call it on the original DataFrame with a sample size equal to the number of rows, for example frac=1 (or n=len(df)).

    import pandas as pd

    # Create the original DataFrame
    df = pd.DataFrame({
        'Product': ['Apple', 'Banana', 'Pear', 'Tomato', 'Cucumber', 'Potato'],
        'Category': ['Fruit', 'Fruit', 'Fruit', 'Vegetable', 'Vegetable', 'Vegetable']
    })

    # Shuffle the DataFrame rows
    df = df.sample(frac=1)
    print(df)

The output is a DataFrame with shuffled rows (the exact order changes from run to run):

|   | Product  | Category  |
|---|----------|-----------|
| 4 | Cucumber | Vegetable |
| 2 | Pear     | Fruit     |
| 1 | Banana   | Fruit     |
| 3 | Tomato   | Vegetable |
| 0 | Apple    | Fruit     |
| 5 | Potato   | Vegetable |

Note that rows keep their original index labels after shuffling. If you need to reset the index and assign new labels, use reset_index:

df = df.sample(frac=1).reset_index(drop=True)

Shuffling DataFrame rows in Python is therefore a simple, fast operation that can be done with pandas' built-in methods.
