Shuffle all rows of a csv file with Python

Question

I have an input csv file with data:

I want to generate a different csv file that contains all the data rows from the input file, but with the rows shuffled.

For example, the output file may contain:

b 14
a 15
c 20
d 45

I have tried this code:

import random
import sys
op=open('random.csv','w+')
ip=open(sys.argv[1],'r')
data=ip.read()
data1=str(random.choices(data))
op.write(data1)
op.close()

Answer 1

Another option is pandas. You can read your .csv file with:

import pandas as pd

df = pd.read_csv('yourfile.csv', header=None)

and then use df.sample to shuffle the rows. It returns a random sample of your DataFrame with the rows shuffled. With frac=1 the whole set is taken as the sample:

In [18]: df
Out[18]: 
   0   1
0  a  15
1  b  14
2  c  20
3  d  45

In [19]: ds = df.sample(frac=1)

In [20]: ds
Out[20]: 
   0   1
1  b  14
3  d  45
0  a  15
2  c  20

If you need to save the shuffled rows to a new file, you can write them out without the index and the auto-generated column labels:

ds.to_csv('newfile.csv', index=False, header=False)
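
For readers without pandas, the idea behind frac=1 (sampling the whole set without replacement, which amounts to a permutation) can be sketched with the standard library. This is only an illustration of the idea, not what pandas does internally; the helper name shuffled_copy and its seed parameter are assumptions made for this example.

```python
import random

def shuffled_copy(rows, seed=None):
    """Return a new list with the same rows in random order.

    Sampling len(rows) items without replacement yields a permutation,
    which is the plain-Python analogue of df.sample(frac=1).
    """
    rng = random.Random(seed)  # pass a seed for repeatable output
    return rng.sample(rows, k=len(rows))

rows = [["a", "15"], ["b", "14"], ["c", "20"], ["d", "45"]]
result = shuffled_copy(rows, seed=0)  # rows itself is left unchanged
```

Unlike random.shuffle, this leaves the original list untouched, which mirrors how df.sample returns a new DataFrame.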

Answer 2

You can use the shuffle function from Python's random module, like this:

import random
fid = open("example.txt", "r")
li = fid.readlines()
fid.close()
print(li)

random.shuffle(li)
print(li)

fid = open("shuffled_example.txt", "w")
fid.writelines(li)
fid.close()

The print calls output:

['b 14\n', 'a 15\n', 'c 20\n', 'd 45\n']
['d 45\n', 'a 15\n', 'b 14\n', 'c 20\n']

And the new file contains:

d 45
a 15
b 14
c 20

Just make sure each of your original lines ends with a newline.
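
If the last line of the file may lack its newline, one way to guard against two rows being glued together is to normalize the line endings before shuffling. A minimal sketch; the helper name shuffle_lines is made up for this example, and it assumes you are happy to write every row with a plain "\n" ending.

```python
import random

def shuffle_lines(lines):
    """Return the lines in random order, each ending in exactly one newline."""
    # Strip any existing line ending, then add a single "\n" back.
    normalized = [line.rstrip("\r\n") + "\n" for line in lines]
    random.shuffle(normalized)
    return normalized

# The last line has no trailing newline, as often happens at end of file.
lines = ["a 15\n", "b 14\n", "c 20\n", "d 45"]
shuffled = shuffle_lines(lines)
text = "".join(shuffled)  # safe to pass to file.write()
```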

Answer 3

You can use shuf.

Once you have installed shuf, run:

shuf -o shuffled-file.csv < file-to-shuffle.csv

Answer 4

There is a shuffle function in the random module. You can also use readlines() to get a list of lines:

>>> ip=open('random.csv','r')
>>> data=ip.readlines()
>>> data
['a   15\n', 'b   14\n', 'c   20\n', 'd   45\n']
>>> from random import shuffle
>>> shuffle(data)
>>> data
['c   20\n', 'd   45\n', 'a   15\n', 'b   14\n']

If you have a header, split it off from the data and shuffle only the remaining rows:

>>> ip=open('random.csv','r')
>>> data=ip.readlines()
>>> header, rest=data[0], data[1:]
>>> header
'h1  h2\n'
>>> rest
['a   15\n', 'b   14\n', 'c   20\n', 'd   45\n']
>>> shuffle(rest)
>>> rest
['c   20\n', 'd   45\n', 'a   15\n', 'b   14\n']
>>> [header]+rest
['h1  h2\n', 'c   20\n', 'd   45\n', 'a   15\n', 'b   14\n']

Using the with statement:

>>> with open('random.csv','r') as ip:
...   data=ip.readlines()
...
>>> header, rest=data[0], data[1:]
>>> shuffle(rest)
>>> with open('output.csv','w') as out:
...   out.write(''.join([header]+rest))
...
>>>
~$ cat output.csv
h1  h2
d   45
b   14
a   15
c   20
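
The line-based approach above assumes every record occupies exactly one physical line. If a field might contain a quoted line break, parsing with the csv module keeps such records intact. A minimal sketch, assuming a comma-delimited file; the function name shuffle_csv, its has_header parameter, and the paths passed to it are hypothetical.

```python
import csv
import random

def shuffle_csv(in_path, out_path, has_header=True):
    """Copy in_path to out_path with the data rows in random order."""
    # newline="" lets the csv module handle line endings itself.
    with open(in_path, newline="") as src:
        rows = list(csv.reader(src))

    # Keep the header (if any) out of the shuffle.
    header, body = (rows[:1], rows[1:]) if has_header else ([], rows)
    random.shuffle(body)

    with open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(header + body)
```

It could be called as shuffle_csv('random.csv', 'output.csv'). Note that it loads the whole file into memory, just like readlines().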

Answer 5

I think you should read the actual lines of the file:

ip.readlines()

Then random.shuffle() should be used to reorder the lines.

At the moment you read the whole file as one string, and I think random.choices then picks only a single random character from it.
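
To see the difference, compare what random.choices does on a string with what random.shuffle does on a list of lines. The sample text below is made up for illustration.

```python
import random

text = "a 15\nb 14\nc 20\nd 45\n"

# random.choices treats the string as a sequence of characters and,
# with the default k=1, returns a list holding one of them.
picked = random.choices(text)

# Splitting into lines first gives a list whose elements are whole rows.
lines = text.splitlines(keepends=True)
random.shuffle(lines)  # reorders the list in place and returns None
```

Writing str(picked) to the output file therefore produces the text of a one-element list, such as ['b'], rather than shuffled rows.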

Answer 6

If your CSV contains headers, you can shuffle it using pandas like this:

import pandas as pd

df = pd.read_csv(file_name)  # do not pass header=None, so the first row is used as the header
shuffled_df = df.sample(frac=1)
shuffled_df.to_csv(new_file_name, index=False)

This way the header row is not shuffled and the index is left out of the new CSV.

Answer 7

I'm using this code, based on @cricket's answer:

from random import shuffle

with open('input.csv','r') as f1:
    data=f1.readlines()

# Keep the header in place and shuffle only the data rows.
header, rest = data[0], data[1:]
shuffle(rest)

with open('output.csv','w') as f2:
    f2.write(''.join([header] + rest))

Answer 8

I do it this way:

import numpy as np
import pandas as pd

df = pd.read_csv("your_csv_file.csv", header=0)
# reindex returns a new DataFrame, so keep the result
df = df.reindex(np.random.permutation(df.index))

8 answers

Answer 1
14 2017-02-24 13:35:21

Answer 2
6 2017-02-24 12:31:26

Answer 3
3 2018-04-25 18:15:33

Answer 4
2 2017-02-24 12:31:47

Answer 5
2 2017-02-24 12:32:17

Answer 6
2 2020-02-28 06:42:14

Answer 7
0 2020-03-19 21:48:11

Answer 8
0 2021-07-07 02:04:09
