Shuffle all rows of a csv file with Python
I have an input CSV file with this data:
a 15
b 14
c 20
d 45
I want to generate a different CSV file that contains all of the data rows from the input file, but with the rows shuffled.
For example, the output file might contain:
b 14
a 15
c 20
d 45
I have tried this code:
import random
import sys
op=open('random.csv','w+')
ip=open(sys.argv[1],'r')
data=ip.read()
data1=str(random.choices(data))
op.write(data1)
op.close()
Another option is to use pandas. You can read your .csv file with:
import pandas as pd
df = pd.read_csv('yourfile.csv', header=None)
and then use df.sample to shuffle your rows. This returns a random sample of your DataFrame with the rows shuffled. With frac=1, the whole set is taken as the sample:
In [18]: df
Out[18]:
0 1
0 a 15
1 b 14
2 c 20
3 d 45
In [19]: ds = df.sample(frac=1)
In [20]: ds
Out[20]:
0 1
1 b 14
3 d 45
0 a 15
2 c 20
If you need to save the shuffled data to a new file, you can just:
ds.to_csv('newfile.csv', index=False, header=False)  # skip the row index and the 0/1 column labels
You can use the shuffle function from Python's random module, like this:
import random
fid = open("example.txt", "r")
li = fid.readlines()
fid.close()
print(li)
random.shuffle(li)
print(li)
fid = open("shuffled_example.txt", "w")
fid.writelines(li)
fid.close()
The print commands result in this:
['b 14\n', 'a 15\n', 'c 20\n', 'd 45\n']
['d 45\n', 'a 15\n', 'b 14\n', 'c 20\n']
And the new file is this:
d 45
a 15
b 14
c 20
Just make sure you have a newline at the end of each of your original lines.
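If the last row of the input might be missing its trailing newline, one way to stop two rows from being glued together after the shuffle is to normalise the line endings first. This is only a minimal sketch: the helper name `shuffle_lines` and its optional `seed` parameter are assumptions made for illustration and are not part of the answer above.

```python
import random

def shuffle_lines(text, seed=None):
    """Return the lines of `text` in random order, each ending in a newline."""
    # splitlines() drops the line endings, so a missing final newline no longer matters
    lines = text.splitlines()
    # a private Random instance leaves the global random state untouched
    random.Random(seed).shuffle(lines)
    return "".join(line + "\n" for line in lines)

# the last row here deliberately has no trailing newline
print(shuffle_lines("a 15\nb 14\nc 20\nd 45", seed=0), end="")
```

Passing a fixed `seed` makes the order repeatable, which can help when testing; leaving it as `None` gives a different order on each run.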
There is a shuffle function in the random module. Also, you can use readlines() to get the file as a list of lines:
>>> ip=open('random.csv','r')
>>> data=ip.readlines()
>>> data
['a 15\n', 'b 14\n', 'c 20\n', 'd 45\n']
>>> from random import shuffle
>>> shuffle(data)
>>> data
['c 20\n', 'd 45\n', 'a 15\n', 'b 14\n']
If you have a header, split it off from the data and shuffle only the rows:
>>> ip=open('random.csv','r')
>>> data=ip.readlines()
>>> header, rest=data[0], data[1:]
>>> header
'h1 h2\n'
>>> rest
['a 15\n', 'b 14\n', 'c 20\n', 'd 45\n']
>>> shuffle(rest)
>>> rest
['c 20\n', 'd 45\n', 'a 15\n', 'b 14\n']
>>> [header]+rest
['h1 h2\n', 'c 20\n', 'd 45\n', 'a 15\n', 'b 14\n']
Using a with statement:
>>> with open('random.csv','r') as ip:
...     data=ip.readlines()
...
>>> header, rest=data[0], data[1:]
>>> shuffle(rest)
>>> with open('output.csv','w') as out:
...     out.write(''.join([header]+rest))
...
>>>
~$ cat output.csv
h1 h2
d 45
b 14
a 15
c 20
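The header-preserving approach above can also be written with the standard csv module, which may be safer if a quoted field contains an embedded line break. The following is a sketch under stated assumptions: the function name `shuffle_csv` and its `has_header` and `seed` parameters are hypothetical, and the default comma delimiter is assumed.

```python
import csv
import random

def shuffle_csv(src, dst, has_header=True, seed=None):
    """Copy the CSV at `src` to `dst` with the data rows in random order."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(src, newline="") as f:
        rows = list(csv.reader(f))
    # keep the first row in place when it is a header
    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else rows
    random.Random(seed).shuffle(body)
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows(header + body)
```

A call such as `shuffle_csv('random.csv', 'output.csv')` would then write the header first, followed by the shuffled rows.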
I think you should read the actual lines of the file with ip.readlines(), and use random.shuffle() to swap the lines around.
At the moment, you read the whole file as a single string, and random.choices() then picks just one random character from it.
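To make the difference concrete, here is a small sketch that contrasts the two calls on the sample data from the question. The variable names are chosen for illustration only.

```python
import random

data = "a 15\nb 14\nc 20\nd 45\n"

# random.choices on a string samples characters; the default k=1 gives a one-item list
picked = random.choices(data)
print(len(picked), len(picked[0]))  # prints: 1 1

# shuffling a list of lines reorders the rows but keeps each row intact
lines = data.splitlines(keepends=True)
random.shuffle(lines)
print(sorted(lines) == sorted(data.splitlines(keepends=True)))  # prints: True
```

Wrapping the result of `random.choices(data)` in `str()`, as the question's code does, therefore writes the text of a one-element list to the output file rather than any shuffled rows.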
If your CSV contains headers, you can shuffle it using pandas like this:
df = pd.read_csv(file_name) # avoid header=None.
shuffled_df = df.sample(frac=1)
shuffled_df.to_csv(new_file_name, index=False)
This way you avoid shuffling the header row and keep the index out of your new CSV.
I'm using this code, based on @cricket's answer:
from random import shuffle
with open('input.csv','r') as f1:
    data=f1.readlines()
# keep the header line aside and shuffle only the data rows
header, rows = data[0], data[1:]
shuffle(rows)
with open('output.csv','w') as f2:
    f2.write(''.join([header] + rows))
I do it this way:
import numpy as np
import pandas as pd
df = pd.read_csv("your_csv_file.csv", header=0)
df = df.reindex(np.random.permutation(df.index))  # reindex returns a new DataFrame, so keep the result