How to take a random sample of a CSV file in Python?

Question

I'm new to Python and want to learn how to use it for data wrangling. I'm using Jupyter for this.

I have a file called fle with 81,000 rows and 89 columns. I want to randomly select about 100 rows from it. How do I do that? I keep getting errors.

fle=pd.read_csv("C:\Users\Mine\Documents\ssample.csv", low_memory=  False)
import random
sampl = random.sample(fle, 10)

The error I get is:

IndexError                                Traceback (most recent call last)
<ipython-input-37-fa4ec429f883> in <module>()
      1 import random
      2 #To take a sample of 10000 samples
----> 3 sampl = random.sample(fle, 10)
      4 #pd.DataFrame(sampler).head(10)

C:\Users\E061921\AppData\Local\Continuum\Anaconda\lib\random.pyc in sample(self, population, k)
    334             for i in xrange(k):         # invariant:  non-selected at [0,n-i)
    335                 j = _int(random() * (n-i))
--> 336                 result[i] = pool[j]
    337                 pool[j] = pool[n-i-1]   # move non-selected item into vacancy
    338         else:

IndexError: list index out of range
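The IndexError comes from a size mismatch inside random.sample. The Python 2.7 implementation in the traceback draws j from range(len(population)), which for fle is the row count (81,000), but builds pool from list(population), and iterating a DataFrame yields its column labels (only 89 of them). Any j of 89 or more overruns pool. A minimal sketch of that mismatch, assuming pandas is installed and using a small hypothetical frame df in place of the real file:

```python
import pandas as pd

# Hypothetical stand-in for fle: 5 rows but only 2 columns.
df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# len() of a DataFrame counts its rows...
print(len(df))    # 5
# ...but iterating a DataFrame yields its column labels, not its rows.
print(list(df))   # ['a', 'b']

# Mimic what Python 2's random.sample did: size the draw by len(),
# but index into a pool built by iterating the object.
n = len(df)
pool = list(df)
try:
    pool[n - 1]   # a valid row position, but not a valid column position
except IndexError as exc:
    print("IndexError:", exc)   # IndexError: list index out of range
```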

Answer 1

Use random.choice instead of sample. You can use csv.DictReader to process the CSV as a list of dictionaries:

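import csv
import random

# Read every row into memory as a dict keyed by the header names.
# The list must be built inside the with block, while the file is still open.
with open(r"C:\Users\Mine\Documents\ssample.csv", "r") as csvfile:
    reader = csv.DictReader(csvfile)
    rows = [r for r in reader]

# dicts are unhashable, so collect distinct row indices in the set instead
chosen = set()
while len(chosen) < 100:
    chosen.add(random.choice(range(len(rows))))
random_rows = [rows[i] for i in chosen]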

Answer 1, posted 2016-06-23 21:58:49
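Because fle is already a pandas DataFrame, pandas' own DataFrame.sample can draw the rows directly without going through the random module. A minimal sketch, with a hypothetical 1,000-row frame standing in for the real file: n=100 asks for 100 distinct rows (sampling is without replacement unless replace=True is passed), and random_state=42 is an arbitrary seed that only makes the draw reproducible.

```python
import pandas as pd

# Hypothetical stand-in for fle so the sketch runs on its own; in the
# question this would be the result of pd.read_csv(...) on the real file.
fle = pd.DataFrame({"id": range(1000), "value": [i * 0.5 for i in range(1000)]})

# Draw 100 distinct rows uniformly at random.
# random_state only fixes the seed so the same rows come back on every run.
sampl = fle.sample(n=100, random_state=42)

print(sampl.shape)             # (100, 2)
print(sampl.index.is_unique)   # True
```

If you stay with the csv module as in Answer 1, random.sample(rows, 100) on the list of dicts also returns 100 distinct rows, since rows is an ordinary list; the original error came from passing the DataFrame itself to random.sample.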