简体   繁体   English

Numpy将CSV分为几列

[英]Numpy Separating CSV into columns

I'm trying to use a CSV imported from bballreference.com. 我正在尝试使用从bballreference.com导入的CSV。 But as you can see, the separated values are all in one row rather than separated by columns. 但是正如您所看到的,分隔的值全都在一行中,而不是按列分隔。 On NumPy Pandas, what would be the easiest way to fix this? 在NumPy Pandas上,最简单的解决方法是什么? I've googled to no avail. 我没有用谷歌搜索。

csv on jupyter CSV在jupyter上

I don't know how to post CSV file in a clean way but here it is: 我不知道如何以干净的方式发布CSV文件,但是这里是:

",,,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Totals,Shooting,Shooting,Shooting,Per Game,Per Game,Per Game,Per Game,Per Game,Per Game"
"Rk,Player,Age,G,GS,MP,FG,FGA,3P,3PA,FT,FTA,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,FG%,3P%,FT%,MP,PTS,TRB,AST,STL,BLK"
"1,Kevin Durant\duranke01,29,5,5,182,54,107,9,28,22,27,3,34,37,24,7,6,10,7,139,.505,.321,.815,36.5,27.8,7.4,4.8,1.4,1.2"
"2,Klay Thompson\thompkl01,27,5,5,183,38,99,12,43,11,11,3,29,32,9,1,2,6,11,99,.384,.279,1.000,36.7,19.8,6.4,1.8,0.2,0.4"
"3,Stephen Curry\curryst01,29,4,3,125,32,67,15,34,19,19,2,19,21,14,8,2,15,6,98,.478,.441,1.000,31.2,24.5,5.3,3.5,2.0,0.5"
"4,Draymond Green\greendr01,27,5,5,186,27,55,8,20,12,15,12,47,59,50,12,8,18,16,74,.491,.400,.800,37.1,14.8,11.8,10.0,2.4,1.6"
"5,Andre Iguodala\iguodan01,34,5,4,140,14,29,4,12,7,12,4,21,25,17,10,2,3,7,39,.483,.333,.583,27.9,7.8,5.0,3.4,2.0,0.4"
"6,Quinn Cook\cookqu01,24,4,0,58,12,27,0,10,6,8,1,8,9,4,1,0,2,4,30,.444,.000,.750,14.4,7.5,2.3,1.0,0.3,0.0"
"7,Kevon Looney\looneke01,21,5,0,113,12,17,0,0,4,8,10,19,29,5,4,1,2,17,28,.706,,.500,22.6,5.6,5.8,1.0,0.8,0.2"
"8,Shaun Livingston\livinsh01,32,5,0,79,11,27,0,0,4,4,0,6,6,12,0,1,3,9,26,.407,,1.000,15.9,5.2,1.2,2.4,0.0,0.2"
"9,David West\westda01,37,5,0,40,8,14,0,0,0,0,2,5,7,13,2,4,3,4,16,.571,,,7.9,3.2,1.4,2.6,0.4,0.8"
"10,Nick Young\youngni01,32,4,2,41,3,11,3,10,2,3,0,4,4,1,1,0,1,3,11,.273,.300,.667,10.2,2.8,1.0,0.3,0.3,0.0"
"11,JaVale McGee\mcgeeja01,30,3,1,19,3,8,0,1,0,0,4,2,6,0,0,1,0,2,6,.375,.000,,6.2,2.0,2.0,0.0,0.0,0.3"
"12,Zaza Pachulia\pachuza01,33,2,0,8,1,2,0,0,2,4,4,2,6,0,2,0,1,1,4,.500,,.500,4.2,2.0,3.0,0.0,1.0,0.0"
"13,Jordan Bell\belljo01,23,4,0,23,1,4,0,0,1,2,1,5,6,5,2,2,0,2,3,.250,,.500,5.8,0.8,1.5,1.3,0.5,0.5"
"14,Damian Jones\jonesda03,22,1,0,3,0,1,0,0,2,2,0,0,0,0,0,0,0,0,2,.000,,1.000,3.2,2.0,0.0,0.0,0.0,0.0"
",Team Totals,26.5,5,,1200,216,468,51,158,92,115,46,201,247,154,50,29,64,89,575,.462,.323,.800,240.0,115.0,49.4,30.8,10.0,5.8"

It seems that the first two rows of your CSV file are headers, but the default behavior of pd.read_csv thinks that only the first row is header. 似乎CSV文件的前两行是标头,但是pd.read_csv的默认行为认为只有第一行是标头。

Also, the beginning and trailing quotes make pd.read_csv think the text in between is a single field/column. 同样, pd.read_csv引号使pd.read_csv之间的文本是单个字段/列。

You could try the following: 您可以尝试以下方法:

Remove the beginning and trailing quotes, and 删除开头和结尾的引号,并

bbal = pd.read_csv('some_file.csv', header=[0, 1], delimiter=',')

Following is how you could use Python to remove the beginning and trailing quotes: 以下是如何使用Python删除开头和结尾引号的方法:

# open 'quotes.csv' in read mode with variable in_file as handle
# open 'no_quotes.csv' in write mode with variable out_file as handle
with open('quotes.csv') as in_file, open('no_quotes.csv', 'w') as out_file:
    # read in_file line by line
    # the variable line stores each line as string
    for line in in_file:
        # line[1:-1] slices the string to omit the first and last character
        # append a newline character '\n' to the sliced line
        # write the string with newline to out_file
        out_file.write(line[1:-1] + '\n')

# read_csv on 'no_quotes.csv'
bbal = pd.read_csv('no_quotes.csv', header=[0, 1], delimiter=',')
bbal.head()

Consider reading in csv as a text file to be stripped of the beginning/end quotes per line on a text file read which tell the parser all data between is a singular value. 考虑将csv作为文本文件读取,以去除读取的文本文件中每行的开头/结尾引号,该行告诉解析器之间的所有数据均为奇异值。 And use built-in StringIO to read text string into dataframe instead of saving to disk for import. 并使用内置的StringIO将文本字符串读入数据帧,而不是保存到磁盘以进行导入。

Additionally, skip the first row of repeated Totals and Per Game and even the last row that aggregates since you can do that with pandas. 另外,由于您可以使用熊猫来进行此操作,因此跳过重复的“ 总计”和“ 每场比赛”的第一行,甚至跳过汇总的最后一行。

from io import StringIO
import pandas as pd

with open('BasketballCSVQuotes.csv') as f:
    csvdata = f.read().replace('"', '')

df = pd.read_csv(StringIO(csvdata), skiprows=1, skipfooter=1, engine='python')

print(df)

Output 输出量

      Rk                      Player   Age  G   GS    MP   FG  FGA  3P  3PA  ...    PTS    FG%    3P%    FT%   MP.1  PTS.1  TRB.1  AST.1  STL.1  BLK.1
0    1.0      Kevin Durant\duranke01  29.0  5  5.0   182   54  107   9   28  ...    139  0.505  0.321  0.815   36.5   27.8    7.4    4.8    1.4    1.2
1    2.0     Klay Thompson\thompkl01  27.0  5  5.0   183   38   99  12   43  ...     99  0.384  0.279  1.000   36.7   19.8    6.4    1.8    0.2    0.4
2    3.0     Stephen Curry\curryst01  29.0  4  3.0   125   32   67  15   34  ...     98  0.478  0.441  1.000   31.2   24.5    5.3    3.5    2.0    0.5
3    4.0    Draymond Green\greendr01  27.0  5  5.0   186   27   55   8   20  ...     74  0.491  0.400  0.800   37.1   14.8   11.8   10.0    2.4    1.6
4    5.0    Andre Iguodala\iguodan01  34.0  5  4.0   140   14   29   4   12  ...     39  0.483  0.333  0.583   27.9    7.8    5.0    3.4    2.0    0.4
5    6.0         Quinn Cook\cookqu01  24.0  4  0.0    58   12   27   0   10  ...     30  0.444  0.000  0.750   14.4    7.5    2.3    1.0    0.3    0.0
6    7.0      Kevon Looney\looneke01  21.0  5  0.0   113   12   17   0    0  ...     28  0.706    NaN  0.500   22.6    5.6    5.8    1.0    0.8    0.2
7    8.0  Shaun Livingston\livinsh01  32.0  5  0.0    79   11   27   0    0  ...     26  0.407    NaN  1.000   15.9    5.2    1.2    2.4    0.0    0.2
8    9.0         David West\westda01  37.0  5  0.0    40    8   14   0    0  ...     16  0.571    NaN    NaN    7.9    3.2    1.4    2.6    0.4    0.8
9   10.0        Nick Young\youngni01  32.0  4  2.0    41    3   11   3   10  ...     11  0.273  0.300  0.667   10.2    2.8    1.0    0.3    0.3    0.0
10  11.0      JaVale McGee\mcgeeja01  30.0  3  1.0    19    3    8   0    1  ...      6  0.375  0.000    NaN    6.2    2.0    2.0    0.0    0.0    0.3
11  12.0     Zaza Pachulia\pachuza01  33.0  2  0.0     8    1    2   0    0  ...      4  0.500    NaN  0.500    4.2    2.0    3.0    0.0    1.0    0.0
12  13.0           Jordan Belelljo01  23.0  4  0.0    23    1    4   0    0  ...      3  0.250    NaN  0.500    5.8    0.8    1.5    1.3    0.5    0.5
13  14.0      Damian Jones\jonesda03  22.0  1  0.0     3    0    1   0    0  ...      2  0.000    NaN  1.000    3.2    2.0    0.0    0.0    0.0    0.0

[14 rows x 30 columns]

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM