
How can I insert data from a CSV file into a dataframe using pandas.read_csv?

I have a CSV file like this:

"B/G/213","B/C/208","WW_cis",,
"B/U/215","B/A/206","WW_cis",,
"B/C/214","B/G/207","WW_cis",,
"B/G/217","B/C/204","WW_cis",,
"B/A/216","B/U/205","WW_cis",,
"B/C/219","B/G/202","WW_cis",,
"B/U/218","B/A/203","WW_cis",,
"B/G/201","B/C/220","WW_cis",,
"B/A/203","B/U/218","WW_cis",,

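I want to read it into an array or a dataframe so that I can compare elements in one column with selected elements in another column. At first I used numpy.genfromtxt to read it directly into an array, but I got values like '"B/A/203"' with extra " quotes everywhere. I read somewhere that pandas can strip the extra " characters, so I tried: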

class StructureReader(object):
    def __init__(self, filename):
        self.filename=filename
    def read(self):
        self.data=pd.read_csv(StringIO(str("RNA/"+self.filename)), header=None, sep = ",")
        self.data

But I get something like this:

<class 'pandas.core.frame.DataFrame'> 0 0 RNA/4v6p.csv

How can I convert my CSV file into some data type that lets me search its columns and rows?

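Data insertion

You are putting the filename string itself into the DataFrame, i.e. RNA/4v6p.csv is the value at row 0, col 0. You need to read the file in and store its data. This can be done by removing StringIO(str(...)) from your class: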

import pandas as pd

class StructureReader(object):
    def __init__(self, filename):
        self.filename = filename
    def read(self):
        # Pass the path directly; read_csv opens the file itself
        self.data = pd.read_csv("RNA/" + self.filename, header=None, sep=",")
        return self.data
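
To see why the original call produced a one-cell frame, here is a minimal sketch. It assumes only the path string from the question and reads nothing from disk: StringIO treats whatever text it is given as the file's contents, so read_csv parses the path itself as a single CSV field.

```python
import io

import pandas as pd

# StringIO wraps the *string itself* as a file-like object,
# so read_csv parses the path text as one line of CSV data.
buf = io.StringIO("RNA/4v6p.csv")
df_wrong = pd.read_csv(buf, header=None, sep=",")

print(df_wrong.shape)       # one row, one column
print(df_wrong.iloc[0, 0])  # the path text, not the file's contents
```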

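Code structure critique

I would also recommend not hard-coding the parent directory. You can avoid that in one of these ways: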

  1. Always pass in the full file path

     class StructureReader(object):
         def __init__(self, filepath):
             self.filepath = filepath
         def read(self):
             self.data = pd.read_csv(self.filepath, header=None, sep=",")
             return self.data

  2. Make the directory an __init__() parameter

     class StructureReader(object):
         def __init__(self, directory, filename):
             self.directory = directory
             self.filename = filename
         def read(self):
             self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
             # or import os and
             # self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
             return self.data

  3. Make the directory a constant attribute

     class StructureReader(object):
         def __init__(self, filename):
             self.directory = "RNA"
             self.filename = filename
         def read(self):
             self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
             # or import os and
             # self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
             return self.data

This has nothing to do with reading the data; it is just a best-practice comment on structuring the code (only my $0.02).
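
As a rough sketch of the "directory as a parameter" idea, the path can also be built with pathlib rather than string concatenation. Everything in the demo part is an assumption made for illustration: the temporary directory stands in for RNA, and sample.csv is a made-up file holding two rows copied from the question.

```python
import tempfile
from pathlib import Path

import pandas as pd


class StructureReader:
    """Hypothetical variant: directory and filename are both parameters."""

    def __init__(self, directory, filename):
        # Path inserts the separator, so no manual "/" concatenation is needed.
        self.filepath = Path(directory) / filename

    def read(self):
        self.data = pd.read_csv(self.filepath, header=None, sep=",")
        return self.data


# Demo: a throwaway directory stands in for "RNA" (illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        '"B/G/213","B/C/208","WW_cis",,\n'
        '"B/U/215","B/A/206","WW_cis",,\n'
    )
    frame = StructureReader(tmp, "sample.csv").read()

print(frame)
```

Returning the frame from read() lets the caller use the result directly, while self.data still keeps a copy on the instance.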

IIUC, you can read it like this:

df = pd.read_csv('yourfile.csv', header=None)

For me that gives:

         0        1       2   3   4
0  B/G/213  B/C/208  WW_cis NaN NaN
1  B/U/215  B/A/206  WW_cis NaN NaN
2  B/C/214  B/G/207  WW_cis NaN NaN
3  B/G/217  B/C/204  WW_cis NaN NaN
4  B/A/216  B/U/205  WW_cis NaN NaN
5  B/C/219  B/G/202  WW_cis NaN NaN
6  B/U/218  B/A/203  WW_cis NaN NaN
7  B/G/201  B/C/220  WW_cis NaN NaN
8  B/A/203  B/U/218  WW_cis NaN NaN

Then you can select just the columns you need:

df = df[[0,1,2]]

and work with it like any normal dataframe.
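
For the comparison the question asks about, something along these lines might work. This is only a sketch: it uses three rows taken from the sample data, and the variable names also_in_col1 and hits are made up for illustration.

```python
import io

import pandas as pd

# Three rows taken from the question's sample data.
data = '''"B/G/213","B/C/208","WW_cis",,
"B/U/218","B/A/203","WW_cis",,
"B/A/203","B/U/218","WW_cis",,
'''
df = pd.read_csv(io.StringIO(data), header=None)[[0, 1, 2]]

# Rows whose column-0 value also appears somewhere in column 1.
also_in_col1 = df[df[0].isin(df[1])]

# Rows where one specific value shows up in column 0.
hits = df[df[0] == "B/A/203"]

print(also_in_col1)
print(hits)
```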

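I think you have confused StringIO with a filename. Either you have the data as a string and wrap it in StringIO, or you just specify the filename (without using StringIO):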

In [189]: data="""\
   .....: "B/G/213","B/C/208","WW_cis",,
   .....: "B/U/215","B/A/206","WW_cis",,
   .....: "B/C/214","B/G/207","WW_cis",,
   .....: "B/G/217","B/C/204","WW_cis",,
   .....: "B/A/216","B/U/205","WW_cis",,
   .....: "B/C/219","B/G/202","WW_cis",,
   .....: "B/U/218","B/A/203","WW_cis",,
   .....: "B/G/201","B/C/220","WW_cis",,
   .....: "B/A/203","B/U/218","WW_cis",,
   .....: """

In [190]:

In [190]: df = pd.read_csv(io.StringIO(data), sep=',', header=None, usecols=[0,1,2])

In [191]: df
Out[191]:
         0        1       2
0  B/G/213  B/C/208  WW_cis
1  B/U/215  B/A/206  WW_cis
2  B/C/214  B/G/207  WW_cis
3  B/G/217  B/C/204  WW_cis
4  B/A/216  B/U/205  WW_cis
5  B/C/219  B/G/202  WW_cis
6  B/U/218  B/A/203  WW_cis
7  B/G/201  B/C/220  WW_cis
8  B/A/203  B/U/218  WW_cis

PS: you can decide which columns get parsed (and end up in the dataframe); see the usecols parameter.

Or using a filename:

import os

df = pd.read_csv(os.path.join('RNA', self.filename), sep=',', header=None, usecols=[0,1,2])

