How can I insert data from a CSV file into a dataframe using pandas.read_csv?
I have a CSV file like this:
"B/G/213","B/C/208","WW_cis",,
"B/U/215","B/A/206","WW_cis",,
"B/C/214","B/G/207","WW_cis",,
"B/G/217","B/C/204","WW_cis",,
"B/A/216","B/U/205","WW_cis",,
"B/C/219","B/G/202","WW_cis",,
"B/U/218","B/A/203","WW_cis",,
"B/G/201","B/C/220","WW_cis",,
"B/A/203","B/U/218","WW_cis",,
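I want to read it into an array or a DataFrame so that I can compare elements in one column with selected elements in another column. At first I used numpy.genfromtxt, which reads the file directly into an array, but I got values like '"B/A/203"' with extra quotation marks (") everywhere. I read somewhere that pandas can strip the extra quotes, so I tried: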
class StructureReader(object):
    def __init__(self, filename):
        self.filename=filename
    def read(self):
        self.data=pd.read_csv(StringIO(str("RNA/"+self.filename)), header=None, sep = ",")
        self.data
But I get something like this:
<class 'pandas.core.frame.DataFrame'>
              0
0  RNA/4v6p.csv
How can I convert my CSV file into some data type that lets me search its columns and rows?
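You are putting the filename string itself into the DataFrame, i.e. RNA/4v6p.csv is the value at row 0, col 0. You need to read the file and store its data. This can be done by removing the StringIO(str(...)) wrapper in the class: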
import pandas as pd

class StructureReader(object):
    def __init__(self, filename):
        self.filename = filename
    def read(self):
        # Pass the path itself so pandas opens and parses the file
        self.data = pd.read_csv("RNA/" + self.filename, header=None, sep=",")
        return self.data
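To see why the original attempt produced a one-cell frame, here is a minimal sketch that reproduces it without touching the file system. It assumes only that the path string from the question, RNA/4v6p.csv, is wrapped in StringIO as in the asker's code.

```python
import io
import pandas as pd

# StringIO turns the text it is given into a file-like object, so pandas
# parses the path string itself as if it were the CSV content.
wrong = pd.read_csv(io.StringIO("RNA/4v6p.csv"), header=None, sep=",")

print(wrong.shape)       # (1, 1)
print(wrong.iloc[0, 0])  # RNA/4v6p.csv
```

The file on disk is never opened here, which is why the only cell holds the path.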
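I would also suggest not hard-coding the parent directory, by doing one of the following:
Always passing in the full file path: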
class StructureReader(object):
    def __init__(self, filepath):
        self.filepath = filepath
    def read(self):
        self.data = pd.read_csv(self.filepath, header=None, sep=",")
        return self.data
Making the directory an __init__() parameter:
class StructureReader(object):
    def __init__(self, directory, filename):
        self.directory = directory
        self.filename = filename
    def read(self):
        self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
        # or, with `import os`:
        # self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
        return self.data
Making the directory a constant attribute:
class StructureReader(object):
    def __init__(self, filename):
        self.directory = "RNA"
        self.filename = filename
    def read(self):
        self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
        # or, with `import os`:
        # self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
        return self.data
This has nothing to do with reading the data; it is just a best-practice comment on structuring the code (only my $0.02).
IIUC, you can read it simply with:
df = pd.read_csv('yourfile.csv', header=None)
Which gives me:
         0        1       2   3   4
0  B/G/213  B/C/208  WW_cis NaN NaN
1  B/U/215  B/A/206  WW_cis NaN NaN
2  B/C/214  B/G/207  WW_cis NaN NaN
3  B/G/217  B/C/204  WW_cis NaN NaN
4  B/A/216  B/U/205  WW_cis NaN NaN
5  B/C/219  B/G/202  WW_cis NaN NaN
6  B/U/218  B/A/203  WW_cis NaN NaN
7  B/G/201  B/C/220  WW_cis NaN NaN
8  B/A/203  B/U/218  WW_cis NaN NaN
Then you can select just the columns you need:
df = df[[0,1,2]]
and work with the result like any normal DataFrame.
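For the comparison the question asks about, one possible approach (an assumption about what "compare" means here, not something stated in the thread) is to test membership of one column's values in the other with Series.isin, and to filter rows with a boolean mask. The sketch uses four rows copied from the sample data so that it runs on its own.

```python
import io
import pandas as pd

# Four rows taken from the sample file in the question
data = '''"B/G/213","B/C/208","WW_cis",,
"B/U/215","B/A/206","WW_cis",,
"B/U/218","B/A/203","WW_cis",,
"B/A/203","B/U/218","WW_cis",,
'''
df = pd.read_csv(io.StringIO(data), header=None)[[0, 1, 2]]

# True where the value in column 0 also appears somewhere in column 1
mask = df[0].isin(df[1])
print(df[mask])

# Rows whose column 0 equals one specific entry
hits = df[df[0] == "B/A/203"]
print(hits)
```

Here only the last two rows satisfy the mask, because B/U/218 and B/A/203 each occur in both columns.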
I think you have confused StringIO with a filename. Either you have the data as a string and wrap it in StringIO, or you just pass the filename (without StringIO):
In [189]: data="""\
.....: "B/G/213","B/C/208","WW_cis",,
.....: "B/U/215","B/A/206","WW_cis",,
.....: "B/C/214","B/G/207","WW_cis",,
.....: "B/G/217","B/C/204","WW_cis",,
.....: "B/A/216","B/U/205","WW_cis",,
.....: "B/C/219","B/G/202","WW_cis",,
.....: "B/U/218","B/A/203","WW_cis",,
.....: "B/G/201","B/C/220","WW_cis",,
.....: "B/A/203","B/U/218","WW_cis",,
.....: """
In [190]:
In [190]: df = pd.read_csv(io.StringIO(data), sep=',', header=None, usecols=[0,1,2])
In [191]: df
Out[191]:
         0        1       2
0  B/G/213  B/C/208  WW_cis
1  B/U/215  B/A/206  WW_cis
2  B/C/214  B/G/207  WW_cis
3  B/G/217  B/C/204  WW_cis
4  B/A/216  B/U/205  WW_cis
5  B/C/219  B/G/202  WW_cis
6  B/U/218  B/A/203  WW_cis
7  B/G/201  B/C/220  WW_cis
8  B/A/203  B/U/218  WW_cis
PS: you can choose which columns to parse (and therefore which ones end up in the DataFrame); see the usecols parameter.
Or, using the filename:
import os
df = pd.read_csv(os.path.join('RNA', self.filename), sep=',', header=None, usecols=[0,1,2])
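The filename variant can be tried end to end with a throwaway file. This is a minimal sketch: the temporary directory and the name example.csv are hypothetical stand-ins for the RNA directory and self.filename, and only two sample rows are written.

```python
import os
import tempfile
import pandas as pd

rows = '"B/G/213","B/C/208","WW_cis",,\n"B/U/215","B/A/206","WW_cis",,\n'

with tempfile.TemporaryDirectory() as directory:
    # Hypothetical file name, standing in for os.path.join('RNA', self.filename)
    path = os.path.join(directory, "example.csv")
    with open(path, "w") as handle:
        handle.write(rows)
    # Pass the path directly; usecols keeps only the first three columns
    df = pd.read_csv(path, sep=",", header=None, usecols=[0, 1, 2])

print(df.shape)       # (2, 3)
print(df.iloc[0, 0])  # B/G/213
```

The surrounding quotes are stripped by the parser and the two empty trailing columns are never loaded.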