Issue with merging two dataframes in Pandas
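I am trying to do a left merge of two dataframes, but I am running into an issue: the columns that come from the right dataframe contain only NaN.
This is what I did: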
X = read_csv('fileA.txt',sep=',',header=0);
print "-----FILE DATA-----"
print X;
X = X.astype(object); # convert every column to string type? does it do it?
print "-----INTERNALS-----"
pprint(vars(X));
Y = file_to_dataframe('fileB.txt',',',0);
print "-----FILE DATA-----"
print Y;
print "-----INTERNALS-----"
pprint(vars(Y));
Z = merge(X,Y,how='left');
print Z;
sys.exit();
Y = file_to_dataframe('tmp.chr20.thresh.frq.count','\t',0);
print Y.dtypes;
def file_to_dataframe(filename,sep,header): # list of dict's
    i = 0; k = 0;
    cols = list();
    colNames = list();
    for line in fileinput.input([filename]):
        line = line.rstrip('\n');
        lst = line.split(sep);
        if i == header: # row number to use as the column names
            for colName in lst:
                colNames.append(colName);
        elif i > header:
            j = 0;
            record = dict();
            for j in range(0,len(lst)): # iterate over all tokens in the current line
                if j >= len(colNames):
                    colNames.append('#Auto_Generated_Label_'+ str(k));
                    k += 1;
                record[colNames[j]] = lst[j];
            cols.append(record); # push the record onto stack
        i += 1;
    return DataFrame.from_records(cols);
Here is the output:
-----FILE DATA-----
Chrom Gene Position
0 20 DZANK1 18446022
1 20 TGM6 2380332
2 20 C20orf96 271226
-----INTERNALS-----
{'_data': BlockManager
Items: array([Chrom, Gene, Position], dtype=object)
Axis 1: array([0, 1, 2])
ObjectBlock: array([Chrom, Gene, Position], dtype=object), 3 x 3, dtype object,
'_item_cache': {}}
-----FILE DATA-----
Chrom Position Random
0 20 18446022 ABC
1 20 2380332 XYZ
2 20 271226 PQR
-----INTERNALS-----
{'_data': BlockManager
Items: array([Chrom, Position, Random], dtype=object)
Axis 1: array([0, 1, 2])
ObjectBlock: array([Chrom, Position, Random], dtype=object), 3 x 3, dtype object,
'_item_cache': {}}
Chrom Gene Position Random
0 20 C20orf96 271226 NaN
1 20 TGM6 2380332 NaN
2 20 DZANK1 18446022 NaN
As you can see, the Random column contains NaN where there should be values from Y. Any ideas on how to debug this?
Works for me (v0.10.0b1, though I am fairly confident, without having checked, that it also works in 0.9.1):
In [7]: x
Out[7]:
Chrom Gene Position
0 20 DZANK1 18446022
1 20 TGM6 2380332
2 20 C20orf96 271226
In [8]: y
Out[8]:
Chrom Position Random
0 20 18446022 ABC
1 20 2380332 XYZ
2 20 271226 PQR
In [9]: pd.merge(x, y, how='left')
Out[9]:
Chrom Gene Position Random
0 20 DZANK1 18446022 ABC
1 20 TGM6 2380332 XYZ
2 20 C20orf96 271226 PQR
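The session above can be reproduced with a minimal sketch like the following. It assumes both files are parsed with read_csv (rather than a hand-written parser) and uses in-memory io.StringIO buffers as stand-ins for fileA.txt and fileB.txt; the buffer names are illustrative only.

```python
import io

import pandas as pd

# In-memory stand-ins for fileA.txt and fileB.txt (assumed contents,
# taken from the tables shown in the question).
file_a = io.StringIO(
    "Chrom,Gene,Position\n"
    "20,DZANK1,18446022\n"
    "20,TGM6,2380332\n"
    "20,C20orf96,271226\n"
)
file_b = io.StringIO(
    "Chrom,Position,Random\n"
    "20,18446022,ABC\n"
    "20,2380332,XYZ\n"
    "20,271226,PQR\n"
)

# Parsing both files the same way gives the shared columns the same dtype.
x = pd.read_csv(file_a, sep=",", header=0)
y = pd.read_csv(file_b, sep=",", header=0)

# With no `on` argument, merge joins on the columns both frames share
# (here Chrom and Position).
z = pd.merge(x, y, how="left")
print(z)
```

Because both frames come from the same parser here, the key columns hold values of the same type and every row finds its match.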
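I'm surprised that all the columns are object dtype. There must be some kind of parsing problem with your data. Examine the values in each column: not what they look like when printed, but what they actually are (strings, ints, or something else).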
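One way to carry out that check is sketched below. It assumes the mismatch is that one frame holds integers in the key columns while the other holds strings (which is what a parser built on str.split would produce); the frames and the value_types helper are hypothetical, built only for illustration.

```python
import pandas as pd

# Hypothetical stand-ins: x has integer keys, y has string keys.
x = pd.DataFrame({
    "Chrom": [20, 20, 20],
    "Gene": ["DZANK1", "TGM6", "C20orf96"],
    "Position": [18446022, 2380332, 271226],
})
y = pd.DataFrame({
    "Chrom": ["20", "20", "20"],
    "Position": ["18446022", "2380332", "271226"],
    "Random": ["ABC", "XYZ", "PQR"],
})

def value_types(df):
    """Return the set of Python type names actually stored in each column."""
    return {col: sorted({type(v).__name__ for v in df[col]}) for col in df.columns}

print(value_types(x))
print(value_types(y))

# 20 and "20" print identically but never compare equal, so the keys cannot
# match. Cast the key columns of y to integers before merging.
y_fixed = y.astype({"Chrom": "int64", "Position": "int64"})
merged = pd.merge(x, y_fixed, how="left", on=["Chrom", "Position"])
print(merged)
```

Casting the keys to a common type is only one option; parsing both files with the same reader in the first place avoids the mismatch altogether.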