Problem merging two DataFrames in pandas

Question

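I am trying to left-merge two DataFrames, but I have run into a problem: the columns that come from the right DataFrame contain only NaN.

This is what I did: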

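import sys
import fileinput
from pprint import pprint
from pandas import DataFrame, read_csv, merge

X = read_csv('fileA.txt',sep=',',header=0);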
print "-----FILE DATA-----"
print X;
X = X.astype(object); # convert every column to string type? does it do it?
print "-----INTERNALS-----"
pprint(vars(X));

Y = file_to_dataframe('fileB.txt',',',0); 
print "-----FILE DATA-----"
print Y;
print "-----INTERNALS-----"
pprint(vars(Y));

Z = merge(X,Y,how='left');
print Z;
sys.exit(); 

Y = file_to_dataframe('tmp.chr20.thresh.frq.count','\t',0);
print Y.dtypes;

def file_to_dataframe(filename,sep,header): # list of dict's
    i = 0; k = 0;
    cols = list();
    colNames = list();
    for line in fileinput.input([filename]):
        line = line.rstrip('\n');
        lst = line.split(sep);
        if i == header: #  row number to use as the column names
            for colName in lst:
                colNames.append(colName);
        elif i > header:
            j = 0;
            record = dict();
            for j in range(0,len(lst)): # iterate over all tokens in the current line
                if j >= len(colNames):
                    colNames.append('#Auto_Generated_Label_'+ str(k));
                    k += 1;
                record[colNames[j]] = lst[j];
            cols.append(record); # push the record onto stack
        i += 1;
    return DataFrame.from_records(cols);

This is the output:

-----FILE DATA-----
   Chrom      Gene  Position
0     20    DZANK1  18446022
1     20      TGM6   2380332
2     20  C20orf96    271226
-----INTERNALS-----
{'_data': BlockManager
Items: array([Chrom, Gene, Position], dtype=object)
Axis 1: array([0, 1, 2])
ObjectBlock: array([Chrom, Gene, Position], dtype=object), 3 x 3, dtype object,
 '_item_cache': {}}
-----FILE DATA-----
  Chrom  Position Random
0    20  18446022    ABC
1    20   2380332    XYZ
2    20    271226    PQR
-----INTERNALS-----
{'_data': BlockManager
Items: array([Chrom, Position, Random], dtype=object)
Axis 1: array([0, 1, 2])
ObjectBlock: array([Chrom, Position, Random], dtype=object), 3 x 3, dtype object,
 '_item_cache': {}}

  Chrom      Gene  Position Random
0    20  C20orf96    271226    NaN
1    20      TGM6   2380332    NaN
2    20    DZANK1  18446022    NaN

As you can see, the Random column contains NaN where the values from Y's Random column should be. Any ideas on how to debug this?

Answer 1

Works for me (v0.10.0b1, though I am fairly confident, without having checked, that it also works in 0.9.1):

In [7]: x
Out[7]: 
   Chrom      Gene  Position
0     20    DZANK1  18446022
1     20      TGM6   2380332
2     20  C20orf96    271226

In [8]: y
Out[8]: 
   Chrom  Position Random
0     20  18446022    ABC
1     20   2380332    XYZ
2     20    271226    PQR

In [9]: pd.merge(x, y, how='left')
Out[9]: 
   Chrom      Gene  Position Random
0     20    DZANK1  18446022    ABC
1     20      TGM6   2380332    XYZ
2     20  C20orf96    271226    PQR
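
For readers without the original files, the session above can be approximated as a script. This is only a sketch: the two frames are typed in by hand from the printed output, and it assumes the key columns (Chrom, Position) hold integers in both frames, which the output alone does not prove.

```python
import pandas as pd

# Hand-built stand-ins for x and y (assumption: keys are integers in both).
x = pd.DataFrame({
    "Chrom": [20, 20, 20],
    "Gene": ["DZANK1", "TGM6", "C20orf96"],
    "Position": [18446022, 2380332, 271226],
})
y = pd.DataFrame({
    "Chrom": [20, 20, 20],
    "Position": [18446022, 2380332, 271226],
    "Random": ["ABC", "XYZ", "PQR"],
})

# With no `on` argument, merge joins on the columns the frames share,
# here Chrom and Position.
merged = pd.merge(x, y, how="left")
print(merged)
```

When the key values have the same type on both sides, every row of x finds its partner in y and Random is filled in.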

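I'm surprised that all of the columns are object dtype. There must be some kind of parsing problem with your data. Examine the values in each column: not what they look like, but what they actually are (strings, integers, or something else).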
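
One way to check what the values actually are is sketched below. The frames are hypothetical stand-ins for X and Y, built on the assumption (consistent with the question's code, but not confirmed) that `read_csv(...).astype(object)` leaves integers as integers while `file_to_dataframe` stores every token as a string. The helper `value_types` is made up for this illustration and is not part of pandas.

```python
import pandas as pd

# x mimics read_csv(...).astype(object): object dtype, values still ints.
x = pd.DataFrame({
    "Chrom": [20, 20, 20],
    "Gene": ["DZANK1", "TGM6", "C20orf96"],
    "Position": [18446022, 2380332, 271226],
}).astype(object)

# y mimics file_to_dataframe: every value comes from str.split, so all str.
y = pd.DataFrame({
    "Chrom": ["20", "20", "20"],
    "Position": ["18446022", "2380332", "271226"],
    "Random": ["ABC", "XYZ", "PQR"],
}, dtype=object)


def value_types(df):
    """Return, per column, the set of Python type names actually stored."""
    return {col: {type(v).__name__ for v in df[col]} for col in df.columns}


print(value_types(x))  # key columns hold int
print(value_types(y))  # key columns hold str

# Convert the key columns to numbers on both sides before merging.
keys = ["Chrom", "Position"]
x_fixed, y_fixed = x.copy(), y.copy()
for key in keys:
    x_fixed[key] = pd.to_numeric(x_fixed[key])
    y_fixed[key] = pd.to_numeric(y_fixed[key])

merged = pd.merge(x_fixed, y_fixed, on=keys, how="left")
print(merged)
```

If the real data behaves like these stand-ins, both frames report object dtype, yet the integer 20 never equals the string '20', so no key matches and the left merge fills Random with NaN. Note that `astype(object)` changes only the column dtype, not the values stored in it; `astype(str)` would be needed to turn them into strings.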

Solution 1
1 2012-12-13 02:37:59