Python: Easier way to merge several columns of an array into one

I have a 1996 * 9 array:

array([[ 0.,  1.,  1., ...,  1.,  1.,  0.],
       [ 1.,  1.,  0., ...,  1.,  0.,  1.],
       [ 0.,  1.,  1., ...,  1.,  1.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  1.,  1., ...,  1.,  1.,  0.],
       [ 0.,  1.,  1., ...,  1.,  1.,  0.]])

I want a 1996 * 1 array.

What I did:

pd.DataFrame(train_L.astype(int)).apply(lambda x: ''.join(str(x)), axis = 1)

I get

0       0    0\n1    1\n2    1\n3    1\n4    1\n5    1...
1       0    1\n1    1\n2    0\n3    0\n4    0\n5    0...
2       0    0\n1    1\n2    1\n3    0\n4    1\n5    1...
3       0    0\n1    1\n2    1\n3    0\n4    1\n5    1...
4       0    1\n1    0\n2    0\n3    0\n4    0\n5    0...

The problem:

  1. It introduced an extra all-zero column.
  2. It introduced \n1 sequences into the strings.
  3. It converts the type too many times.

My question: Is there an easy way to do the merge without such caveats?


Example output

What I have:

v1 v2 v3 ... v9
1  0  0  ... 1

I want:

      v1
1\t0\t0\t...\t1

  1. The number of columns is reduced to 1.
  2. The elements are separated by \t.

Why I need such a weird form:

For image processing, we have one column for the labels of an image. However, one image may have multiple labels, so I have to squeeze multiple labels into one column. That is what the library requires.

You can apply a lambda after converting the dtype to str:

In [14]:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4,5))
df

Out[14]:
          0         1         2         3         4
0  1.036485 -1.243777  1.286254  1.973786 -0.083245
1  1.698828  1.696846  0.037732 -0.630546 -0.135069
2 -1.231337 -1.166480  0.046414 -0.965710  1.341809
3  0.591176  0.275267 -0.446553 -0.230353  0.258817

In [16]:
df.astype(str).apply(lambda x: ''.join(x), axis=1)

Out[16]:
0    1.03648484941-1.243776761241.286253591521.9737...
1    1.698827772721.696846119330.0377324485782-0.63...
2    -1.23133722226-1.166480155330.046414100678-0.9...
3    0.5911755605680.275266550205-0.446552705185-0....
dtype: object

It seems you want the values separated by tabs, in which case you can just join with a tab:

In [17]:
df.astype(str).apply(lambda x: '\t'.join(x), axis=1)

Out[17]:
0    1.03648484941\t-1.24377676124\t1.28625359152\t...
1    1.69882777272\t1.69684611933\t0.0377324485782\...
2    -1.23133722226\t-1.16648015533\t0.046414100678...
3    0.591175560568\t0.275266550205\t-0.44655270518...
dtype: object
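
For the 0/1 labels in the question, the same idea can be sketched as follows. This is only an illustration: `train_L` here is a small made-up stand-in for the asker's array, not the real data. It also shows why the original attempt went wrong: `str(x)` on a whole row returns the printed form of the Series (index labels, values, and newlines), so the "extra all-zero column" and the `\n` come from that printed form, whereas casting element-wise with `astype(str)` leaves only the values to join.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's train_L (float 0/1 labels).
train_L = np.array([[0., 1., 1.],
                    [1., 1., 0.]])

# Cast to int first so the labels read "0"/"1" rather than "0.0"/"1.0".
df = pd.DataFrame(train_L.astype(int))

# str() of a whole row is the Series' printed form, newlines included.
repr_of_row = str(df.iloc[0])

# Cast each element to str, then join every row with tabs.
merged = df.astype(str).apply(lambda row: '\t'.join(row), axis=1)

print(merged.tolist())  # ['0\t1\t1', '1\t1\t0']
```

The result is a Series with one string per row; `merged.to_frame('v1')` would turn it into a single-column DataFrame if that is what the library expects.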

This results in a string, which is probably not what you want. Perhaps you should explain why you would like your data in your requested format.

import numpy as np
import pandas as pd

a = np.array([[ 0.,  1.,  1., 1.,  1.,  0.],
              [ 1.,  1.,  0., 1.,  0.,  1.],
              [ 0.,  1.,  1., 1.,  1.,  0.],
              [ 0.,  0.,  0., 0.,  0.,  1.],
              [ 0.,  1.,  1., 1.,  1.,  0.],
              [ 0.,  1.,  1., 1.,  1.,  0.]])

# Join each row's values with tabs and store the strings in one column, v1.
v = pd.DataFrame(['\t'.join([str(val) for val in row]) for row in a], columns=['v1'])

# Print each merged string; the tabs are rendered as whitespace.
for line in v['v1']:
    print(line)

Output:

0.0     1.0     1.0     1.0     1.0     0.0
1.0     1.0     0.0     1.0     0.0     1.0
0.0     1.0     1.0     1.0     1.0     0.0
0.0     0.0     0.0     0.0     0.0     1.0
0.0     1.0     1.0     1.0     1.0     0.0
0.0     1.0     1.0     1.0     1.0     0.0

>>> v
                             v1
0  0.0\t1.0\t1.0\t1.0\t1.0\t0.0
1  1.0\t1.0\t0.0\t1.0\t0.0\t1.0
2  0.0\t1.0\t1.0\t1.0\t1.0\t0.0
3  0.0\t0.0\t0.0\t0.0\t0.0\t1.0
4  0.0\t1.0\t1.0\t1.0\t1.0\t0.0
5  0.0\t1.0\t1.0\t1.0\t1.0\t0.0
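
If the labels should read `0`/`1` rather than `0.0`/`1.0`, and a plain N * 1 NumPy array is enough, a pandas-free variant might look like the sketch below. It assumes the values are exact 0/1 floats (so the integer cast loses nothing), and the small array `a` is illustrative only.

```python
import numpy as np

# Illustrative 0/1 float array; the real one would be 1996 * 9.
a = np.array([[0., 1., 1.],
              [1., 1., 0.]])

# float -> int -> str, done once for the whole array.
as_text = a.astype(int).astype(str)

# Join each row with tabs, then reshape to a single column (N * 1).
labels = np.array(['\t'.join(row) for row in as_text]).reshape(-1, 1)

print(labels.shape)  # (2, 1)
```

Each entry of `labels` is still a string, which matches the requested format of tab-separated labels squeezed into one column.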

