
Merging 2 CSV files by column: error related to strings?

I am trying to merge 2 CSV files by column.
Both of my CSV filenames end with '_4.csv', and the merged CSV should look something like this:

    0-10       ,83.72,66.76,86.98  ,0-10       ,83.72,66.76,86.98
    11-20      ,15.01,31.12,12.04  ,11-20      ,15.01,31.12,12.04
    21-30      ,1.14,2.05,0.94     ,21-30      ,1.14,2.05,0.94
    31-40      ,0.13,0.07,0.03     ,31-40      ,0.13,0.07,0.03
    over 40    ,0.0,0.0,0.0        ,over 40    ,0.0,0.0,0.0
    UHF case   ,0.0,0.0,0.0        ,UHF case   ,0.0,0.0,0.0

My code:

    import os

    import numpy as np

    # combine 2 csv into 1 by columns
    files_in_dir = [f for f in os.listdir(os.getcwd()) if f.endswith('_4.csv')]
    temp_data = []
    for filename in files_in_dir:
        temp_data.append(np.loadtxt(filename, dtype='str'))
    temp_data = np.array(temp_data)
    np.savetxt('_mix.csv', temp_data.transpose(), fmt='%s', delimiter=',')

However, I get this error:

    temp_data.append(np.loadtxt(filenames,dtype='str'))
    for x in read_data(_loadtxt_chunksize):
    raise ValueError("Wrong number of columns at line %d"
    ValueError: Wrong number of columns at line 2

I'm not sure whether it's related to the first column containing strings rather than numbers.
Does anyone know how to fix it? Much appreciated.
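A plausible cause, assuming the real files look like the sample shown above: np.loadtxt splits on runs of whitespace unless delimiter is given, so padded labels such as "over 40    ,0.0,0.0,0.0" break different lines into different numbers of tokens. The strings themselves are not the problem. The minimal sketch below reproduces this with two sample rows held in io.StringIO instead of real files (the variable names are illustrative), then loads with delimiter="," and places the two arrays side by side with np.hstack. Transposing the stacked array would not do that: np.array(temp_data) is 3-D, and np.savetxt only accepts 1-D or 2-D arrays.

```python
import io

import numpy as np

# Two sample rows shaped like the question's files (illustrative data only).
sample = (
    "0-10       ,83.72,66.76,86.98\n"
    "over 40    ,0.0,0.0,0.0\n"
)

# 1) Default loadtxt splits on runs of whitespace, not commas:
#    "0-10       ,83.72,..." gives 2 tokens, "over 40    ,0.0,..." gives 3,
#    which triggers the "wrong number of columns" ValueError.
try:
    np.loadtxt(io.StringIO(sample), dtype=str)
    default_ok = True
except ValueError as exc:
    default_ok = False
    print("whitespace splitting fails:", exc)

# 2) With delimiter="," every row splits into the same 4 fields,
#    and the text labels load fine as strings.
a = np.loadtxt(io.StringIO(sample), dtype=str, delimiter=",")
b = np.loadtxt(io.StringIO(sample), dtype=str, delimiter=",")

# 3) "Merging by columns" means stacking horizontally, not transposing.
merged = np.hstack([a, b])  # shape (2, 8)

# Write to an in-memory buffer here; a filename such as '_mix.csv' works too.
out = io.StringIO()
np.savetxt(out, merged, fmt="%s", delimiter=",")
print(out.getvalue())
```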

I think you're looking for the pandas DataFrame.join method. Suppose we have two .csv files of the form:

    0-10       ,83.72,66.76,86.98
    11-20      ,15.01,31.12,12.04
    21-30      ,1.14,2.05,0.94
    31-40      ,0.13,0.07,0.03
    over 40    ,0.0,0.0,0.0
    UHF case   ,0.0,0.0,0.0

Assuming both files have the same structure, we'll simply read the same file, data.csv, twice:

    import pandas as pd

    # Assumes there are no headers
    df1 = pd.read_csv("data.csv", header=None)
    df2 = pd.read_csv("data.csv", header=None)

    # By default, DataFrame columns are numbered 0, 1, 2, 3.
    # Rename the second frame's columns so they do not clash with the first:
    # `df2` will now have columns named 4, 5, 6, 7.
    offset = len(df1.columns)
    df2 = df2.rename(columns={col: col + offset for col in df2.columns})

    print(df1.join(df2))

Example output:

                 0      1      2      3            4      5      6      7
    0  0-10         83.72  66.76  86.98  0-10         83.72  66.76  86.98
    1  11-20        15.01  31.12  12.04  11-20        15.01  31.12  12.04
    2  21-30         1.14   2.05   0.94  21-30         1.14   2.05   0.94
    3  31-40         0.13   0.07   0.03  31-40         0.13   0.07   0.03
    4  over 40       0.00   0.00   0.00  over 40       0.00   0.00   0.00
    5  UHF case      0.00   0.00   0.00  UHF case      0.00   0.00   0.00
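To merge the actual files ending in '_4.csv' rather than reading data.csv twice, a minimal sketch could swap join for pd.concat along the columns. The helper name merge_csvs_by_columns and its defaults are hypothetical; it assumes headerless files whose first column holds the text labels, and it writes the result to '_mix.csv' as the question's code does. With ignore_index=True the columns are renumbered 0..N-1, so no manual rename is needed.

```python
import os

import pandas as pd


def merge_csvs_by_columns(directory=".", suffix="_4.csv", out_name="_mix.csv"):
    """Place every headerless CSV in `directory` ending with `suffix` side by side.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Sort so the column order is deterministic (os.listdir order is arbitrary).
    names = sorted(f for f in os.listdir(directory) if f.endswith(suffix))
    frames = []
    for name in names:
        df = pd.read_csv(os.path.join(directory, name), header=None)
        # Strip the padding spaces around text labels such as "0-10       ".
        df[0] = df[0].str.strip()
        frames.append(df)

    # axis=1 places the frames side by side; ignore_index=True renumbers
    # the columns 0..N-1 so the names from different files do not clash.
    merged = pd.concat(frames, axis=1, ignore_index=True)
    merged.to_csv(os.path.join(directory, out_name), header=False, index=False)
    return merged
```

Calling merge_csvs_by_columns() in the folder that holds the files writes _mix.csv there and returns the merged DataFrame. The output name does not end in '_4.csv', so rerunning the helper will not pick it up as an input. If no file matches, pd.concat raises a ValueError because there is nothing to concatenate.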
