
Two CSV files, match a pair from a row with matching values in 2nd CSV file, in a single column consisting of the same type of values

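I have two CSV files. Each row of the first file holds a pair (a set of two) of int values of the same type. For every pair (looping through n rows), I want to find the rows in the second CSV file where a single column, which holds values of that same type and may repeat them, matches either value of the pair.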

So far I have written the code below, but it is very time-consuming. Is there perhaps a Pythonic shortcut for this problem?

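import csv
from csv import writer

import pandas as pd

# Counters for matches on the first and second value of a pair
c1=0
c2=0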

def append_list_as_row(file_name, list_of_elem):
    # Open file in append mode
    with open(file_name, 'a+', newline='') as write_obj:
        # Create a writer object from csv module
        csv_writer = writer(write_obj)
        # Add contents of list as last row in the csv file
        csv_writer.writerow(list_of_elem)

pairs = pd.read_csv('pairs.csv',delimiter=';')
df = pd.read_csv('02_Data_test.csv',delimiter=',')

# Create (or truncate) the output file and write the header row once
with open('foo.csv', 'w', newline='') as outcsv:
    writer1 = csv.DictWriter(outcsv, fieldnames = ["##","lac","cid","msisdn","imei","event_type","tstamp","long","lat","max_dist","cell_type","start_angle","end_angle","msisdn1"])
    writer1.writeheader()

for i in range(0,122,1): #range(len(pairs)): 
    for j in range(0,174123,1): #range(len(df)):

        if pairs.iloc[i,0]==df.iloc[j,3]:
            c1+=1
            print(i)
            append_list_as_row('foo.csv', df.iloc[j,:])        
        if pairs.iloc[i,1]==df.iloc[j,3]:
            c2+=1
            print(i)
            print(j)
            print("")
            append_list_as_row('foo.csv', df.iloc[j,:])        

        #if pairs.iloc[i,1]==df.iloc[j,3]:
         #   c2+=1
          #  print(i)
           # print(j)
            #append_list_as_row('foo.csv', df.iloc[j,:])  

    print("------------------------")
    append_list_as_row('foo.csv', "")    

You can use the pandas isin() method (Series.isin, called with a list of values) to extract from CSV2 the rows whose msisdn belongs to one of your pairs.
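As a minimal sketch of what isin() does (the tiny frame and values below are made up for illustration and are not taken from the question's files): it returns a boolean mask, and indexing the DataFrame with that mask keeps only the matching rows.

```python
import pandas as pd

# Hypothetical miniature of the second CSV file
data = pd.DataFrame({
    "test": ["test1", "test2", "test3"],
    "msisdn": ["msisdn1", "msisdn2", "msisdn3"],
})

pair = ["msisdn1", "msisdn3"]        # stands in for one row of the pairs file

mask = data["msisdn"].isin(pair)     # boolean Series, True where msisdn is in the pair
matched = data[mask]                 # keeps only the rows where the mask is True
```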

Sample input

pairs
msisdn1   msisdn2
msisdn1  msisdn11
msisdn2  msisdn12
msisdn3  msisdn13
msisdn4  msisdn14
msisdn5  msisdn15

data
test    moretest    no_test    msisdn
test1   moretest1   no_test1   msisdn1
test2   moretest2   no_test2   msisdn2
test3   moretest3   no_test3   msisdn3
test4   moretest4   no_test4   msisdn4
test5   moretest5   no_test5   msisdn5
test6   moretest6   no_test6   msisdn6
test7   moretest7   no_test7   msisdn7
test8   moretest8   no_test8   msisdn8
test9   moretest9   no_test9   msisdn9
test10  moretest10  no_test10  msisdn10
test11  moretest11  no_test11  msisdn11
test12  moretest12  no_test12  msisdn12
test13  moretest13  no_test13  msisdn13
test14  moretest14  no_test14  msisdn14
test15  moretest15  no_test15  msisdn15
test16  moretest16  no_test16  msisdn16
test17  moretest17  no_test17  msisdn17
test18  moretest18  no_test18  msisdn18
test19  moretest19  no_test19  msisdn19
test20  moretest20  no_test20  msisdn20

Code:

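import pandas as pd

# Pass delimiter=';' here if pairs.csv is semicolon-separated, as in the question
csv1 = pd.read_csv('pairs.csv')
csv2 = pd.read_csv('02_Data_test.csv')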
# res is a list that will hold all the extracted rows
# and we will finally append all results into a DataFrame
res = []
for pairs in csv1.values.tolist():
    res.append(csv2[csv2['msisdn'].isin(pairs)])

df = pd.concat(res)
df.to_csv('result.csv', index=False)

Sample output

      test    moretest    no_test    msisdn
0    test1   moretest1   no_test1   msisdn1
10  test11  moretest11  no_test11  msisdn11
1    test2   moretest2   no_test2   msisdn2
11  test12  moretest12  no_test12  msisdn12
2    test3   moretest3   no_test3   msisdn3
12  test13  moretest13  no_test13  msisdn13
3    test4   moretest4   no_test4   msisdn4
13  test14  moretest14  no_test14  msisdn14
4    test5   moretest5   no_test5   msisdn5
14  test15  moretest15  no_test15  msisdn15
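
If the per-pair loop is still too slow on a large file, one possible loop-free variant is to flatten the pairs into a long table and join it against the data once. This is only a sketch under assumptions: match_pairs and the pair_id column are names introduced here for illustration, and the small frames stand in for the real CSV files.

```python
import numpy as np
import pandas as pd

def match_pairs(pairs: pd.DataFrame, data: pd.DataFrame, key: str = "msisdn") -> pd.DataFrame:
    """Return the rows of `data` whose `key` value occurs in any pair (hypothetical helper)."""
    # Flatten the pairs row by row: pair 0 value 1, pair 0 value 2, pair 1 value 1, ...
    long_pairs = pd.DataFrame({
        "pair_id": np.repeat(np.arange(len(pairs)), pairs.shape[1]),
        key: pairs.to_numpy().ravel(),
    })
    # A single inner join replaces the nested loops
    merged = long_pairs.merge(data, on=key, how="inner")
    # Stable sort keeps the rows of each pair together, in pair order
    return merged.sort_values("pair_id", kind="stable").reset_index(drop=True)

# Illustrative data shaped like the sample input above
pairs = pd.DataFrame({"msisdn1": ["msisdn1", "msisdn2"],
                      "msisdn2": ["msisdn11", "msisdn12"]})
data = pd.DataFrame({"test": [f"test{i}" for i in range(1, 21)],
                     "msisdn": [f"msisdn{i}" for i in range(1, 21)]})

result = match_pairs(pairs, data)
```

The pair_id column records which pair each row came from, so the groups can be separated later with groupby. Unlike the isin() loop, a pair that lists the same value twice would produce its matching rows twice.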

Hope this helps.
