How to convert series to dataframe in Pandas

I have two CSVs that I need to compare based on one column. The matched rows should go into one CSV and the unmatched rows into another. So I created an index on that column in the second CSV and looped through the first.

import pandas as pd

# file1 and file2 are the paths to the two CSV files
df1 = pd.read_csv(file1, nrows=100)
df2 = pd.read_csv(file2, nrows=100)
df2.set_index('crc', inplace=True)
matched_list = []
non_matched_list = []
for _, row in df1.iterrows():
    try:
        x = df2.loc[row['crc']]
        matched_list.append(x)
    except KeyError:
        non_matched_list.append(row)

The x here is a Series in the following format:

policyID                   448094
statecode                      FL
county                CLAY COUNTY
eq_site_limit           1322376.3
hu_site_limit           1322376.3
fl_site_limit           1322376.3
fr_site_limit           1322376.3
tiv_2011                1322376.3
tiv_2012               1438163.57
eq_site_deductible              0
hu_site_deductible            0.0
fl_site_deductible              0
fr_site_deductible              0
point_latitude          30.063936
point_longitude        -81.707664
line                  Residential
construction              Masonry
point_granularity               3
Name: 448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,0,0.0, dtype: object

My output CSV should be in the following format:

policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1

I need this for every Series in both the matched and the unmatched list. How do I do it? I cannot get rid of the index on the second CSV because performance is important.

The contents of the two CSV files follow. File1:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,589658,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,745689.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,12563.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0,515035.62,0,0,515035.62,884419.17,0,0,0,0,30.063236,-81.707703,Residential,Masonry,3
995932,FL,CLAY COUNTY,0,19260000,0,0,19260000,20610000,0,0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500,328500,328500,328500,328500,348374.25,0,16425,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000,315000,315000,315000,315000,265821.57,0,15750,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600,705600,705600,705600,705600,1010842.56,14112,35280,0,0,30.100628,-81.703751,Residential,Masonry,1

File2:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,190724.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,79520.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0,51564.9,0,0,515035.62,884419.17,0,0,0,0,30.063236,-81.707703,Residential,Masonry,3
995932,FL,CLAY COUNTY,0,457962,0,0,19260000,20610000,0,0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500,328500,328500,328500,328500,348374.25,0,16425,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000,315000,315000,315000,315000,265821.57,0,15750,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600,705600,705600,705600,705600,1010842.56,14112,35280,0,0,30.100628,-81.703751,Residential,Masonry,1
253816,FL,CLAY COUNTY,831498.3,831498.3,831498.3,831498.3,831498.3,1117791.48,0,0,0,0,30.10216,-81.719444,Residential,Masonry,1
894922,FL,CLAY COUNTY,0,24059.09,0,0,24059.09,33952.19,0,0,0,0,30.095957,-81.695099,Residential,Wood,1

Edit: Added sample CSV files.

I think you can do it this way:

df1.loc[df1.crc.isin(df2.index)].to_csv('/path/to/matched.csv', index=False)
df1.loc[~df1.crc.isin(df2.index)].to_csv('/path/to/unmatched.csv', index=False)

instead of looping...
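The two lines above rely on a boolean mask built with `isin`. As a minimal, self-contained sketch of that idea, the snippet below uses two tiny in-memory frames with made-up values (they are not the asker's data) so it can be run without any files:

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (values invented for illustration).
df1 = pd.DataFrame({"policyID": [1, 2, 3], "crc": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"policyID": [7, 8], "crc": [20.0, 30.0]}).set_index("crc")

# True for every df1 row whose crc value occurs in df2's index.
mask = df1["crc"].isin(df2.index)

matched = df1.loc[mask]      # rows of df1 whose crc exists in df2
unmatched = df1.loc[~mask]   # rows of df1 whose crc does not exist in df2

# to_csv() without a path returns the CSV text instead of writing a file.
print(matched.to_csv(index=False))
print(unmatched.to_csv(index=False))
```

Note that both outputs contain rows taken from `df1`; `df2` is only consulted for its index values.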

Demo:

In [62]: df1.loc[df1.crc.isin(df2.index)].to_csv(r'c:/temp/matched.csv', index=False)

In [63]: df1.loc[~df1.crc.isin(df2.index)].to_csv(r'c:/temp/unmatched.csv', index=False)

Results:

matched.csv:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0.0,0,0,30.063935999999998,-81.70766400000001,Residential,Masonry,3
333743,FL,CLAY COUNTY,0.0,12563.76,0.0,0.0,79520.76,86854.48,0,0.0,0,0,30.063236,-81.70770300000001,Residential,Wood,3
172534,FL,CLAY COUNTY,0.0,254281.5,0.0,254281.5,254281.5,246144.49,0,0.0,0,0,30.060614,-81.702675,Residential,Wood,1
785275,FL,CLAY COUNTY,0.0,515035.62,0.0,0.0,515035.62,884419.17,0,0.0,0,0,30.063236,-81.70770300000001,Residential,Masonry,3
995932,FL,CLAY COUNTY,0.0,19260000.0,0.0,0.0,19260000.0,20610000.0,0,0.0,0,0,30.102226,-81.713882,Commercial,Reinforced Concrete,1
223488,FL,CLAY COUNTY,328500.0,328500.0,328500.0,328500.0,328500.0,348374.25,0,16425.0,0,0,30.102217,-81.707146,Residential,Wood,1
433512,FL,CLAY COUNTY,315000.0,315000.0,315000.0,315000.0,315000.0,265821.57,0,15750.0,0,0,30.118774,-81.704613,Residential,Wood,1
142071,FL,CLAY COUNTY,705600.0,705600.0,705600.0,705600.0,705600.0,1010842.56,14112,35280.0,0,0,30.100628000000004,-81.703751,Residential,Masonry,1

unmatched.csv:

policyID,statecode,county,crc,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
114455,FL,CLAY COUNTY,589658.0,498960.0,498960.0,498960.0,498960.0,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
206893,FL,CLAY COUNTY,745689.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0.0,0,0,30.089578999999997,-81.700455,Residential,Wood,1
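As for the question in the title: if the loop is kept, a list of Series can be passed straight to `pd.DataFrame`, which turns each Series into one row. The sketch below is only an illustration under some assumptions: the frames are small made-up stand-ins, and every `crc` value in `df2` is assumed to be unique. With duplicate index values (the sample File2 has several rows with a `crc` of 0), `df2.loc[...]` returns a DataFrame rather than a Series, and this approach would need adjusting.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; crc is unique in df2 by assumption.
df1 = pd.DataFrame({"policyID": [1, 2, 3], "crc": [10, 20, 30]})
df2 = pd.DataFrame(
    {"policyID": [7, 8], "crc": [20, 30], "line": ["Residential", "Commercial"]}
).set_index("crc")

matched_list = []
non_matched_list = []
for _, row in df1.iterrows():
    try:
        matched_list.append(df2.loc[row["crc"]])  # a Series named after the crc value
    except KeyError:
        non_matched_list.append(row)

# Each Series becomes one row; the Series names become the row index.
matched_df = pd.DataFrame(matched_list)
non_matched_df = pd.DataFrame(non_matched_list)

# The crc values ended up in the index, so name it and move it back into a column.
matched_out = matched_df.rename_axis("crc").reset_index()

print(matched_out.to_csv(index=False))
print(non_matched_df.to_csv(index=False))
```

Here the matched rows come from `df2`, as in the original loop, whereas the `isin` approach writes the matching rows of `df1`. Which of the two is wanted depends on the desired output.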
