
python count occurrences in csv with pandas

I'm new to Python and got a little confused while working on a small project.

I have 2 CSV files that look like this:

all_cars:

first_Car,second_car
Mazda, Skoda
Ferrari, Volkswagen
Volkswagen, Toyota
BMW, Ferrari
BMW, Mercedes

super_cars:

super_car_name
Ferrari
BMW
Mercedes

What I'm basically trying to do is count how many times each car from file 2 appears in file 1. If a car appears only in file 1 and not in file 2, I don't want it counted.

Based on my example files, the output I'm after is:

Ferrari : 2
BMW : 2
Mercedes : 1

I'd do it this way:

In [220]: d1.stack().value_counts().to_frame('car').loc[d2.super_car_name]
Out[220]:
          car
Ferrari     2
BMW         2
Mercedes    1

where d1 and d2 are your source DataFrames (which can easily be parsed from the CSV files using the pd.read_csv() method):

In [218]: d1
Out[218]:
    first_Car  second_car
0       Mazda       Skoda
1     Ferrari  Volkswagen
2  Volkswagen      Toyota
3         BMW     Ferrari
4         BMW    Mercedes

In [219]: d2
Out[219]:
  super_car_name
0        Ferrari
1            BMW
2       Mercedes
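
For completeness, here is a minimal sketch of loading the two files and running the one-liner above. The file contents are inlined through io.StringIO so the snippet runs on its own; with real files you would pass their paths to pd.read_csv() instead. skipinitialspace=True is an assumption based on the sample in the question, which has a space after each comma; without it the second column would be read as " Skoda", " Ferrari" and so on, and would not match the names in super_cars.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the snippet is self-contained.
# With real files, pass the file paths to pd.read_csv instead.
all_cars_csv = """first_Car,second_car
Mazda, Skoda
Ferrari, Volkswagen
Volkswagen, Toyota
BMW, Ferrari
BMW, Mercedes
"""
super_cars_csv = """super_car_name
Ferrari
BMW
Mercedes
"""

# skipinitialspace=True strips the blank after each comma (assumption: the
# real files use the same ", " separators as the sample).
d1 = pd.read_csv(io.StringIO(all_cars_csv), skipinitialspace=True)
d2 = pd.read_csv(io.StringIO(super_cars_csv), skipinitialspace=True)

# Flatten both columns into one Series and count each car name.
counts = d1.stack().value_counts()

# Keep only the super cars, as in the one-liner above.
result = counts.loc[d2["super_car_name"]]

# Variant: reindex reports 0 for a super car that never appears in d1,
# where .loc would raise a KeyError.
result_safe = counts.reindex(d2["super_car_name"], fill_value=0)

print(result.to_frame("car"))
```

The reindex variant is not part of the original answer; it is only worth using if some super cars may be missing from all_cars entirely.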

You can use isin to find the matches, then stack and value_counts to get everything in one table (here df1 and df2 are the DataFrames loaded from all_cars and super_cars):

df1[df1.isin(df2.super_car_name.values)].stack().value_counts()

Ferrari     2
BMW         2
Mercedes    1
dtype: int64
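
To see why the chain works, it may help to split it into steps. The following is a sketch, assuming df1 and df2 hold the same data as the question's files; they are built inline here instead of being read from CSV, and the intermediate names mask and matches are only illustrative.

```python
import pandas as pd

# Same data as the question's files, built inline for a runnable example.
df1 = pd.DataFrame({
    "first_Car": ["Mazda", "Ferrari", "Volkswagen", "BMW", "BMW"],
    "second_car": ["Skoda", "Volkswagen", "Toyota", "Ferrari", "Mercedes"],
})
df2 = pd.DataFrame({"super_car_name": ["Ferrari", "BMW", "Mercedes"]})

# Boolean mask with the same shape as df1: True where the cell is a super car.
mask = df1.isin(df2["super_car_name"].values)

# Indexing with the mask keeps matching cells and turns the rest into NaN.
matches = df1[mask]

# stack() flattens the columns into one Series; value_counts() ignores NaN
# by default, so only the super cars are tallied.
counts = matches.stack().value_counts()
print(counts)
```

Because the filtering happens before counting, a super car that never appears in df1 is simply absent from the result instead of causing an error.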
