Comparing Data from two columns in CSV files

Appreciate any help on this one. I have 7 CSV files (all in the same format) that I have concatenated into one frame. My goal is to compare two columns from the CSVs and find out how many times the word "Done" in the "Ran" column shows up on each date in the "Date" column. So far this is what I have written:

import glob

import pandas as pd

path = r'C:\Users\rock\Desktop\workspace\MTS_subs'
all_files = glob.glob(path + "/*.csv")

# Read each CSV into its own DataFrame
li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

# Stack all DataFrames into a single frame
frame = pd.concat(li, axis=0, ignore_index=True)

# Overall counts for each column, independent of one another
counter = frame['Ran'].value_counts()
date_counter = frame['Date'].value_counts()

print(counter, date_counter)

This prints out the following:

Active    1739
Done       840
Name: Ran, dtype: int64 18/06/2020    402
19/06/2020    300
17/06/2020    266
25/06/2020    264
22/06/2020    224
16/06/2020    214
23/06/2020    208
24/06/2020    208
26/06/2020    184
15/06/2020    180
21/06/2020     76
14/06/2020     46
20/06/2020      4
13/06/2020      3
Name: Date, dtype: int64

So across all 7 CSVs, the word "Done" appears 840 times, but I would like to find out how many times "Done" appears on each of those dates.

I've been scratching my head over this one for some time. Any help or input is very much appreciated.

[Image: screenshot of the CSV]

(frame['Ran'] == 'Done').groupby(frame['Date']).sum() should do the trick. The comparison produces a boolean Series, and summing it within each date group counts the True values. Below is an example that simulates the screenshot that was posted.

>>> frame = pd.DataFrame({
...     'Date': ['13/06/2020']*3 + ['15/06/2020']*2 + ['14/06/2020']*12,
...     'Ran': ['Done']*17
... })
>>> frame
          Date   Ran
0   13/06/2020  Done
1   13/06/2020  Done
2   13/06/2020  Done
3   15/06/2020  Done
4   15/06/2020  Done
5   14/06/2020  Done
6   14/06/2020  Done
7   14/06/2020  Done
8   14/06/2020  Done
9   14/06/2020  Done
10  14/06/2020  Done
11  14/06/2020  Done
12  14/06/2020  Done
13  14/06/2020  Done
14  14/06/2020  Done
15  14/06/2020  Done
16  14/06/2020  Done
>>> (frame['Ran'] == 'Done').groupby(frame['Date']).sum()
Date
13/06/2020     3.0
14/06/2020    12.0
15/06/2020     2.0
Name: Ran, dtype: float64
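
Since the real data also contains "Active" rows, it may help to see the same idea on a mixed sample. The sketch below is only an illustration: the sample values are made up, and the names done_per_date and status_by_date are my own. It also shows pd.crosstab as one possible alternative that counts every status per date at once.

```python
import pandas as pd

# Hypothetical sample standing in for the concatenated frame
# (the values are invented for illustration).
frame = pd.DataFrame({
    "Date": ["13/06/2020"] * 3 + ["14/06/2020"] * 4 + ["15/06/2020"] * 2,
    "Ran": ["Done", "Active", "Done",
            "Done", "Done", "Active", "Done",
            "Active", "Active"],
})

# Boolean mask is True where the row is "Done"; summing per date counts them.
# astype(int) normalises the result, since the dtype of the sum can differ
# between pandas versions.
done_per_date = (frame["Ran"] == "Done").groupby(frame["Date"]).sum().astype(int)

# Cross-tabulation: one row per date, one column per distinct "Ran" value.
status_by_date = pd.crosstab(frame["Date"], frame["Ran"])

print(done_per_date)
print(status_by_date)
```

Note that dates with no "Done" rows still appear with a count of 0 in both results. If the dates span more than one month, parsing them first with pd.to_datetime(frame["Date"], format="%d/%m/%Y") and grouping by the parsed values should keep the output in chronological order rather than string order.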
