[英]Comparing Data from two columns in CSV files
Appreciate any help on this one.感谢对此的任何帮助。 I have 7 CSV files (all the same format) that that I have concatenated into one frame.
我有 7 个 CSV 文件(格式相同),我已将它们连接到一帧中。 My goal here is to compare two columns from the CSV's and find out how many times the word "Done" from the "Ran" column show up on each Date from the "Date" column.
我的目标是比较 CSV 中的两列,并找出“Ran”列中的“Done”一词出现在“Date”列中的每个 Date 上的次数。 So far this is what I have written:
到目前为止,这是我写的:
path = r'C:\Users\rock\Desktop\workspace\MTS_subs'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
counter = frame['Ran'].value_counts()
date_counter = frame['Date'].value_counts()
print(counter, date_counter)
this prints out the following:这打印出以下内容:
Active 1739
Done 840
Name: Ran, dtype: int64 18/06/2020 402
19/06/2020 300
17/06/2020 266
25/06/2020 264
22/06/2020 224
16/06/2020 214
23/06/2020 208
24/06/2020 208
26/06/2020 184
15/06/2020 180
21/06/2020 76
14/06/2020 46
20/06/2020 4
13/06/2020 3
Name: Date, dtype: int64
So in all 7 CSVs, the word "Done" appears 840 times but I would like to find out how many times "Done" appears on each of those dates.因此,在所有 7 个 CSV 中,“完成”一词出现了 840 次,但我想知道每个日期出现了多少次“完成”。
I've been scratching my head at this one for sometime.一段时间以来,我一直在挠头。 Any help or input is very much appreciated.
非常感谢任何帮助或输入。
(frame['Ran'] == 'Done').groupby(frame['Date']).sum()
should do the trick. (frame['Ran'] == 'Done').groupby(frame['Date']).sum()
应该可以解决问题。 Below is an example that simulates the screenshot that was posted.下面是一个模拟发布的屏幕截图的示例。
>>> frame = pd.DataFrame({
... 'Date': ['13/06/2020']*3 + ['15/06/2020']*2 + ['14/06/2020']*12,
... 'Ran': ['Done']*17
... })
>>> frame
Date Ran
0 13/06/2020 Done
1 13/06/2020 Done
2 13/06/2020 Done
3 15/06/2020 Done
4 15/06/2020 Done
5 14/06/2020 Done
6 14/06/2020 Done
7 14/06/2020 Done
8 14/06/2020 Done
9 14/06/2020 Done
10 14/06/2020 Done
11 14/06/2020 Done
12 14/06/2020 Done
13 14/06/2020 Done
14 14/06/2020 Done
15 14/06/2020 Done
16 14/06/2020 Done
>>> (frame['Ran'] == 'Done').groupby(frame['Date']).sum()
Date
13/06/2020 3.0
14/06/2020 12.0
15/06/2020 2.0
Name: Ran, dtype: float64
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.