
Pulling data from a CSV file for specific columns that the user inputs (no pandas)

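I need help with code that asks the user which specific column they want from a large CSV file I have. After entering the column, they must also enter an integer. The integer determines how many results with the lowest number of occurrences in that column are shown. For example, if they type hospital_name and "5", it shows them the 5 different hospitals with the fewest occurrences (there are at least 50 different hospital names in that column). Here is a sample input and output: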

Enter the column you want: hospital_name
Enter the number of lowest results you want: 3

The output might look like this:

    400 births are tied to Gains Hospital
    347 births are tied to Petri Hospital
    200 births are tied to Brit Hospital

The whole CSV is a report on births, so you have to count how many times each item appears in the chosen column and report the lowest counts.
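A minimal sketch of the counting step, assuming the exact column header is already known: since each row is one birth, `collections.Counter` can tally one per row for each distinct value. The inline `SAMPLE` text and the helper name `count_column` are hypothetical; with the real file you would pass the object returned by `open("data.csv", newline="")` instead of the `io.StringIO` wrapper.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the real births CSV: one row per birth
SAMPLE = """HOSPITAL_NAME,BIRTH_YEAR
Gains,2015
Petri,2003
Gains,2016
Brit,2012
Petri,2005
Gains,2011
"""

def count_column(csv_file, column):
    """Count how many rows (births) carry each distinct value of `column`."""
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    return Counter(row[column] for row in reader)

counts = count_column(io.StringIO(SAMPLE), "HOSPITAL_NAME")
for name, births in counts.items():
    print(f"{births} births are tied to {name} Hospital")
```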

I have already read my CSV file using "with".

I can't get the loop to tie all of this together. I know the user input itself will be input() and int(input()), but that doesn't connect me back to the CSV file.
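The link between `input()` and the file is the header row: `csv.DictReader` exposes it as `reader.fieldnames`, so the typed name can be checked against it before any counting happens. A sketch under that idea; `resolve_column` and `parse_count` are hypothetical helper names, and the case-insensitive matching is a design choice so that `hospital_name` finds `HOSPITAL_NAME`.

```python
import csv
import io

def resolve_column(fieldnames, typed):
    """Map the user's typed column name onto an actual CSV header.

    Matching ignores case and surrounding spaces, so 'hospital_name'
    finds 'HOSPITAL_NAME'. Raises ValueError if nothing matches.
    """
    wanted = typed.strip().lower()
    for name in fieldnames or []:
        if name.strip().lower() == wanted:
            return name
    raise ValueError(f"No column named {typed!r}; choose from {fieldnames}")

def parse_count(typed):
    """Parse the 'how many lowest' answer as a positive integer."""
    n = int(typed.strip())  # int() raises ValueError for text like "five"
    if n <= 0:
        raise ValueError("The number of results must be at least 1")
    return n

# Hypothetical header row; in the real script pass the open data.csv file
# and the strings returned by input().
reader = csv.DictReader(io.StringIO("HOSPITAL_NAME, BIRTH_DAY, BIRTH_YEAR\n"),
                        skipinitialspace=True)
column = resolve_column(reader.fieldnames, "hospital_name")
how_many = parse_count("3")
print(column, how_many)
```

In the question's script this would be `column = resolve_column(reader.fieldnames, input("Which column: "))` inside the `with` block, before the counting loop, so a typo fails with a clear message instead of a `KeyError` halfway through the file.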

Code

import csv

column_name = input('Which column: ').strip().upper()
number_lowest = int(input('How many lowest: '))

# Count births per distinct value of the chosen column
with open("data.csv", "r", newline="") as f:
  reader = csv.DictReader(f, skipinitialspace=True, delimiter=",")
  if column_name not in (reader.fieldnames or []):
    raise SystemExit(f"Unknown column {column_name!r}; choose from {reader.fieldnames}")
  births_count = {}
  for d in reader:
    # Use the value in column_name as the key;
    # each row is a different birth, so add 1 per row
    key = d[column_name]
    births_count[key] = births_count.get(key, 0) + 1

# Find the number_lowest smallest counts
lowest_births = {}
for _ in range(number_lowest):
  # Each pass removes the current minimum, so after
  # number_lowest passes we have that many lowest values
  if not births_count:
    break  # fewer distinct values than requested

  # Find the lowest count; on ties the first-seen key wins
  lowest_val = float("inf")  # larger than any real count
  lowest_name = ""
  for name, value in births_count.items():
    if value < lowest_val:
      lowest_val = value
      lowest_name = name

  # Move it from births_count into lowest_births
  lowest_births[lowest_name] = lowest_val
  del births_count[lowest_name]

# Output results
for name, births in lowest_births.items():
  print(f"{births} births are tied to {name} {column_name.title()}")
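The final `print` appends `column_name.title()`, which renders `HOSPITAL_NAME` as `Hospital_Name` (as in the sample run) rather than the `Hospital` suffix from the question's example, and it prints "1 births". A small formatting sketch; `label_for` and `describe` are hypothetical helpers, and it assumes name columns follow a `THING_NAME` pattern.

```python
def label_for(column):
    """Turn a header such as 'HOSPITAL_NAME' into 'Hospital' for display.

    Assumption: name columns follow a THING_NAME pattern; any other header
    is title-cased with underscores replaced by spaces.
    """
    base = column.strip().upper()
    if base.endswith("_NAME"):
        base = base[: -len("_NAME")]
    return base.replace("_", " ").title()

def describe(births, name, column):
    """Build one output line, using the singular form for a count of 1."""
    noun = "birth" if births == 1 else "births"
    verb = "is" if births == 1 else "are"
    return f"{births} {noun} {verb} tied to {name} {label_for(column)}"

print(describe(1, "Petri", "HOSPITAL_NAME"))
print(describe(400, "Gains", "HOSPITAL_NAME"))
```

In the script above, the output loop would then call `print(describe(births, name, column_name))`.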

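Test

Comma-separated CSV data with four columns: HOSPITAL_NAME, BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT (the spaces after some header commas are stripped by skipinitialspace=True)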

File: data.csv

HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT
Gains,1/14,2015,8.5 lbs
Mayo Clinic,2/11,2018,6.5 lbs
Gains,1/15,2016,8.9 lbs
Stanford Health Care,2/15,2016,7.4 lbs
Mayo Clinic,11/10,2018,7.3 lbs
Gains,1/09,2011,7.5 lbs
John Hopkins,12/23,2012,6.9 lbs
Massachusetts General,9/14,2001,8.3 lbs
Stanford Health Care,8/17,2005,7.6 lbs
Massachusetts General,7/18,2016,8.7 lbs
John Hopkins,3/11,2017,7.2 lbs
Massachusetts General,4/16,2014,7.4 lbs
Northwestern Memorial,10/12,2012,8.3 lbs
UCLA Medical Center,9/19,2011,8.1 lbs
Petri,11/21,2003,7.5 lbs
UCSF Medical Center,2/15,2004,7.9 lbs

Sample run:

Which column: hospital_name
How many lowest: 5
HOSPITAL_NAME
1 births are tied to Northwestern Memorial Hospital_Name
1 births are tied to UCLA Medical Center Hospital_Name
1 births are tied to Petri Hospital_Name
1 births are tied to UCSF Medical Center Hospital_Name
2 births are tied to Mayo Clinic Hospital_Name
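The nested loop repeatedly takes the first-seen minimum and deletes it, which amounts to a stable sort by count. The standard library's `heapq.nsmallest` is documented as equivalent to `sorted(iterable, key=key)[:n]`, so it keeps the same tie order. A sketch on the test data above, embedded so it runs standalone; `lowest_counts` is a hypothetical helper name.

```python
import csv
import heapq
import io
from collections import Counter
from operator import itemgetter

# The test data from this answer
DATA = """HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT
Gains,1/14,2015,8.5 lbs
Mayo Clinic,2/11,2018,6.5 lbs
Gains,1/15,2016,8.9 lbs
Stanford Health Care,2/15,2016,7.4 lbs
Mayo Clinic,11/10,2018,7.3 lbs
Gains,1/09,2011,7.5 lbs
John Hopkins,12/23,2012,6.9 lbs
Massachusetts General,9/14,2001,8.3 lbs
Stanford Health Care,8/17,2005,7.6 lbs
Massachusetts General,7/18,2016,8.7 lbs
John Hopkins,3/11,2017,7.2 lbs
Massachusetts General,4/16,2014,7.4 lbs
Northwestern Memorial,10/12,2012,8.3 lbs
UCLA Medical Center,9/19,2011,8.1 lbs
Petri,11/21,2003,7.5 lbs
UCSF Medical Center,2/15,2004,7.9 lbs
"""

def lowest_counts(csv_file, column, n):
    """Return up to n (value, count) pairs with the fewest rows.

    Counter keeps first-seen order and nsmallest is stable, so ties come
    out in the order they first appear in the file.
    """
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    counts = Counter(row[column] for row in reader)
    return heapq.nsmallest(n, counts.items(), key=itemgetter(1))

result = lowest_counts(io.StringIO(DATA), "HOSPITAL_NAME", 5)
for name, births in result:
    print(f"{births} births are tied to {name}")
```

This lists the same five hospitals, in the same order, as the sample run, and asking for more results than there are distinct values simply returns them all.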

Update: Find Max using insertion sort

import csv

# Source: https://www.geeksforgeeks.org/python-program-for-insertion-sort/
def insertionSort(arr):
  """In-place insertion sort (ascending)."""
  # Traverse from index 1 to len(arr) - 1
  for i in range(1, len(arr)):
    key = arr[i]
    # Move elements of arr[0..i-1] that are greater
    # than key one position ahead of their current position
    j = i - 1
    while j >= 0 and key < arr[j]:
      arr[j + 1] = arr[j]
      j -= 1
    arr[j + 1] = key

def find_maxs_by_sort(data, number):
  """Return the `number` entries of `data` with the largest values,
  as a dict ordered from largest to smallest.
  """
  # Get list of key, value pairs as tuples of (value, key)
  tuple_list = [(v, k) for k, v in data.items()]

  # Sort in place, ascending. Tuples compare by their first
  # element (v), then by the key k when values tie
  insertionSort(tuple_list)

  # Walk backwards from the end, since the sort is ascending.
  # Clamp number so asking for more than exist does not wrap
  # around to negative indices
  n = len(tuple_list)
  number = min(number, n)
  results = {}
  for i in range(n - 1, n - number - 1, -1):
    v, k = tuple_list[i]
    results[k] = v

  return results

for _ in range(3):
  # Ask three times
  column_name = input('Which column: ').strip().upper()
  number = int(input('How many maxes: '))

  with open("data.csv", "r", newline="") as f:
    reader = csv.DictReader(f, skipinitialspace=True, delimiter=",")
    births_count = {}
    for d in reader:
      # Use the value in column_name as the key;
      # each row is a different birth, so add 1 per row
      key = d[column_name]
      births_count[key] = births_count.get(key, 0) + 1

  # Find the largest counts
  max_births = find_maxs_by_sort(births_count, number)

  # Output results
  for name, births in max_births.items():
    print(f"\t{births} births are tied to {name} {column_name.title()}")
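For the maximum case the standard library already provides `Counter.most_common(n)`, which avoids hand-writing a sort. One difference worth knowing: `most_common` keeps tied entries in first-seen order, while `find_maxs_by_sort` sorts `(value, key)` tuples and reads them backwards, so ties come out in reverse alphabetical order. A sketch using the HOSPITAL_NAME column of the test data, typed in as a list for brevity:

```python
from collections import Counter

# HOSPITAL_NAME column of the test data, in file order
hospitals = [
    "Gains", "Mayo Clinic", "Gains", "Stanford Health Care", "Mayo Clinic",
    "Gains", "John Hopkins", "Massachusetts General", "Stanford Health Care",
    "Massachusetts General", "John Hopkins", "Massachusetts General",
    "Northwestern Memorial", "UCLA Medical Center", "Petri",
    "UCSF Medical Center",
]

counts = Counter(hospitals)

# most_common(n) returns the n largest counts, largest first;
# entries with equal counts stay in first-seen order
top = counts.most_common(3)
for name, births in top:
    print(f"\t{births} births are tied to {name}")
```

Here `most_common(3)` lists Gains, Massachusetts General, Mayo Clinic, whereas `find_maxs_by_sort(counts, 3)` would list Massachusetts General, Gains, Stanford Health Care for the same data.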


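Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM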