
Create a new dataset after selection in Python

I'm a total newbie with Python, and I'm trying to learn "in the field". So far I've managed to open a CSV file, pick only the rows that have certain values in specific columns, and print those rows.

What I'd love to do next is pick one of the found rows at random. My idea was to first create a new CSV file containing only the filtered rows, and then randomly select from it.

Any ideas on the simplest way to do that?

Here's the portion of the code so far:

import csv

with open("top2018.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        if (row[4] >= "0.8") and (row[6] <= "-4") and (row[12] >= "0.8"):
            print(row[2] + " -", row[1])

It will find 2 rows (I checked).

And then, for creating a new csv file:

import pandas as pd
            artist = [row[2]]
            name = [row[1]]
            dict = {'artist': artist, 'name': name}
            df = pd.DataFrame(dict)
            df.to_csv('test.csv')

But I don't know why, with this method, the new CSV file has only 1 entry, when I want it to contain all of the found rows.

Hope what I wrote makes sense! Thanks, guys!

You are mixing up columns and rows. Renaming the variable row to record may make it easier to see what is happening. Unfortunately, I have to guess what the data file looks like...

The dict variable (try not to use this name: dict is a built-in type, and you don't want to shadow it) holds two columns, "artist" and "name", each a one-element list built from the current row, with values like [1.2]. So dict (try printing it) could look like {"artist": [2.0], "name": [3.1]}, which is a table with a single row and two columns. Because the lists are rebuilt from one row every time, the CSV file never gets more than one entry.

artist     name
2.0        3.1
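
To get every matching row into the table, the lists have to be filled across the whole loop rather than rebuilt per row. Here is a minimal sketch of that idea. Since I can't see top2018.csv, it uses made-up in-memory data: the make_row helper, the song and artist names, and the numbers are all hypothetical. It also assumes the file has no header row and converts the values to float, because comparing strings like "0.8" orders them alphabetically, not numerically.

```python
import csv
import io
import random


def make_row(name, artist, c4, c6, c12):
    """Hypothetical helper: build a 13-column row like the ones indexed in the question."""
    row = [""] * 13
    row[1], row[2], row[4], row[6], row[12] = name, artist, c4, c6, c12
    return row


# Made-up stand-in for top2018.csv (assumed to have no header row).
sample = io.StringIO()
csv.writer(sample).writerows([
    make_row("Song A", "Artist A", "0.85", "-5.1", "0.9"),
    make_row("Song B", "Artist B", "0.40", "-3.0", "0.2"),
    make_row("Song C", "Artist C", "0.90", "-4.5", "0.8"),
])
sample.seek(0)

# Create the lists once, before the loop, and append to them for each match.
artists, names = [], []
for record in csv.reader(sample):
    # Convert to float so the comparisons are numeric rather than alphabetical.
    if (float(record[4]) >= 0.8
            and float(record[6]) <= -4
            and float(record[12]) >= 0.8):
        artists.append(record[2])
        names.append(record[1])

# One list per column, one element per matching row.
table = {"artist": artists, "name": names}
print(table)

# Pick one of the matching rows at random; no intermediate file is needed.
pick = random.randrange(len(names))
print(artists[pick], "-", names[pick])
```

Passing table to pd.DataFrame, as in the question, would then give one line per matching row.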

Try getting into pandas: use df = pd.read_csv() and the df[df.something > 0.3] style of notation to filter tables. The csv package is better suited to truly tricky data wrangling.
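
As a rough sketch of what that could look like for this question: the column names below (name, artists, danceability, loudness, valence) are guesses, since I don't know the real header of top2018.csv, and the data is made up. With the real file you would pass its path to pd.read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# Made-up stand-in for top2018.csv; the column names are assumptions.
csv_text = """name,artists,danceability,loudness,valence
Song A,Artist A,0.85,-5.1,0.90
Song B,Artist B,0.40,-3.0,0.20
Song C,Artist C,0.90,-4.5,0.80
"""

df = pd.read_csv(io.StringIO(csv_text))

# Combine boolean masks with &; each condition needs its own parentheses.
selected = df[(df.danceability >= 0.8) & (df.loudness <= -4) & (df.valence >= 0.8)]

# Keep only the columns of interest. Without a path, to_csv returns the CSV
# text; pass a filename such as 'test.csv' to write a file instead.
csv_out = selected[["artists", "name"]].to_csv(index=False)
print(csv_out)

# Draw one of the matching rows at random.
choice = selected.sample(n=1).iloc[0]
print(choice["artists"], "-", choice["name"])
```

The random pick works directly on the filtered DataFrame, so the new CSV file is only needed if you want to keep the selection around.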


 