
How can I extract data from a nested list into a CSV or table?

I'm currently making a Pokémon database application and, in order to prevent manually entering about 50,000 Pokémon <> Move links, I am looking to automate this process. I've found a freely available dataset online in which the Pokémon <> Move links exist, but in a nested list format.

I've copied and pasted part of the dataset here: http://pastebin.com/ADeRaBiu

In the end, I would like to have a table (ideally stored in CSV/Excel-readable format) that looks like this:

| pokemonname | move    | movelearnmethod |
|-------------|---------|-----------------|
| bulbasaur   | amnesia | 6E              |
| bulbasaur   | attract | 6M              |
| bulbasaur   | bind    | 6T              |
| bulbasaur   | endure  | 6E              |
| bulbasaur   | endure  | 6T              |

I have tried using Python's split() method to split the data by delimiter, but there are several different delimiters and I don't know how to work around this. Any help would be greatly appreciated! Thanks!

Update:

Just to clarify, I want to make sure that if a Pokémon has multiple movelearnmethods for one move, such as bulbasaur's endure - which has movelearnmethods of both "6E" and "6T" - a separate row is created for each movelearnmethod, as in the table above.

The example data closely resembles a Python dictionary, but the keys are not quoted. You can fix that with some regex and then reference it as a Python dictionary, where parsing is pretty simple.

import re
import ast
data = """{bulbasaur:{learnset:{amnesia:["6E"],attract:["6M"],bind:["6T"],block:[],bodyslam:[],bulletseed:[],captivate:[],charm:["6E"],confide:["6M"],curse:["6E"],cut:["6M"],defensecurl:[],doubleedge:["6L027"],doubleteam:["6M"],echoedvoice:["6M"],endure:["6E","6T"],energyball:["6M"],facade:["6M"],falseswipe:[],flash:["6M"],frenzyplant:[],frustration:["6M"],furycutter:[],gigadrain:["6E","6T"],grassknot:["6M"],grasspledge:["6T"],grasswhistle:["6E"],grassyterrain:["6E"],growl:["6L003"],growth:["6L025"],headbutt:[],hiddenpower:["6M"],ingrain:["6E"],knockoff:["6T"],leafstorm:["6E"],leechseed:["6L007"],lightscreen:["6M"],magicalleaf:["6E"],mimic:[],mudslap:[],naturalgift:[],naturepower:["6E","6M"],petaldance:["6E"],poisonpowder:["6L013"],powerwhip:["6E"],protect:["6M"],razorleaf:["6L019"],rest:["6M"],"return":["6M"],rocksmash:["6M"],round:["6M"],safeguard:["6M"],secretpower:["6M"],seedbomb:["6L037","6T"],skullbash:["6E"],sleeppowder:["6L013"],sleeptalk:["6M"],sludge:["6E"],sludgebomb:["6M"],snore:["6T"],solarbeam:["6M"],strength:["6M"],stringshot:[],substitute:["6M"],sunnyday:["6M"],swagger:["6M"],sweetscent:["6L021"],swordsdance:["6M"],synthesis:["6L033","6T"],tackle:["6L001a"],takedown:["6L015"],toxic:["6M"],venoshock:["6M"],vinewhip:["6L009"],weatherball:[],worryseed:["6L031","6T"]}}}"""
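# Quote the bare keys so the text becomes a valid Python dict literal
# (keys that are already quoted, such as "return", are left untouched).
dict_data = re.sub(r'(\w+):', r'"\1":', data)
move_data = ast.literal_eval(dict_data)
for pokemonname, entry in move_data.items():
    for move, methods in entry['learnset'].items():
        # One line per learn method; moves with an empty list print nothing.
        for method in methods:
            print('pokemonname: {0}, move: {1}, movelearnmethod: {2}'.format(pokemonname, move, method))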

Sample output (the row order follows dictionary ordering, so it may differ between Python versions):


pokemonname: bulbasaur, move: sludgebomb, movelearnmethod: 6M
pokemonname: bulbasaur, move: venoshock, movelearnmethod: 6M
pokemonname: bulbasaur, move: doubleteam, movelearnmethod: 6M
pokemonname: bulbasaur, move: confide, movelearnmethod: 6M
pokemonname: bulbasaur, move: rest, movelearnmethod: 6M
pokemonname: bulbasaur, move: sludge, movelearnmethod: 6E
pokemonname: bulbasaur, move: growth, movelearnmethod: 6L025
pokemonname: bulbasaur, move: grassknot, movelearnmethod: 6M
pokemonname: bulbasaur, move: facade, movelearnmethod: 6M
pokemonname: bulbasaur, move: return, movelearnmethod: 6M
pokemonname: bulbasaur, move: attract, movelearnmethod: 6M
pokemonname: bulbasaur, move: echoedvoice, movelearnmethod: 6M
pokemonname: bulbasaur, move: substitute, movelearnmethod: 6M
pokemonname: bulbasaur, move: growl, movelearnmethod: 6L003
pokemonname: bulbasaur, move: curse, movelearnmethod: 6E
pokemonname: bulbasaur, move: powerwhip, movelearnmethod: 6E
pokemonname: bulbasaur, move: ingrain, movelearnmethod: 6E
pokemonname: bulbasaur, move: gigadrain, movelearnmethod: 6E
pokemonname: bulbasaur, move: gigadrain, movelearnmethod: 6T
pokemonname: bulbasaur, move: worryseed, movelearnmethod: 6L031
pokemonname: bulbasaur, move: worryseed, movelearnmethod: 6T
pokemonname: bulbasaur, move: flash, movelearnmethod: 6M
pokemonname: bulbasaur, move: takedown, movelearnmethod: 6L015
...

Once you have this data, I'd suggest taking a look at Python's CSV writer: https://docs.python.org/2/library/csv.html#writer-objects . After you've created the writer object, you can replace the print above with a call to writerow.
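
As a rough illustration of the writerow step, here is a minimal sketch in Python 3. The helper names `learnset_rows` and `write_csv`, the shortened `sample` string, and the file name in the trailing comment are all made up for this example; it also assumes the whole dataset has the same shape as the pasted excerpt.

```python
import ast
import csv
import io
import re


def learnset_rows(raw_text):
    """Yield (pokemonname, move, movelearnmethod) tuples from the raw text."""
    # Quote the bare keys so the text can be parsed as a Python dict literal.
    quoted = re.sub(r'(\w+):', r'"\1":', raw_text)
    move_data = ast.literal_eval(quoted)
    for pokemonname, entry in move_data.items():
        for move, methods in entry['learnset'].items():
            # A move with several learn methods yields several rows;
            # a move with an empty list yields none.
            for method in methods:
                yield pokemonname, move, method


def write_csv(rows, out_file):
    """Write a header line followed by one CSV line per row."""
    writer = csv.writer(out_file)
    writer.writerow(['pokemonname', 'move', 'movelearnmethod'])
    writer.writerows(rows)


# Shortened, made-up sample in the same shape as the pasted data.
sample = '{bulbasaur:{learnset:{amnesia:["6E"],block:[],endure:["6E","6T"]}}}'

# Write to an in-memory buffer so the example runs without touching disk.
buffer = io.StringIO()
write_csv(learnset_rows(sample), buffer)
csv_text = buffer.getvalue()

# To write a real file instead (hypothetical file name):
# with open('pokemon_moves.csv', 'w', newline='') as f:
#     write_csv(learnset_rows(sample), f)
```

Opening the file with `newline=''` is what the csv module documentation recommends in Python 3, so that the writer controls the line endings itself.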

I don't see what you mean by 'multiple delimiters'. The comma is used in many places, but the colon or the closing bracket could serve as good separators.

Another way to do this is with regular expressions, for example in Perl instead of Python.
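
For comparison, the regular-expression idea can also be sketched in Python rather than Perl. This is only an illustration under assumptions: the three patterns and the `extract_rows` helper are invented here, and they assume each Pokémon entry looks like `name:{learnset:{...}}` with no nested braces inside the learnset, as in the pasted excerpt.

```python
import re

# Pokémon name followed by its learnset body (assumed to contain no braces).
POKEMON_RE = re.compile(r'"?(\w+)"?:\{learnset:\{([^{}]*)\}')
# Move name (optionally quoted) followed by its bracketed list of methods.
MOVE_RE = re.compile(r'"?(\w+)"?:\[([^\]]*)\]')
# A single quoted learn method such as "6E".
METHOD_RE = re.compile(r'"(\w+)"')


def extract_rows(text):
    """Return a list of (pokemonname, move, movelearnmethod) tuples."""
    rows = []
    for pokemon, body in POKEMON_RE.findall(text):
        for move, methods in MOVE_RE.findall(body):
            # Empty lists such as block:[] contribute no rows.
            for method in METHOD_RE.findall(methods):
                rows.append((pokemon, move, method))
    return rows


# Shortened, made-up sample in the same shape as the pasted data.
sample = ('{bulbasaur:{learnset:{amnesia:["6E"],bind:["6T"],block:[],'
          'endure:["6E","6T"],"return":["6M"]}}}')
rows = extract_rows(sample)
```

This avoids evaluating the text as a Python literal, at the cost of patterns that depend closely on the exact layout of the data.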

Friendly, Alexis.


 